Local language data is essential for building effective AI tools
Local language data helps automated systems understand and respond to users in their own language and can help businesses reach their target audiences more effectively, said Ganesh Gopalan, founder and CEO of AI startup Gnani.ai, during a panel discussion at the Mint Digital Innovation Summit & Awards on Friday.
“If we don't have, firstly, content in the local language, if we can't talk to machines in the local language, then it is not possible for no system to work and, you know, reach the right audience,” said Gopalan.
The panel also included Vivekanand Pani, co-founder of Reverie Language Technologies, who agreed that the availability of data is crucial to developing an AI tool for any language.
Pani lamented the lack of data in local languages, a fact that he said often goes unquestioned. As a result, building data becomes a critical step, he added.
India is one of the most linguistically diverse countries, with 22 official languages and over a hundred unofficial ones. Even though a majority of India's population doesn't speak English, a considerable portion of the internet content in India is still in English. This presents a dilemma for enterprises seeking to expand their reach to local language users who are now online.
To ensure that future AI chatbots can communicate in local languages with the same efficiency and accuracy as in English, several local language database projects are underway. One such initiative, Project Vaani, was announced last December by Google India in partnership with the Bangalore-based AI and Robotics Technology Park (ARTPARK) and the Indian Institute of Science (IISc).
Gnani.ai’s Gopalan also noted during the panel discussion that many enterprises are now aware of the important role local language data plays. “There is acceptance now, a lot of the enterprises now know the importance of language for automated systems to talk to machines, and a bunch of other things,” he added.
Gopalan believes that AI has come a long way. He said that his firm has been developing speech-to-text engines in Indian languages and has also developed automatic speech recognition (ASR) systems.
That said, the panelists believe that integrating local languages into AI systems is still a complex task.
“English itself, reaching this point on the internet did not happen on the back of AI. It did happen on the back of a lot of other technologies. So, I think Indian languages getting supported through AI will still need some amount of basic challenges getting solved, and then we will see the real use of AI for India,” said Pani.
He explained, “Before we are able to make large-scale usage of AI in content creation itself, I think there is a degree of steps that we should still take on the fundamentals.”