Indian IT company Tech Mahindra launches "Indian version of ChatGPT" supporting 40 Hindi dialects
Indian IT company Tech Mahindra recently announced "Project Indus", an open-source foundation language model for Indian languages. The project could become the company's most important yet. Large language models such as OpenAI's GPT models are multilingual, but because they are trained largely on English datasets, their ability to understand and generate content in Indian languages is limited.
Tech Mahindra CEO C.P. Gurnani said the model will be the largest Indian language model and may serve 25% of the global population. Tech Mahindra has not revealed the project's cost or an estimated release date, but the initial goal is to build a 7-billion-parameter language model.
The model will initially support 40 dialects of Hindi, with more languages and dialects to be added gradually. The company said that although some Indian language efforts such as Bhashini and AI4Bharat already exist, a foundation model still needs to be developed. The interface may feature voice and text messaging, but a ChatGPT-like chat interface has not yet been considered.
Tech Mahindra's overall plan is to first create a language model for text continuation and then add dialogue capabilities. Once the model's performance and the quality of its dialect generation are known, it will be released as open source.
A Hindi language model can prioritize cultural sensitivity, ensuring that generated content respects local customs and norms. It could also democratize AI by serving the country's many non-English speakers.
However, collecting data across different languages and dialects remains the biggest challenge for Tech Mahindra. To that end, the company is seeking contributions from speakers of different dialects to help build the dataset, and it has opened a web portal where Indians can donate language data.