Telecom firm Veon (VON.AS), mobile operator Beeline Kazakhstan, the Barcelona Supercomputing Center, and the GSMA lobby organization said on Wednesday that they will collaborate to close the “AI language gap” for underrepresented languages.
Large language models that power chatbots like ChatGPT typically learn to produce human-like replies by analyzing vast amounts of internet content, including digital books, webpages, articles, and blogs. However, some languages have limited resources and data available.
“Out of nearly 7000 languages spoken around the globe, only seven are considered high-resource languages in the digital world: English, Spanish, French, Mandarin, Arabic, German and Japanese,” the organizations said in a joint statement.
They will work together to build tools and language model documentation for underrepresented languages, such as those spoken in Pakistan, Ukraine, Bangladesh, Kazakhstan, Uzbekistan, and Kyrgyzstan, where Veon operates.
The languages covered also include Catalan, which is spoken by almost 10 million people, according to the statement.
“The lack of resources in other languages results in an AI language gap which leads to sub-optimal user experience in AI applications, deepens the bias in AI models and risks deepening the digital divide in AI technologies,” they stated.