7,000 languages are spoken on Earth, but only 7% are online

According to a recent study, only twelve languages make up a staggering 98% of the pages online, leaving those in the minority facing extinction.



According to Whose Knowledge?, almost 7,000 languages and dialects exist on Earth, yet only 7% are represented in published online material.

In fact, just 12 languages account for 98% of the world's webpages, with English alone making up 72% of them. As a result, indigenous cultures are largely voiceless online, their peoples forced either to learn another language or to remain digitally silent.

Miguel Ángel Oxlaj Kumez, one of the organisers of the first Latin American Festival of Indigenous Languages on the Internet, belongs to the Kaqchikel Mayan community of Guatemala, whose language has around half a million speakers.

“When I get on the internet, I find more than 90% of the content in English and hence a significant percentage in Spanish and other languages,” he told the BBC. “So, what I have to do is to move to another language, and that favours the displacement of my own language.”

“It discredits my own language, because – as it is not on the internet – then it is not valid, then it does not work, therefore why am I going to continue learning it? Why am I going to teach it to my children if, when I turn on the internet or television, I cannot find it there?”

Oxlaj Kumez is working with other activists to create a Kaqchikel Mayan version of Wikipedia, as well as a translated version of Mozilla Firefox. He is not alone in his efforts: in 2003, UNESCO adopted a recommendation to promote multilingualism online, and it has pushed for universal access to the internet ever since, with a special focus on indigenous languages.

The largest problem, however, is access. Over 40% of the world’s population does not have access to online infrastructure. And while 76% of internet users live in Africa, Asia, the Middle East, Latin America and the Caribbean, most online content is produced elsewhere.

There are levels to this linguistic divide, too. From hardware (e.g. keyboards) to software (e.g. programming languages), and from website domains to social media, the lack of support for diverse alphabets keeps almost all indigenous languages out of the online sphere.

Victoria Aguilar, a native speaker of the Mixeteco language and a linguistics student at Mexico’s National University, argues that the primary problem is that societies are now reproducing on the internet the same structural inequalities found offline.

“We need a lot of work in localisation, in adapting the technology to our needs,” she says. “The internet is a wonderful channel of communication, but it also reflects the inequalities in real life. The way in which some forms of writing are being neglected is affecting the fact that we cannot write freely on the internet. We need technologies that allow us to accelerate this process.”

“If we do not hurry at this time with the technologies, it can play against us, because it can pull us towards a Spanish language homogenisation in the case of Mexico,” she says. “It is a key moment for languages, because there is an internet boom and more and more people have data.”




There have been some improvements. Internationalised Domain Names (IDNs) have tackled the challenge from a domain name perspective. Since 2003, the IDN project has worked to enable 152 languages, including 75 in Chinese, Japanese and Korean (CJK) scripts and 33 in Arabic scripts.

Expanding internet access, together with the adoption of new languages in domain names, has already influenced the content that online populations produce. A study by the Council of European National Top-Level Domain Registries and the Oxford Information Labs found that country and regional top-level domains enhance the presence of local languages online, and that they show lower levels of English than the domain name sector does globally.

The Wikimedia community itself has acknowledged the struggle to make Wikipedia more diverse and multilingual. The free online encyclopedia has made great strides, having published articles in 307 languages as of December 2019, making it the most linguistically diverse platform online.

“There is a responsibility of technology platforms to give access to technology in these languages and to reduce the internet access gap, and there is responsibility of the state as well,” says José Flores, Wikimedia vice-president in Mexico.

Unfortunately, the gap cannot be closed by tech companies and the state alone. “It seems to me that academia, the community itself and even journalism and media are responsible, because there is a need for more sources to build articles on Wikipedia,” added Flores. Wikipedia articles need to cite published secondary sources, which becomes a problem for communities that are not well documented.

“It’s not just that we need to connect but also how we connect,” adds Flores. “It goes beyond the infrastructure, because it is also about the social uses of this infrastructure.”




