Event Recap: Languages, AI, and the Infrastructure of an Inclusive Internet

"To be excluded from the language of the digital world is to be silenced in the modern era."

That was CODI Founder Ram Mohan's challenge to the room at the coalition's second annual Universal Acceptance (UA) Day event on May 28. The gathering, titled "Universal Participation: Languages, AI, and the Infrastructure of an Inclusive Internet," brought together technologists, linguists, policymakers, and community advocates around a single, urgent question: as AI reshapes how people access and participate in the internet, who gets left out?

The Stakes Are Concrete

Amrit Sufi, a member of the Wikimedia-supported Indic Oral Culture Project, made the stakes tangible. Her work focuses on Angika, an endangered language spoken in parts of India, and the community-driven effort to document and preserve it before it disappears.

"Language is one of the first parts of identity that we lose when there is an outside force, like a more dominant language," she said.

Digital tools can help. But only if they're built to. Right now, most aren't.

Where AI Falls Short

Sabina Jasinska, Co-Chair of CODI's Minimum Viable Dataset (MVD) Working Group, noted that AI systems routinely reinforce stereotypes when cultural context is missing from training data. UNESCO's David Castillo went further, warning that overreliance on AI creates misinformation risks and accelerates language extinction by concentrating knowledge in a handful of dominant languages.

The problem isn't just technical. CODI Board Director Dan York pointed to structural barriers, including insufficient infrastructure and limited compute capacity, that prevent smaller language communities from participating in AI development at all. As Castillo put it: "Communities are frequently data providers but not decision-makers in AI development."

That asymmetry has consequences. When communities can't shape the systems that represent them, those systems get it wrong, and the communities pay the price.

Two Frameworks for Getting It Right

CODI introduced two initiatives aimed at changing that.

The first, the Language Infrastructure Framework, is a reference model for the technical systems languages need to function across the internet, from encoding and script legibility to trust, safety, and accountability. Co-founders Christian Dawson and Dan York presented it as a foundation for building a more inclusive internet and invited broader community feedback as the framework continues to evolve.

The second initiative, led by Jasinska and Co-Chair Clarian Makungu, tackles AI representation directly. Their MVD Working Group is developing a framework to help language communities define how their languages should appear in AI systems. Makungu described it using a simple image: adequate linguistic data, culturally grounded representation, and ethical governance are the three legs of a stool. Remove any one of them and the whole thing collapses.

"If participation is weak," she said, "AI becomes extractive, data becomes misused, and trust breaks down."

The Path Forward

Universal Acceptance, the principle that all valid internet identifiers should work everywhere, remains essential infrastructure. But as Ram made clear in his opening remarks, it's not enough on its own. True digital inclusion requires coordination across infrastructure, data, and communities. It requires building systems that don't just tolerate other languages but are designed around them.

"People cannot fully participate in our shared digital future," Ram said, "if they cannot do so in the language of their heart, their home, and their history."