
We’re back with the second edition of CODI Signals, our Ask the Experts series bringing together sharp, candid perspectives from across the CODI community.
This time, we turned to CODI members and participants in the Working Group on Defining the Minimum Viable Dataset for Cultural and Linguistic Empowerment, who have been working to determine what makes a dataset culturally relevant in an AI-driven world. We asked a timely question:
“Who is responsible for making sure all languages work well with AI—and what happens if they don’t?”
Explore their insights below.
Clarian Makungu, PhD Researcher (AI & Cybersecurity), Strathmore University; Co-Chair of CODI’s Minimum Viable Dataset Working Group
“Ensuring that all languages work well with AI is a shared responsibility, but it is one that demands clear accountability rather than diffuse goodwill. Technology providers bear primary responsibility: model architectures, training data pipelines, evaluation benchmarks, and deployment choices all reflect explicit design decisions. When AI systems systematically underperform for large parts of the world’s linguistic diversity, this is not an accidental outcome but the result of prioritisation choices.
Governments and multilateral institutions also have a critical role. Language inclusion in AI is a public‑interest issue, intersecting with access to services, information integrity, and economic participation. Public policy, procurement standards, and targeted investment are essential to correct market incentives that otherwise favour scale over inclusion. Standards bodies and initiatives such as CODI play a coordinating role, translating normative commitments like Universal Acceptance and digital inclusion into technical and governance frameworks that can be operationalised.
If languages do not work well with AI, the consequences are structural. Speakers of under‑resourced languages risk becoming computationally invisible: misrepresented, excluded from automated services, or entirely absent from AI‑mediated knowledge systems. Over time, this reinforces existing inequalities and accelerates digital language loss. Addressing this challenge is therefore not only a technical necessity, but a question of equity, epistemic justice, and the long‑term legitimacy of AI systems themselves.”
Sabina Jasinska, Chief Marketing Officer, Hansen Technologies; Co-Chair of CODI’s Minimum Viable Dataset Working Group
“Responsibility for ensuring languages work well with AI cannot sit with language communities alone. The burden primarily lies with the organizations designing, funding, deploying, and commercializing AI systems, as well as the institutions shaping digital infrastructure and policy.
Today, AI is reinforcing existing patterns of digital exclusion. While more than 7,000 languages are spoken globally, only a relatively small number are meaningfully represented in modern AI systems. Major AI platforms and assistants still operate in only a few dozen languages.
This is one of the core challenges addressed in CODI’s Minimum Viable Dataset (MVD) report. The framework was developed to define the minimum conditions required for a language to participate meaningfully and ethically in digital and AI-enabled systems, not only from a technical perspective, but also from cultural and governance perspectives.
One of the key findings of the work is that language inclusion is not simply a question of data volume. A language may have millions of speakers and still remain digitally invisible if it lacks structured datasets, discoverability, governance mechanisms, or representation in AI systems. The framework therefore introduces an AI Readiness Pathway combining technical readiness with governance and ethical frameworks such as CARE and OCAP®, focused on community control, responsibility, consent, and collective benefit.
If these gaps are not addressed, entire communities risk exclusion from education, public services, economic participation, and future AI-driven knowledge systems.”
Aliya Bhatia, Senior Policy Analyst, Center for Democracy & Technology's Free Expression Project
“To borrow from Mario Cuomo: people speak online in poetry—spanning languages, dialects, inside jokes, idioms, and slang—but AI systems operate in prose. As people worldwide rely on AI for everything from medical diagnoses to job applications, these systems must meet users where they are and work equally well in the languages they speak.
Advancing a truly multilingual AI paradigm will require an ecosystem-wide effort. Industry, researchers, language communities, and civil society must work together to ensure choices made in the development and deployment of AI systems represent the perspectives and needs of users around the world. Without that collective commitment, we risk deepening the “digital language divide” and building tools that users can tell were never made with them in mind.”