NEWS
Announcing the global experts leading CODI’s inaugural research initiative on multilingual AI
The Coalition on Digital Impact (CODI) is excited to announce the experts from around the world who have joined our new Working Group on Defining the Minimum Viable Dataset for Cultural and Linguistic Empowerment. Their work, which is now under way, will help build a stronger, more inclusive foundation for large language models (LLMs) that reflect the diversity of human experience.
Our call for volunteers received an incredible response from experts and advocates across sectors and continents. From community leaders and language activists to data scientists, lawyers, and technologists, many people were eager to be part of this work.
Selected experts represent a wide range of experience in cultural and linguistic representation, AI and data governance, ethics, digital rights, and policy. You can learn more about these dedicated professionals and their backgrounds here.
The Internet’s infrastructure was built on the promise of openness, but too often it hits a language barrier. The Internet is currently dominated by only a few major languages, yet thousands of other languages and dialects are spoken daily around the world. Many of these are Indigenous, minority, and oral-tradition languages rarely seen online. As AI advances, it can help support more languages on the Internet and preserve cultures and identities that might otherwise be lost. For example, a student in rural Mexico could use AI to write and translate stories in Mixtec, sharing their heritage and helping keep the language alive for future generations.
CODI’s new Working Group is taking an important step toward realizing this goal. In the coming months, the Working Group will determine what makes a dataset culturally relevant and ethically sourced, largely based on whether the data can capture oral traditions, storytelling structures, idioms, and metaphors that reflect local worldviews. From there, the group will recommend ethical and responsible ways to collect and share data and develop guidance for making data technically compatible across systems.
AI’s true power can only be unleashed if it’s culturally relevant to the people who use it, and even the most tightly automated services are only valuable when they can be understood. CODI is proud to support this effort and looks forward to sharing updates as the project progresses. In the meantime, if you would like regular updates about the project, you can sign up as an observer: email WG-support@codi.global to subscribe to the WG Observers email list.
Otherwise, reach out to us if you’d like to ask a question, share an experience, or join our effort to create an Internet that speaks every language!