MBZUAI researcher receives $1 million Google funding to bridge AI’s Arabic language gap

Researchers involved in the project say progress in Arabic AI remains limited by fragmented collaboration across the region and weak integration between academia and industry

Published: Mon 16 Feb 2026, 5:18 PM

A researcher at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) has received a $1 million award from Google to support work addressing a longstanding limitation in AI systems: their uneven performance in Arabic compared with English.

The research is led by Professor Thamar Solorio, vice provost at MBZUAI, and will focus on developing resource-lean artificial intelligence systems.

These systems are designed to understand Arabic dialects, cultural context, and everyday usage without relying on the large volumes of manually labeled data that have shaped English language models.

Arabic is spoken by more than 400 million people across more than 26 countries. Despite this scale, it is often treated as a low-resource language in artificial intelligence development. Researchers say the issue lies in how Arabic data is collected and structured, rather than in its availability.

“Much of the Arabic data used in training comes from scraped news articles or religious texts,” said Nour Al Hassan, founder of Arabic.ai. “What is missing is everyday speech, dialect-heavy language, and content tied to specific fields and real-world use.”

These gaps affect how artificial intelligence tools perform in daily settings. Systems trained on limited or formal Arabic data often struggle with dialect variation, cultural references, and context.

In healthcare, automated tools may offer guidance that users consider inappropriate or misaligned with social norms.

In education, tutoring systems can misinterpret cultural or religious material.

Arabic also presents structural challenges for language models. The language operates across two primary forms.

Modern Standard Arabic dominates formal writing and media, while spoken dialects vary widely across regions and dominate everyday communication.

Most large language models are trained heavily on formal text, which limits their ability to process dialect-based language.

Word meaning can shift by geography. The word “bas” can mean “only” in Egypt, “but” in the Levant, and “enough” in Gulf Arabic. These distinctions alter sentence meaning and remain difficult for current models to resolve consistently.

“This funding allows us to move from exploratory research into applied systems with direct relevance to people’s lives,” Professor Solorio told Khaleej Times.

“It supports a shift toward models grounded in the linguistic and cultural realities of the MENA region.”

The project seeks to address inefficiencies in current Arabic language model development. Instead of adapting systems originally designed for English, the research will establish frameworks built specifically for Arabic and other regional languages.

The research focuses on improving system understanding of Arabic as it is spoken and used in daily life, rather than relying on translation alone.

The resource-lean approach aims to reduce dependence on large annotated datasets and lower computing requirements.

Google said the grant aligns with its broader regional initiatives. “We are advancing our commitment to expanding access to innovative AI technologies in Arabic and its dialects,” said Yossi Matias, Vice President at Google and Head of Google Research.

The funding will also support postdoctoral and early-career researchers. Artificial intelligence research has historically been concentrated in Western institutions with access to large computing budgets.

MBZUAI has previously developed Arabic-focused systems such as Jais 2 and K2 Think, both of which required significant infrastructure.

Researchers involved in the project say progress in Arabic AI remains limited by fragmented collaboration across the region and weak integration between academia and industry.

Solorio’s initiative aims to address these constraints by producing open frameworks that universities, startups, and public institutions across the region can adopt without major capital investment.

Potential applications include education, healthcare, cultural preservation, and digital communication.

As artificial intelligence adoption expands across the region, language remains a central factor in access and usability.