While previous Arabic models scored around 62 per cent on evaluation benchmarks, Jais 2 delivers what researchers describe as 'state-of-the-art performance'.

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and its partners on Tuesday released Jais 2, a 70-billion-parameter language model trained on the largest Arabic-first dataset ever assembled — 600 billion Arabic tokens, a scale no other institution has attempted.
The release strengthens the UAE’s position in Arabic AI. While previous Arabic models scored around 62 per cent on evaluation benchmarks, Jais 2 delivers what researchers describe as "state-of-the-art performance across both Arabic and bilingual tasks."
“Arabic has long been underserved in AI development due to the lack of high-quality data,” Professor Preslav Nakov, Department Chair of Natural Language Processing at MBZUAI, told Khaleej Times. “Today marks a defining advancement — a model built with scale, cultural depth, and linguistic fidelity at its core.”
What makes Jais 2 distinct is its development philosophy. Many global AI models treat Arabic as a secondary language, often translating English datasets or adding thin Arabic layers on top of English-centric systems. Jais 2 was built from scratch around Arabic structure, dialects, and real usage.
“Models developed elsewhere tend to treat Arabic as a peripheral addition,” Nakov said. “Most remain heavily biased toward English, leaving dialects and culturally nuanced contexts poorly modeled.”
The dataset spans Modern Standard Arabic, 17 regional dialects (including Gulf, Emirati, Moroccan, Egyptian, and Iraqi), and Arabizi, the Latin-script Arabic widely used online. Jais 2 also incorporates 1.6 trillion English and code tokens, giving it the strong bilingual capabilities essential in a region where code-switching is part of everyday conversation.
“Code-switching is natural across the Arab world,” Nakov said. “Jais 2 treats it not as an anomaly, but as a normal linguistic pattern.”

Jais 2 was trained on more than 427,000 Arabic poems with detailed metadata and semantic annotations — giving it an understanding of classical and contemporary verse that global models lack.
“Arabic poetry is a clear domain where Jais 2 excels,” Nakov said. “Western models simply do not have enough exposure to interpret symbolism or cultural references that Jais handles naturally.”
This cultural grounding is reinforced through a custom-built Arabic vocabulary and safety frameworks designed around regional communication norms rather than Western assumptions.
Developed by Inception (a G42 company), Cerebras Systems, and MBZUAI’s Institute of Foundation Models, Jais 2 was trained and is served entirely on Cerebras hardware — a setup the partners say required a fraction of the computing power used by similar global models.
Beyond its technical achievement, the model is a step toward sovereign Arabic AI. For the UAE and the wider Arab world, that means the language, its dialects, and their cultural context are represented accurately in a rapidly digitising world.
“For the region, building sovereign Arabic models ensures representation, cultural fit, and reliability,” Nakov said. “It allows the Arab world to lead rather than follow.”
Jais 2 is being released as a fully open-weight 70B model — a decision Nakov describes as essential for accelerating local innovation.
“Releasing an open-weight model allows researchers, startups, and governments to build Arabic solutions on top of a state-of-the-art foundation,” he told Khaleej Times.
The release enables fine-tuning for applications across finance, healthcare, education, customer service, media, and government services. Jais 2 is available now through Inception's Hugging Face page and at jaischat.ai.
