Inside UAE’s most ambitious AI model: Teaching robots how to think

In PAN's simulation environment, a robotic hand can try hundreds of ways to grasp a cup before ever touching one in real life
- PUBLISHED: Sun 7 Dec 2025, 7:57 AM
Most AI video models show what the world looks like. PAN, developed at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), tries to understand how the world works.
When MBZUAI unveiled PAN last month, coverage focused on its video generation capabilities. But that misses the real story: PAN wasn't built to make movies — it was built to teach robots how to think.
"Unlike typical video generators that turn text prompts into visually convincing clips, PAN is built to understand and simulate the world itself," Jon Carvill, Vice President of Marketing and Communications at MBZUAI, told Khaleej Times exclusively. "Video models imitate appearance; PAN captures the dynamics that make those appearances real."
While OpenAI's Sora and Google's Veo generate cinematic scenes, PAN is designed for reasoning — something robots and autonomous systems critically need.
The distinction matters because of a problem plaguing robotics development: training robots in the physical world is expensive, slow, and dangerous. Companies developing humanoid robots like Tesla's Optimus and Figure AI’s Helix face the same challenge: a single prototype can cost hundreds of thousands of dollars, and teaching it even basic tasks through real-world trial and error risks destroying that investment with every mistake.
Current robotics companies need hundreds of human operators performing thousands of repetitive demonstrations just to teach a robot a handful of skills. The costs are staggering, and the timelines stretch across years.
PAN changes the equation by creating what researchers call a "world model"—an AI system that doesn't just generate visuals but understands cause and effect, physics, and how actions lead to consequences over time.
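PAN's internals have not been published in full, but the idea of a "world model" can be sketched in a few lines: instead of mapping a prompt directly to pixels, the system maintains a state of the scene and predicts how an action changes that state. The names and dynamics below are purely illustrative, not PAN's actual API.

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    """Illustrative scene state: positions of a cup and a robot hand."""
    cup_x: float
    hand_x: float

def step(state: WorldState, action: float, dt: float = 0.1) -> WorldState:
    # A world model predicts the *next state* from the current state plus
    # an action (cause -> effect), rather than rendering pixels directly.
    return WorldState(cup_x=state.cup_x, hand_x=state.hand_x + action * dt)

s0 = WorldState(cup_x=1.0, hand_x=0.0)
s1 = step(s0, action=2.0)  # command the hand to move toward the cup
```

The contrast with a pure video generator is that the state, not the image, is the primary object: frames can be rendered from it, but consequences are computed on it.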
Refining behaviour
"Within PAN, a robotic agent can rehearse thousands of interactions—from an autonomous vehicle navigating traffic to a household robot folding laundry or loading a dishwasher—all while refining its behaviour before ever touching the real world," Carvill explained.
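The rehearsal loop Carvill describes can be illustrated with a toy example: score hundreds of candidate actions inside a simulated rollout, and only commit the best one to real hardware. The scoring function here is a made-up stand-in; PAN's actual rollout interface is not public.

```python
import math

def sim_grasp_success(angle_deg: float) -> float:
    """Toy stand-in for a world-model rollout that scores one grasp
    attempt (peak success at an assumed ideal angle of 30 degrees)."""
    return math.exp(-((angle_deg - 30.0) ** 2) / 200.0)

# Rehearse ninety-one candidate grasp angles purely in simulation...
candidates = range(0, 91)
best = max(candidates, key=sim_grasp_success)
# ...and only the best-scoring action would ever run on a real robot.
```

Every failed attempt in this loop costs compute time rather than a damaged prototype, which is the economic argument for simulation-first training.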
The technical architecture behind this capability separates PAN from competitors. While conventional video generators produce a complete clip in one pass, PAN maintains an internal memory of what exists in a scene and how objects move, updating its understanding step-by-step as it generates each new frame.
"We built PAN's architecture as a hybrid: diffusion handles visual fidelity, while the LLM maintains world semantics over longer horizons," noted Carvill.
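The division of labour Carvill describes, a language-model backbone tracking scene semantics while a diffusion component renders each frame, can be sketched as a two-stage loop. Both functions below are hypothetical stand-ins, not PAN's real components.

```python
def update_semantics(memory: dict, action: str) -> dict:
    """Stand-in for the LLM backbone: tracks what exists in the scene
    and how each action changes it over long horizons."""
    memory = dict(memory)  # copy so earlier states stay intact
    if action == "lift cup":
        memory["cup"] = "in hand"
    return memory

def render_frame(memory: dict) -> str:
    """Stand-in for the diffusion head: turns the current scene state
    into a (here, textual) frame with high visual fidelity."""
    return f"frame[cup={memory['cup']}]"

memory = {"cup": "on table"}
frames = []
for action in ["look", "lift cup", "hold"]:
    memory = update_semantics(memory, action)  # semantics updated first...
    frames.append(render_frame(memory))        # ...then a frame is rendered
```

The key design point is ordering: the semantic state is updated before each frame is drawn, so the rendered video stays consistent with the model's understanding of cause and effect rather than the other way around.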
The implications are significant. Advanced physics simulation can train robots 430,000 times faster than real-world learning, compressing what would take decades of physical practice into hours of computational time. This dramatically lowers costs and opens access to advanced robotics capabilities.
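A quick sanity check shows the cited 430,000× figure is consistent with compressing decades into less than an hour, assuming round-the-clock practice (an assumption made here, not stated in the article):

```python
SPEEDUP = 430_000  # reported simulation-vs-real-world training speedup

# Two decades of nonstop physical practice, in hours
decades_of_practice_hours = 20 * 365 * 24

sim_time_hours = decades_of_practice_hours / SPEEDUP
# Roughly 0.4 hours (~24 minutes) of simulated time covers two decades
```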
The system represents a shift toward what researchers call "embodied AI"—artificial intelligence that must understand physical consequences, not just patterns in data. Current large language models excel at text but lack grounding in how the physical world behaves.
Carvill underlined: "Intelligent agents must move beyond text-only reasoning. By modelling not just language but how the world behaves and responds, PAN forms the backbone for truly embodied systems."
MBZUAI's distributed development model—spanning teams in Abu Dhabi and Silicon Valley—accelerated PAN's creation through clearly defined roles and research pipelines that leveraged global talent pools across time zones.
Standard for intelligent agents
PAN fits into MBZUAI's broader Institute of Foundation Models strategy, which recently produced K2 Think, an AI reasoning system. The work advances the university's mission to build AI capabilities that benefit the global research community.
Looking toward 2030, Carvill outlined an ambitious vision: "Success would mean PAN-like world models becoming the standard substrate for intelligent agents—powering safe autonomous systems, realistic virtual environments, and AI that understands consequence, not just correlation."
For Abu Dhabi, PAN positions MBZUAI at the intersection of two converging technologies: advanced AI and physical robotics. The city's unique position—combining government support, international talent, and regional industry partnerships—creates what Carvill describes as "perspectives that are not found elsewhere" in global AI development.