UAE: World's largest AI training supercomputer launched by Abu Dhabi firm

Condor Galaxy 1 will be used to address society’s most pressing challenges across healthcare, energy, climate action and more

Photos: Supplied

Published: Thu 20 Jul 2023, 5:42 PM

Last updated: Thu 20 Jul 2023, 11:10 PM

Abu Dhabi’s technology holding group G42 and California-based artificial intelligence chip startup Cerebras Systems have unveiled the world’s largest supercomputer for AI training.

Condor Galaxy, a network of nine interconnected supercomputers, offers a new approach to AI compute that promises to significantly reduce AI model training time.

The first AI supercomputer on this network, Condor Galaxy 1 (CG-1), delivers 4 exaFLOPs of AI compute (one exaFLOP equals one quintillion floating-point operations per second) and has 54 million cores. Two more such supercomputers, CG-2 and CG-3, are expected to be deployed in the US early next year. With a planned capacity of 36 exaFLOPs in total, this unprecedented supercomputing network will revolutionise the advancement of AI globally.
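As a rough check of the figures above, the planned 36 exaFLOPs matches nine supercomputers at 4 exaFLOPs each. The short Python sketch below works through that arithmetic. It assumes all nine systems match CG-1's capacity, which the article implies but does not state outright, and the function and constant names are illustrative only.

```python
# One exaFLOP is one quintillion (10**18) floating-point operations per second.
FLOPS_PER_EXAFLOP = 10**18

# Figures quoted in the article.
PLANNED_SUPERCOMPUTERS = 9   # size of the Condor Galaxy network
EXAFLOPS_PER_SYSTEM = 4      # CG-1's stated capacity

def network_capacity_exaflops(systems: int, exaflops_each: int) -> int:
    """Total capacity, ASSUMING every system matches CG-1 (an assumption)."""
    return systems * exaflops_each

total_exaflops = network_capacity_exaflops(PLANNED_SUPERCOMPUTERS, EXAFLOPS_PER_SYSTEM)
total_flops = total_exaflops * FLOPS_PER_EXAFLOP

print(f"Planned network capacity: {total_exaflops} exaFLOPs")
print(f"Operations per second: {total_flops:.1e}")
```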

Talal Alkaissi, CEO of G42 Cloud, a subsidiary of G42, noted that Condor Galaxy will be used to address society’s most pressing challenges across healthcare, energy, climate action and more.

“Collaborating with Cerebras to rapidly deliver the world’s fastest AI training supercomputer and laying the foundation for interconnecting a constellation of these supercomputers across the world has been enormously exciting. This partnership brings together Cerebras’ extraordinary compute capabilities, together with G42’s multi-industry AI expertise,” Alkaissi said.

Cerebras’ flagship product, the CS-2 system, powered by the world’s largest and fastest AI processor, simplifies the training of large models by avoiding the complexity of distributed computing. Located in Santa Clara, California, CG-1 links 64 CS-2 systems together into a single, easy-to-use AI supercomputer, with an AI training capacity of 4 exaFLOPs. Cerebras and G42 offer CG-1 as a cloud service, allowing customers to enjoy the performance of an AI supercomputer without having to manage or distribute models over physical systems.

Accelerating innovation

CG-1 marks the first time Cerebras has partnered not only to build a dedicated AI supercomputer but also to manage and operate it. CG-1 is designed to enable G42 and its cloud customers to train large, ground-breaking models quickly and easily. The Cerebras-G42 strategic partnership has already advanced state-of-the-art AI models in Arabic bilingual chat, healthcare and climate studies.

“Delivering 4 exaFLOPs of AI compute at FP16, CG-1 dramatically reduces AI training timelines while eliminating the pain of distributed compute,” said Andrew Feldman, CEO of Cerebras Systems.

“Many cloud companies have announced massive GPU clusters that cost billions of dollars to build, but that are extremely difficult to use. Distributing a single model over thousands of tiny GPUs takes months of time from dozens of people with rare expertise. CG-1 eliminates this challenge. Setting up a generative AI model takes minutes, not months and can be done by a single person. CG-1 is the first of three 4 exaFLOP AI supercomputers to be deployed across the US. Over the next year, together with G42, we plan to expand this deployment and stand up a staggering 36 exaFLOPs of efficient, purpose-built AI compute.”

G42’s work with diverse datasets across healthcare, energy and climate studies will enable users of the systems to train new cutting-edge foundational models. The partnership brings together a team of hardware engineers, data engineers, AI scientists, and industry specialists to deliver a full-service AI offering to solve customers’ problems. This combination will produce ground-breaking results and turbocharge hundreds of AI projects globally.

CG-1 explained

Optimised for large language models and generative AI, CG-1 delivers 4 exaFLOPs of 16-bit AI compute, with standard support for models of up to 600 billion parameters and extendable configurations that support models of up to 100 trillion parameters. It has 54 million AI-optimised compute cores and 388 terabits per second of fabric bandwidth, and is fed by 72,704 Advanced Micro Devices (AMD) EPYC processor cores. Unlike any known GPU cluster, CG-1 delivers near-linear performance scaling from 1 to 64 CS-2 systems using simple data parallelism.
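For readers unfamiliar with the term, data parallelism means every system holds a full copy of the model, processes its own slice of each training batch, and the resulting gradients are averaged. The Python sketch below illustrates the idea on a toy one-parameter model. It is not Cerebras' software: the model, the function names and the sequential loop standing in for 64 systems are all assumptions made for illustration.

```python
def gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)**2 with respect to w over one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def split(batch, n_workers):
    """Split a batch into equal shards (batch size must divide evenly)."""
    size = len(batch) // n_workers
    return [batch[i * size:(i + 1) * size] for i in range(n_workers)]

def data_parallel_step(w, batch, n_workers, lr=0.1):
    """One training step: each worker computes a gradient on its own shard,
    then the gradients are averaged and applied to the shared weight."""
    shards = split(batch, n_workers)
    grads = [gradient(w, shard) for shard in shards]  # concurrent on real hardware
    return w - lr * sum(grads) / len(grads)

# Toy data following y = 3x; 64 samples so the batch divides evenly by 64.
batch = [(i / 64, 3 * i / 64) for i in range(1, 65)]

w_single = data_parallel_step(0.0, batch, n_workers=1)
w_parallel = data_parallel_step(0.0, batch, n_workers=64)
print(w_single, w_parallel)
```

With equal-sized shards, the averaged gradient is the same as the gradient over the whole batch, so one worker and 64 workers arrive at the same updated weight, up to floating-point rounding. Only the work per worker shrinks, which is why this approach can scale close to linearly when communication is fast enough.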

Forrest Norrod, executive vice president and general manager, Data Center Solutions Business Group, AMD, noted: “Driven by more than 70,000 AMD EPYC processor cores, Cerebras’ Condor Galaxy 1 will make accessible vast computational resources for researchers and enterprises as they push AI forward.”

CG-1 offers native support for training with long sequence lengths, up to 50,000 tokens out of the box, without any special software libraries. Programming CG-1 is done entirely without complex distributed programming languages, meaning even the largest models can be run without weeks or months spent distributing work over thousands of GPUs.
