Cerebras Systems Sets Record for Largest AI Models Ever Trained on a Single Device

Single CS-2 system trains multi-billion parameter NLP models including GPT-3XL 1.3B, GPT-J 6B, GPT-3 13B and GPT-NeoX 20B; offers easy setup, faster training and allows switching between models with a few keystrokes
Cerebras Systems, the pioneer in high performance artificial intelligence (AI) computing, announced, for the first time ever, the ability to train models with up to 20 billion parameters on a single CS-2 system – a feat not possible on any other single device. By enabling a single CS-2 to train these models, Cerebras reduces the system engineering time necessary to run large natural language processing (NLP) models from months to minutes. It also eliminates one of the most painful aspects of NLP: the partitioning of the model across hundreds or thousands of small graphics processing units (GPUs).


“In NLP, bigger models are shown to be more accurate. But traditionally, only a select few companies had the resources and expertise necessary to do the painstaking work of breaking up these large models and spreading them across hundreds or thousands of graphics processing units,” said Andrew Feldman, CEO and Co-Founder of Cerebras Systems. “As a result, only very few companies could train large NLP models – it was too expensive, time-consuming and inaccessible for the rest of the industry. Today we are proud to democratize access to GPT-3XL 1.3B, GPT-J 6B, GPT-3 13B and GPT-NeoX 20B, enabling the entire AI ecosystem to set up large models in minutes and train them on a single CS-2.”
“GSK generates extremely large datasets through its genomic and genetic research, and these datasets require new tools to conduct machine learning,” said Kim Branson, SVP of Artificial Intelligence and Machine Learning at GSK. “The Cerebras CS-2 is a critical component that allows GSK to train language models using biological datasets at a scale and size previously unattainable. These foundational models form the basis of many of our AI systems and play a critical role in the discovery of transformational medicines.”
These world-first capabilities are made possible by a combination of the size and computational resources available in the Cerebras Wafer Scale Engine-2 (WSE-2) and the Weight Streaming software architecture extensions available with release R1.4 of the Cerebras Software Platform, CSoft.
When a model fits on a single processor, AI training is easy. But when a model has either more parameters than fit in memory, or a layer that demands more compute than a single processor can handle, complexity explodes. The model must be broken up and spread across hundreds or thousands of GPUs. This process is painful, often taking months to complete. To make matters worse, the work is unique to each pairing of neural network and compute cluster, so it is not portable to different clusters or to other networks. It is entirely bespoke.
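To make the scale problem concrete, here is a back-of-the-envelope sketch in Python, using commonly cited rules of thumb rather than Cerebras-published figures: training a 20-billion-parameter model with the Adam optimizer in mixed precision requires roughly 16 bytes of state per parameter, far more than any single GPU holds.

    # Rough memory estimate for training a 20B-parameter model.
    # Assumes mixed-precision training with Adam: fp16 weights and gradients
    # plus fp32 master weights, momentum, and variance (~16 bytes/parameter).
    # Illustrative rule of thumb, not a Cerebras-published figure.
    PARAMS = 20e9          # GPT-NeoX 20B parameter count
    BYTES_PER_PARAM = 16   # 2 (weights) + 2 (grads) + 12 (optimizer state)

    total_gb = PARAMS * BYTES_PER_PARAM / 1e9
    print(f"~{total_gb:.0f} GB of training state")            # ~320 GB
    print(f"~{total_gb / 80:.0f}x the 80 GB of a large GPU")  # hence partitioning

At roughly 320 GB of training state, the model cannot fit on one GPU, which is what forces the partitioning described above.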
The Cerebras WSE-2 is the largest processor ever built. It is 56 times larger, has 2.55 trillion more transistors, and has 100 times as many compute cores as the largest GPU. The size and computational resources of the WSE-2 allow every layer of even the largest neural networks to fit. The Cerebras Weight Streaming architecture disaggregates memory and compute, allowing memory (which is used to store parameters) to grow separately from compute. Thus a single CS-2 can support models with hundreds of billions, even trillions, of parameters.
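A minimal sketch of the weight-streaming idea follows; the names ParameterStore and compute_layer are hypothetical stand-ins for illustration, not CSoft APIs. Parameters live in an external store, and each layer's weights are streamed to the compute engine, used, and discarded, so on-device memory never has to hold the whole model.

    # Minimal sketch of weight streaming; names are hypothetical,
    # not part of the Cerebras software stack.
    class ParameterStore:
        """External memory holding every layer's weights off the compute engine."""
        def __init__(self, layer_weights):
            self.layer_weights = layer_weights  # list of weight tensors

        def stream(self, layer_idx):
            # Weights are sent to the compute engine only when needed.
            return self.layer_weights[layer_idx]

    def forward(x, store, num_layers, compute_layer):
        # Only one layer's weights are resident at a time, so model size is
        # bounded by external memory capacity, not on-chip memory.
        for i in range(num_layers):
            w = store.stream(i)      # stream weights in
            x = compute_layer(x, w)  # execute the layer
            del w                    # nothing persists on the compute engine
        return x

Because the store grows independently of the compute engine, adding parameters means adding memory, not re-partitioning the model.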
Graphics processing units, on the other hand, have a fixed amount of memory per GPU. If the model requires more parameters than fit in memory, one needs to buy more graphics processors and then spread the work over multiple GPUs. The result is an explosion of complexity. The Cerebras solution is far simpler and more elegant: by disaggregating compute from memory, the Weight Streaming architecture allows models with any number of parameters to run on a single CS-2.
Powered by the computational capacity of the WSE-2 and the architectural elegance of Weight Streaming, Cerebras is able to support, on a single system, the largest NLP networks. By supporting these networks on a single CS-2, Cerebras reduces setup time to minutes and enables model portability. One can switch between GPT-J and GPT-Neo, for example, with a few keystrokes, a task that would take months of engineering time on a cluster of hundreds of GPUs.
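As an illustration of what “a few keystrokes” could look like in practice (a hypothetical configuration sketch, not the actual CSoft interface), switching models reduces to editing a single field when every model shares the same training harness:

    # Hypothetical sketch: model choice as a one-line config edit.
    # Parameter counts match the models named in the release; layer and
    # hidden sizes are the publicly documented values for each model.
    MODEL_CONFIGS = {
        "gpt3-xl":  {"params": 1.3e9, "layers": 24, "hidden": 2048},
        "gpt-j":    {"params": 6e9,   "layers": 28, "hidden": 4096},
        "gpt3-13b": {"params": 13e9,  "layers": 40, "hidden": 5140},
        "gpt-neox": {"params": 20e9,  "layers": 44, "hidden": 6144},
    }

    model = "gpt-j"  # change this one line to train a different model
    config = MODEL_CONFIGS[model]
    print(f"Training {model}: {config['params'] / 1e9:.1f}B parameters")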
“Cerebras’ ability to bring large language models to the masses with cost-efficient, easy access opens up an exciting new era in AI. It gives organizations that can’t spend tens of millions an easy and inexpensive on-ramp to major league NLP,” said Dan Olds, Chief Research Officer, Intersect360 Research. “It will be fascinating to see the new applications and discoveries CS-2 customers make as they train GPT-3 and GPT-J class models on massive datasets.”

