State-of-the-art language models are extremely challenging to train; they require huge compute budgets, complex distributed compute techniques and deep ML expertise. As a result, few organizations train large language models (LLMs) from scratch. And increasingly those that have the resources and expertise are not open sourcing the results, marking a significant change from even a few months back.
At Cerebras, we believe in fostering open access to the most advanced models. With this in mind, we are proud to announce the release to the open source community of Cerebras-GPT, a family of seven GPT models ranging from 111 million to 13 billion parameters. Trained using the Chinchilla formula, these models provide the highest accuracy for a given compute budget. Cerebras-GPT has faster training times, lower training costs, and consumes less energy than any publicly available model to date.
All models were trained on CS-2 systems that are part of the Andromeda AI supercomputer using our simple, data-parallel weight streaming architecture. By not having to worry about model partitioning, we were able to train these models in just a few weeks. Training these seven models has allowed us to derive a new scaling law. Scaling laws predict model accuracy based on the training compute budget and have been hugely influential in guiding AI research. To the best of our knowledge, Cerebras-GPT is the first scaling law that predicts model performance for a public dataset.
Today’s release is designed to be used by and reproducible by anyone. All models, weights, and checkpoints are available on Hugging Face and GitHub under the Apache 2.0 license. Additionally, we provide detailed information on our training methods and performance results in our paper, “ Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster“. The Cerebras CS-2 systems used for training are also available on-demand via Cerebras Model Studio.
Visit Official Website