A 6-billion-parameter autoregressive text generation model trained on The Pile.
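As a quick illustration of autoregressive generation with the released weights, here is a minimal sketch using the Hugging Face Transformers port of GPT-J-6B. It assumes the `transformers` and `torch` packages are installed and that `EleutherAI/gpt-j-6B` is the intended Hub checkpoint; the mesh-transformer-jax repository linked below has its own inference scripts.

```python
# Minimal text-generation sketch (assumed: Hugging Face Transformers port of GPT-J-6B).
# Downloading the checkpoint requires ~24 GB of disk and substantial RAM/VRAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # assumed Hub id for the released 6B weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The TPU Research Cloud makes it possible to"
inputs = tokenizer(prompt, return_tensors="pt")

# Autoregressive sampling: each generated token is fed back in as
# context for predicting the next one.
output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=64,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```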
This project would not have been possible without compute generously provided by the TPU Research Cloud with assistance from EleutherAI.
Thanks to the Cloud TPU team at Google for providing early access to the Cloud TPU VM alpha (now publicly available!).
Thanks to everyone who has helped out one way or another (listed alphabetically):
- Aran Komatsuzaki for advice with experiment design and writing the blog posts.
- James Bradbury for valuable assistance with debugging JAX issues.
- Janko Prester for creating the web demo frontend.
- Laurence Golding for adding some features to the web demo.
- Leo Gao for running zero-shot evaluations for the baseline models in the table.
Official repository: https://github.com/kingoflolz/mesh-transformer-jax/#gpt-j-6b