This repository records EleutherAI’s library for training large-scale language models on GPUs. Our current framework is based on NVIDIA’s Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training.
For those looking for a TPU-centric codebase, we recommend Mesh Transformer JAX.
If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend you use the Hugging Face `transformers` library instead, which supports GPT-NeoX models.
Prior to 3/9/2023, GPT-NeoX relied on DeeperSpeed, which was based on an old version of DeepSpeed (0.3.15). In order to migrate to the latest upstream DeepSpeed version while allowing users to access the old versions of GPT-NeoX and DeeperSpeed, we have introduced two versioned releases for both libraries:
First make sure you are in an environment with Python 3.8 with an appropriate version of PyTorch 1.8 or later installed. Note: Some of the libraries that GPT-NeoX depends on have not been updated to be compatible with Python 3.10+. Python 3.9 appears to work, but this codebase has been developed and tested for Python 3.8.
To install the remaining basic dependencies, run:
```
pip install -r requirements/requirements.txt
python ./megatron/fused_kernels/setup.py install # optional if not using fused kernels
```
from the repository root.
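Before installing, it can help to confirm that the active environment matches the version notes above. The following is a minimal sketch, not part of the repository: `python_support_status` and `torch_version_ok` are hypothetical helper names, and the status strings simply restate the compatibility notes in this section.

```python
import sys
from importlib import metadata


def python_support_status(version_info=sys.version_info):
    """Classify a Python version according to the compatibility notes above."""
    major_minor = (version_info[0], version_info[1])
    if major_minor == (3, 8):
        return "tested"
    if major_minor == (3, 9):
        return "appears to work"
    if major_minor >= (3, 10):
        return "some dependencies may be incompatible"
    return "older than the tested version"


def torch_version_ok(version_string, minimum=(1, 8)):
    """Return True if a version string such as '1.13.1+cu117' is at least `minimum`."""
    core = version_string.split("+")[0]  # drop a local build suffix like '+cu117'
    parts = core.split(".")
    return (int(parts[0]), int(parts[1])) >= minimum


if __name__ == "__main__":
    print("Python:", python_support_status())
    try:
        # Reads the installed package metadata without importing torch itself.
        print("PyTorch >= 1.8:", torch_version_ok(metadata.version("torch")))
    except metadata.PackageNotFoundError:
        print("PyTorch is not installed in this environment")
```

Running the script inside the environment you plan to use reports whether the interpreter and the installed PyTorch fall within the ranges described above.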
Warning: Our codebase relies on DeeperSpeed, our fork of the DeepSpeed library with some added changes. We strongly recommend using Anaconda, a virtual machine, or some other form of environment isolation before continuing. Failure to do so may cause other repositories that rely on DeepSpeed to break.
To use Flash-Attention, install the additional dependencies in `./requirements/requirements-flashattention.txt` and set the attention type in your configuration accordingly (see configs). This can provide significant speed-ups over regular attention on certain GPU architectures, including Ampere GPUs (such as A100s); see the repository for more details.
We also provide a Dockerfile if you prefer to run NeoX in a container. To use this option, first build an image named `gpt-neox` from the repository root directory with `docker build -t gpt-neox -f Dockerfile .`. We also host pre-built images on Docker Hub at `leogao2/gpt-neox`.
You can then run a container based on this image. For instance, the snippet below mounts the cloned repository directory (`gpt-neox`) to `/gpt-neox` in the container and uses nvidia-docker to make four GPUs (numbers 0-3) accessible to the container. As noted by the NCCL documentation, both `--shm-size=1g` and `--ulimit memlock=-1` are important to prevent Docker from allocating too little shared memory.
```
nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 --shm-size=1g --ulimit memlock=-1 --mount type=bind,src=$PWD,dst=/gpt-neox gpt-neox
```
All functionality, inference included, should be launched using `deepy.py`, a wrapper around the `deepspeed` launcher.
We currently offer three main functions:

1. `train.py` is used for training and finetuning models.
2. `evaluate.py` is used to evaluate a trained model using the language model evaluation harness.
3. `generate.py` is used to sample text from a trained model.

These can be launched with:
```
./deepy.py [script.py] [./path/to/config_1.yml] [./path/to/config_2.yml] ... [./path/to/config_n.yml]
```
For example, to generate text unconditionally with the GPT-NeoX-20B model, you can use the following:
```
./deepy.py generate.py ./configs/20B.yml
```
You can optionally pass in a text file (e.g. `prompt.txt`) to use as the prompt. It should be a plain `.txt` file with each prompt separated by newline characters. Pass the path to an output file as well:
```
./deepy.py generate.py ./configs/20B.yml -i prompt.txt -o sample_outputs.txt
```
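Since prompts are separated by newline characters, a prompt that itself contains a line break would be read as more than one prompt. The sketch below shows one way to prepare such a file; `write_prompt_file` is a hypothetical helper rather than something shipped with the repository, and joining multi-line prompts with spaces is an assumption about what you would want.

```python
import tempfile
from pathlib import Path


def write_prompt_file(prompts, path):
    """Write one prompt per line and return the number of prompts written."""
    lines = []
    for prompt in prompts:
        # Join embedded line breaks so each prompt occupies exactly one line.
        cleaned = " ".join(prompt.splitlines()).strip()
        if cleaned:  # skip prompts that are empty after cleaning
            lines.append(cleaned)
    Path(path).write_text("\n".join(lines), encoding="utf-8")
    return len(lines)


# Demonstration in a temporary directory; in practice pass "prompt.txt".
with tempfile.TemporaryDirectory() as tmp:
    prompt_path = Path(tmp) / "prompt.txt"
    count = write_prompt_file(["Hello, my name is", "The capital of France is"], prompt_path)
    print(count, "prompts written")
```

The resulting file can then be supplied through the `-i` option shown above.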
To reproduce our evaluation numbers on, for example, TriviaQA and PIQA, use:
```
./deepy.py evaluate.py ./configs/20B.yml --eval_tasks triviaqa piqa
```
You can add an arbitrary list of evaluation tasks here. For details of all available tasks, see lm-evaluation-harness.
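The invocations in this section all follow one pattern: the launcher, the script, one or more config files, then any script-specific flags. If you drive runs from Python, a command can be assembled along the following lines. This is a minimal sketch under stated assumptions: `build_deepy_command` and `launch` are hypothetical helpers, not part of GPT-NeoX, and only the flags already shown in this section are used.

```python
import subprocess
from typing import List, Sequence


def build_deepy_command(
    script: str, configs: Sequence[str], extra_args: Sequence[str] = ()
) -> List[str]:
    """Assemble ./deepy.py [script.py] [config_1.yml] ... [config_n.yml] [flags]."""
    return ["./deepy.py", script, *configs, *extra_args]


def launch(command: Sequence[str]) -> None:
    """Run the command from the repository root; needs a working installation."""
    subprocess.run(list(command), check=True)


generate_cmd = build_deepy_command(
    "generate.py", ["./configs/20B.yml"], ["-i", "prompt.txt", "-o", "sample_outputs.txt"]
)
evaluate_cmd = build_deepy_command(
    "evaluate.py", ["./configs/20B.yml"], ["--eval_tasks", "triviaqa", "piqa"]
)
print(" ".join(generate_cmd))
print(" ".join(evaluate_cmd))
```

Keeping the command as a list of arguments avoids shell quoting problems when a config path or prompt file name contains spaces. `launch` is defined but not called here, so the sketch only prints the commands.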
For more details on each entry point, see the Training and Finetuning, Inference, and Evaluation sections.