MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML and is open-sourced for commercial use ( Apache-2.0).
MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.
These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases ( ALiBi). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA’s FasterTransformer.
This model uses the MosaicML LLM codebase, which can be found in the llm-foundry repository. It was trained by MosaicML’s NLP team on the MosaicML platform for LLM pretraining, finetuning, and inference.
The following models are finetuned on MPT-7B:
MPT-7B-StoryWriter-65k+: a model designed to read and write fictional stories with super long context lengths. Built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. At inference time, thanks to ALiBi, MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens. We demonstrate generations as long as 80k tokens on a single A100-80GB GPU in our blogpost.
MPT-7B-Instruct: a model for short-form instruction following. Built by finetuning MPT-7B on a dataset we also release, derived from the Databricks Dolly-15k and the Anthropic Helpful and Harmless (HH-RLHF) datasets.
To get started, clone this repo and install the requirements:
``` git clone https://github.com/mosaicml/llm-foundry.git cd llm-foundry pip install -e ".[gpu]" # or pip install -e . if no NVIDIA GPU
Here is an end-to-end workflow for preparing a subset of the C4 dataset, training an MPT-125M model for 10 batches, converting the model to HuggingFace format, evaluating the model on the Winograd challenge, and generating responses to prompts.
If you have a write-enabled HuggingFace auth token, you can optionally upload your model to the Hub! Just export your token like this:
``` export HUGGING_FACE_HUB_TOKEN=your-auth-token
and uncomment the line containing
(Remember this is a quickstart just to demonstrate the tools – To get good quality, the LLM must be trained for longer than 10 batches 😄)
``` cd scripts
python data_prep/convert_dataset_hf.py \ --dataset c4 --data_subset en \ --out_root my-copy-c4 --splits train_small val_small \ --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text ''
composer train/train.py \ train/yamls/mpt/125m.yaml \ data_local=my-copy-c4 \ train_loader.dataset.split=train_small \ eval_loader.dataset.split=val_small \ max_duration=10ba \ eval_interval=0 \ save_folder=mpt-125m
python inference/convert_composer_to_hf.py \ --composer_path mpt-125m/ep0-ba10-rank0.pt \ --hf_output_path mpt-125m-hf \ --output_precision bf16 \ # --hf_repo_for_upload user-org/repo-name
python eval/eval.py \ eval/yamls/hf_eval.yaml \ icl_tasks=eval/yamls/winograd.yaml \ model_name_or_path=mpt-125m-hf
python inference/hf_generate.py \ --name_or_path mpt-125m-hf \ --max_new_tokens 256 \ --prompts \ "The answer to life, the universe, and happiness is" \ "Here's a quick recipe for baking chocolate chip cookies: Start by"
Visit Official Website