About MPT-7B

MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. This model was trained by MosaicML and is open-sourced for commercial use (Apache-2.0).

MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.

These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases (ALiBi). Thanks to these modifications, MPT models can be trained with high throughput efficiency and stable convergence. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA’s FasterTransformer.
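To make the ALiBi idea concrete, here is a minimal sketch of how the linear attention biases can be computed. It follows the formulation in the ALiBi paper for power-of-two head counts and is an illustration only: the function names `alibi_slopes` and `alibi_bias` are hypothetical, and the implementation in the MPT codebase may differ in its details.

```python
from typing import List


def alibi_slopes(n_heads: int) -> List[float]:
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads.

    Assumption: n_heads is a power of two (the simple case in the ALiBi paper).
    """
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(n_heads: int, seq_len: int) -> List[List[List[float]]]:
    """bias[h][i][j] is added to the attention score of query i and key j.

    Keys at or before the query get a penalty proportional to their distance;
    keys after the query are masked with -inf (causal attention).
    """
    slopes = alibi_slopes(n_heads)
    return [
        [
            [-m * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in slopes
    ]


if __name__ == "__main__":
    bias = alibi_bias(n_heads=8, seq_len=4)
    # Head 0 has the steepest slope; the last query row shows the linear penalty.
    print(bias[0][3])
```

Because the bias depends only on the distance between query and key, no learned position table is involved, which is why the sequence length is not fixed by the embedding size.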

This model uses the MosaicML LLM codebase, which can be found in the llm-foundry repository. It was trained by MosaicML’s NLP team on the MosaicML platform for LLM pretraining, finetuning, and inference.

How is this model different?

MPT-7B is

Models finetuned off MPT-7B:

The following models are finetuned on MPT-7B:

Installation

To get started, clone this repo and install the requirements:

```
git clone https://github.com/mosaicml/llm-foundry.git
cd llm-foundry
pip install -e ".[gpu]"  # or pip install -e . if no NVIDIA GPU
```

Quickstart

Here is an end-to-end workflow for preparing a subset of the C4 dataset, training an MPT-125M model for 10 batches, converting the model to HuggingFace format, evaluating the model on the Winograd challenge, and generating responses to prompts.

If you have a write-enabled HuggingFace auth token, you can optionally upload your model to the Hub! Just export your token like this:

```
export HUGGING_FACE_HUB_TOKEN=your-auth-token
```

and uncomment the line containing --hf_repo_for_upload ....

(Remember, this is a quickstart just to demonstrate the tools. To get good quality, the LLM must be trained for longer than 10 batches 😄)
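If you drive the conversion step from Python rather than the shell, the optional upload can be guarded with a token check. The sketch below only assembles the argument list using the flags shown in this quickstart. The helper name `convert_command` is hypothetical and not part of the repository, and the check assumes the token is supplied through the `HUGGING_FACE_HUB_TOKEN` environment variable as described above.

```python
import os
from typing import List, Optional


def convert_command(
    composer_path: str,
    hf_output_path: str,
    hf_repo_for_upload: Optional[str] = None,
) -> List[str]:
    """Build the argv for inference/convert_composer_to_hf.py (hypothetical helper)."""
    cmd = [
        "python", "inference/convert_composer_to_hf.py",
        "--composer_path", composer_path,
        "--hf_output_path", hf_output_path,
        "--output_precision", "bf16",
    ]
    if hf_repo_for_upload is not None:
        # Refuse to add the upload flag when no token has been exported.
        if not os.environ.get("HUGGING_FACE_HUB_TOKEN"):
            raise RuntimeError("export HUGGING_FACE_HUB_TOKEN before uploading")
        cmd += ["--hf_repo_for_upload", hf_repo_for_upload]
    return cmd


if __name__ == "__main__":
    # Without a repo name, the command matches the quickstart with the upload line commented out.
    print(" ".join(convert_command("mpt-125m/ep0-ba10-rank0.pt", "mpt-125m-hf")))
```

The resulting list can be passed to `subprocess.run` from inside the `scripts` directory.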

```
cd scripts

# Convert C4 dataset to StreamingDataset format
python data_prep/convert_dataset_hf.py \
  --dataset c4 --data_subset en \
  --out_root my-copy-c4 --splits train_small val_small \
  --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text ''

# Train an MPT-125m model for 10 batches
composer train/train.py \
  train/yamls/mpt/125m.yaml \
  data_local=my-copy-c4 \
  train_loader.dataset.split=train_small \
  eval_loader.dataset.split=val_small \
  max_duration=10ba \
  eval_interval=0 \
  save_folder=mpt-125m

# Convert the model to HuggingFace format
python inference/convert_composer_to_hf.py \
  --composer_path mpt-125m/ep0-ba10-rank0.pt \
  --hf_output_path mpt-125m-hf \
  --output_precision bf16 \
  # --hf_repo_for_upload user-org/repo-name

# Evaluate the model on Winograd
python eval/eval.py \
  eval/yamls/hf_eval.yaml \
  icl_tasks=eval/yamls/winograd.yaml \
  model_name_or_path=mpt-125m-hf

# Generate responses to prompts
python inference/hf_generate.py \
  --name_or_path mpt-125m-hf \
  --max_new_tokens 256 \
  --prompts \
    "The answer to life, the universe, and happiness is" \
    "Here's a quick recipe for baking chocolate chip cookies: Start by"
```
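The `--concat_tokens 2048` and `--eos_text` options in the data preparation step suggest that tokenized documents are joined, separated by an end-of-sequence token, and cut into fixed-length training samples. The sketch below illustrates that general packing technique with plain integer token IDs. It is an assumption about the approach, not the script's actual code: the function name `concat_tokens` is hypothetical, and details such as how leftover tokens are handled may differ in the real tool.

```python
from typing import Iterable, Iterator, List


def concat_tokens(
    docs: Iterable[List[int]], max_length: int, eos_id: int
) -> Iterator[List[int]]:
    """Pack tokenized documents into samples of exactly max_length tokens."""
    buffer: List[int] = []
    for doc in docs:
        # Append the document followed by an EOS marker so boundaries survive packing.
        buffer.extend(doc)
        buffer.append(eos_id)
        # Emit as many full-length samples as the buffer currently holds.
        while len(buffer) >= max_length:
            yield buffer[:max_length]
            buffer = buffer[max_length:]
    # Assumption: a trailing partial sample is dropped rather than padded.


if __name__ == "__main__":
    docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    for sample in concat_tokens(docs, max_length=4, eos_id=0):
        print(sample)
```

Packing this way means every training sample has the same length, so no padding tokens are needed within a batch.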

Visit Official Website

https://huggingface.co/mosaicml/mpt-7b
