

Stability AI
About StableVicuna

We are proud to present StableVicuna, the first large-scale open-source chatbot trained via reinforcement learning from human feedback (RLHF). StableVicuna is a further instruction-fine-tuned and RLHF-trained version of Vicuna v0 13B, which is itself an instruction-fine-tuned LLaMA 13B model. For the interested reader, you can find more about Vicuna here.

Here are some examples of what you can do with our chatbot:

  1. Ask it to do basic math
  2. Ask it to write code
  3. Ask it to help you with grammar

[Example screenshots: basic math, code writing, and grammar assistance]

Similarly, here are a number of benchmarks showing the overall performance of StableVicuna compared to other similarly sized open source chatbots.


In order to achieve StableVicuna’s strong performance, we utilize Vicuna as the base model and follow the typical three-stage RLHF pipeline outlined by Stiennon et al. and Ouyang et al. Concretely, we further train the base Vicuna model with supervised fine-tuning (SFT) using a mixture of three datasets:

  • OpenAssistant Conversations Dataset (OASST1), a human-generated, human-annotated assistant-style conversation corpus comprising 161,443 messages distributed across 66,497 conversation trees, in 35 different languages;
  • GPT4All Prompt Generations, a dataset of 437,605 prompts and responses generated by GPT-3.5 Turbo;
  • And Alpaca, a dataset of 52,000 instructions and demonstrations generated by OpenAI’s text-davinci-003 engine.
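The three corpora above are blended into a single SFT training stream. As a minimal sketch (the dataset entries and mixing scheme here are illustrative stand-ins, not the actual training code), mixing can be as simple as tagging each example with its source and shuffling:

```python
import random

def mix_sft_datasets(datasets, seed=0):
    """Combine instruction datasets into one shuffled SFT stream.

    `datasets` maps a corpus name to a list of (prompt, response) pairs.
    Returns a list of (corpus_name, example) tuples in shuffled order.
    """
    mixed = [(name, ex) for name, examples in datasets.items() for ex in examples]
    random.Random(seed).shuffle(mixed)  # deterministic shuffle for reproducibility
    return mixed

# Toy stand-ins for the OASST1, GPT4All, and Alpaca examples
corpora = {
    "oasst1":  [("Hi", "Hello! How can I help?")],
    "gpt4all": [("What is 2+2?", "4")],
    "alpaca":  [("Define RLHF.", "Reinforcement learning from human feedback.")],
}
stream = mix_sft_datasets(corpora)
```

In practice the real pipeline operates on tokenized conversations rather than raw string pairs, but the principle, one unified and shuffled stream drawn from all three sources, is the same.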

We use trlX to train a reward model, initialized from our SFT model, on the following RLHF preference datasets:

  • OpenAssistant Conversations Dataset (OASST1), which contains 7,213 preference samples;
  • Anthropic HH-RLHF, a dataset of preferences about AI assistant helpfulness and harmlessness containing 160,800 human labels;
  • And Stanford Human Preferences (SHP), a dataset of 348,718 collective human preferences over responses to questions/instructions in 18 different subject areas, from cooking to philosophy.
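A reward model trained on such preference pairs typically minimizes a pairwise (Bradley-Terry style) loss: the model should score the human-preferred response higher than the rejected one. A self-contained sketch of that loss (scalar rewards here stand in for the model's actual outputs):

```python
import math

def pairwise_reward_loss(r_chosen, r_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss shrinks as the reward model assigns a higher score to the
    response humans preferred, and grows when it prefers the rejected one.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

For example, when the model correctly ranks the chosen response above the rejected one, the loss is small; when the ranking is inverted, the loss is large, pushing the scores apart during training.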

Finally, we use trlX to perform RLHF training of the SFT model via Proximal Policy Optimization (PPO) reinforcement learning, arriving at StableVicuna!
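At the heart of PPO is a clipped surrogate objective that keeps each policy update close to the previous policy. A minimal, library-free sketch of that objective for a single action (the actual trlX implementation works on batched token-level log-probabilities):

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    """PPO clipped surrogate for one action.

    ratio = pi_new(a|s) / pi_old(a|s); the clip keeps the ratio inside
    [1 - eps, 1 + eps] so a single update cannot move the policy too far.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # The objective (to be maximized) is the pessimistic minimum of the
    # unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In RLHF, the advantage is derived from the reward model's score (usually minus a KL penalty toward the SFT model), so the policy is nudged toward responses the reward model rates highly without drifting too far from its starting point.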

Obtaining StableVicuna-13B

StableVicuna is of course on the HuggingFace Hub! The model is downloadable as a weight delta against the original LLaMA model. To obtain StableVicuna-13B, you can download the weight delta from here. However, please note that you also need to have access to the original LLaMA model, which requires you to apply for LLaMA weights separately using the link provided in the GitHub repo or here. Once you have both the weight delta and the LLaMA weights, you can use a script provided in the GitHub repo to combine them and obtain StableVicuna-13B.
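Conceptually, the combination step just adds each delta tensor to the corresponding base LLaMA tensor. A toy illustration of that idea (plain lists of floats stand in for model tensors; the real script in the repo operates on the model's state dict):

```python
def apply_weight_delta(base_weights, delta_weights):
    """Recover released weights as base + delta, tensor by tensor.

    Both arguments map parameter names to (here, toy) lists of floats;
    the keys must match exactly for the merge to be valid.
    """
    assert base_weights.keys() == delta_weights.keys(), "mismatched parameters"
    return {
        name: [b + d for b, d in zip(base_weights[name], delta_weights[name])]
        for name in base_weights
    }

# Toy example: a single two-element "tensor"
merged = apply_weight_delta({"w": [1.0, 2.0]}, {"w": [0.5, -0.5]})
```

Distributing a delta rather than the full weights is what lets the release respect LLaMA's license: the delta alone is useless without legitimate access to the original model.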

Announcing Our Upcoming Chatbot Interface

Alongside our chatbot, we are excited to preview our upcoming chat interface which is in the final stages of development. The following screenshots offer a glimpse of what users can expect.



Our Commitment to Continuous Improvement

This is just the beginning for StableVicuna! Over the coming weeks, we will be iterating on this chatbot and deploying a Discord bot to the Stable Foundation server. We encourage you to try StableVicuna and provide us with valuable feedback to help us improve the user experience. For the time being, you can try the model on a HuggingFace space by visiting this link.

Goose on!


Big thanks to Duy Phung who trained the StableVicuna model. We’d also like to extend our gratitude to our open-source contributors who have played a crucial role in bringing this project to life.

  • Philwee who assisted in the evaluation of StableVicuna.
  • The OpenAssistant team for providing us with early access to their RLHF dataset.
  • Jonathan from CarperAI for working on the Gradio demo.
  • Poli and AK from Hugging Face for helping us set up the Gradio demo in Hugging Face.


