We are proud to present StableVicuna, the first large-scale open source chatbot trained via reinforced learning from human feedback (RHLF). StableVicuna is a further instruction fine tuned and RLHF trained version of Vicuna v0 13b, which is an instruction fine tuned LLaMA 13b model. For the interested reader, you can find more about Vicuna here.
Here are some of the examples with our Chatbot,
Similarly, here are a number of benchmarks showing the overall performance of StableVicuna compared to other similarly sized open source chatbots.
In order to achieve StableVicuna’s strong performance, we utilize Vicuna as the base model and follow the typical three-stage RLHF pipeline outlined by Steinnon et al . and Ouyang et al . Concretely, we further train the base Vicuna model with supervised finetuning (SFT) using a mixture of three datasets:
We use trlx to train a reward model that is first initialized from our further SFT model on the following RLHF preference datasets:
Finally, we use trlX to perform Proximal Policy Optimization (PPO) reinforcement learning to perform RLHF training of the SFT model to arrive at StableVicuna!
Obtaining StableVicuna-13B
StableVicuna is of course on the HuggingFace Hub! The model is downloadable as a weight delta against the original LLaMA model. To obtain StableVicuna-13B, you can download the weight delta from here. However, please note that you also need to have access to the original LLaMA model, which requires you to apply for LLaMA weights separately using the link provided in the GitHub repo or here. Once you have both the weight delta and the LLaMA weights, you can use a script provided in the GitHub repo to combine them and obtain StableVicuna-13B.
Announcing Our Upcoming Chatbot Interface
Alongside our chatbot, we are excited to preview our upcoming chat interface which is in the final stages of development. The following screenshots offer a glimpse of what users can expect.
Our Commitment to Continuous Improvement
This is just the beginning for StableVicuna! Over the coming weeks, we will be iterating on this chatbot and deploying a Discord bot to the Stable Foundation server. We encourage you to try StableVicuna and provide us with valuable feedback to help us improve the user experience. For the time being, you can try the model on a HuggingFace space by visiting this link.
Goose on!
Acknowledgments
Big thanks to Duy Phung who trained the StableVicuna model. We’d also like to extend our gratitude to our open-source contributors who have played a crucial role in bringing this project to life.
Visit Official Website