Lamini, the magical LLM engine: GPT-3 can now be tuned into ChatGPT
What took OpenAI months to complete can now be automated. Large companies and individual developers alike have recently been building large language models (LLMs), but some feel that putting large models into practice is still too slow. Building a practical AI tool means customizing a base model through fine-tuning, a complex and time-consuming process that many people find hard to debug. That problem now appears to have a solution: on Saturday, a group of developers from Stanford released Lamini, which they claim gives every developer the superpower of taking GPT-3 to ChatGPT.
Lamini is an LLM engine that accelerates the customization of base models. Developers can use models and technologies from many companies and institutions, such as OpenAI, EleutherAI, Cerebras, Databricks, HuggingFace, and Meta, as long as they are open source.
Turning a base model into a powerful language model is a challenging process that costs a great deal of time and money. First, iteration cycles for fine-tuning on a specific dataset run on the order of months, and much of that time goes into diagnosing why a fine-tuned model fails. Prompt-tuning iterations take only seconds, but the performance gains level off within a few hours, and the amount of data that can be fitted into a prompt is very limited. OpenAI's machine learning team spent months fine-tuning its base model, GPT-3, and applying RLHF (reinforcement learning from human feedback) to build the powerful ChatGPT. This work is computationally intensive and demands technical expertise from the team. After OpenAI released the ChatGPT API, many companies tried the fine-tuning API it provides, but the results were unsatisfactory: some fine-tuned base models performed worse than before and could not be put into use, and other companies said they could not get the most out of their data. Now a new tool called "Lamini" aims to solve these problems. Lamini packages fine-tuning as a service, enabling developers to fine-tune GPT-3 into ChatGPT with ease.
Simply put, Lamini provides a hosted data generator that lets users train their own large language models (LLMs), along with their weights, without using any GPUs, just by running a few lines of code from the Lamini library.
As an LLM engine, Lamini allows developers to train high-performance LLMs on large datasets with just a few lines of code. The Lamini library covers a wide range of optimizations for machine learning models, from simpler ones such as reducing model hallucinations to more challenging ones such as RLHF.
So, what role does the Lamini library play in building a powerful LLM like ChatGPT? According to OpenAI's process of building ChatGPT, Lamini's role specifically includes the following points:
1. Prompt-tune ChatGPT or other models. The Lamini library provides rapid prompt-tuning, and switching between OpenAI's models and other open-source models takes a single line of code. The library also supplies optimized prompts that are formatted appropriately for each model.
2. Build a dataset of input-output pairs. This dataset shows the model how it should respond to its inputs. Using the Lamini engine, users can generate 50k data points from 100 data points with just a few lines of code, without spinning up any GPUs. Lamini also provides an open-source dataset of 50k examples.
3. Fine-tune the base model on the dataset. Lamini's research team fine-tuned an LLM on the 50k open-source dataset, and they plan to release the functionality and code for doing so.
4. Run RLHF on the fine-tuned model. With the Lamini library, users no longer need large teams of ML engineers and human labelers to run RLHF.
5. Deploy the model to the cloud conveniently.
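To make the prompt-tuning point above more concrete, here is a minimal Python sketch of what "switching models with one line" and "formatting the prompt differently per model" can look like in principle. It does not use the Lamini library: the names PROMPT_TEMPLATES, build_prompt, and MODEL_NAME, as well as the two template strings, are hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical per-model prompt templates (illustrative only, not Lamini's).
PROMPT_TEMPLATES: Dict[str, str] = {
    "chat-style": "You are a helpful assistant.\nUser: {question}\nAssistant:",
    "base-style": "Question: {question}\nAnswer:",
}


def build_prompt(model_name: str, question: str) -> str:
    """Format the same question using the template of the selected model."""
    if model_name not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown model: {model_name!r}")
    return PROMPT_TEMPLATES[model_name].format(question=question)


# Switching to another model is a one-line change here.
MODEL_NAME = "base-style"

if __name__ == "__main__":
    print(build_prompt(MODEL_NAME, "What is an LLM engine?"))
```

Keeping the templates in one table means the calling code stays the same when the target model changes, which is the convenience the article attributes to the library.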
ChatGPT has gained global popularity because it can generate high-quality content that follows user instructions, but its base model, GPT-3, has not always been able to do the same. For example, ask GPT-3 a question and it might generate another question instead of answering it. The reason is that ChatGPT was trained on a large amount of "instruction-following" data. For ordinary developers, however, such data is very hard to obtain. To address this, Lamini provides a hosted data generator that can turn 100 samples into more than 50k samples with just a few lines of code and without spinning up any GPUs, and the generated data can be used commercially. Users can customize the initial 100 instructions so that the 50,000 generated instructions match their needs, ending up with a large instruction-following dataset. Lamini's data generator is an LLM pipeline inspired by Stanford's open-source model Alpaca. This generation pipeline uses the Lamini library to define and call LLMs that generate different but similar instruction-response pairs.
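The idea of expanding a small seed set into many instruction-response pairs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Lamini's pipeline: generate_pairs and fake_llm are hypothetical names, and fake_llm is a stand-in that returns canned text where a real pipeline would call an LLM.

```python
import itertools
import random
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (instruction, response)


def generate_pairs(
    seeds: Sequence[Pair],
    generate: Callable[[List[Pair]], Pair],
    target: int,
    shots: int = 3,
    rng: Optional[random.Random] = None,
) -> List[Pair]:
    """Grow `seeds` to `target` pairs by showing a few seeds to a generator."""
    rng = rng or random.Random(0)
    pairs = list(seeds)
    seen = {instruction for instruction, _ in pairs}
    attempts, max_attempts = 0, target * 10  # guard against endless loops
    while len(pairs) < target and attempts < max_attempts:
        attempts += 1
        # Sample a few seed pairs as in-context examples for the generator.
        examples = rng.sample(list(seeds), k=min(shots, len(seeds)))
        instruction, response = generate(examples)
        if instruction in seen:  # skip exact repeats
            continue
        seen.add(instruction)
        pairs.append((instruction, response))
    return pairs


_counter = itertools.count(1)


def fake_llm(examples: List[Pair]) -> Pair:
    """Stand-in for a real LLM call (assumption): returns a numbered variation."""
    n = next(_counter)
    return (f"{examples[0][0]} (variation {n})", f"Placeholder response {n}")


if __name__ == "__main__":
    seed_pairs = [
        ("Name a primary color.", "Red."),
        ("Define an LLM.", "A large language model."),
    ]
    dataset = generate_pairs(seed_pairs, fake_llm, target=10)
    print(len(dataset), "pairs")
```

Passing the generator in as a function keeps the loop independent of any particular model, so the stub can be replaced by a real LLM call without touching the rest of the code.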
The hosted data generator produces data of varying quality, some good and some poor. Lamini's next step is therefore to run a simple script that filters the generated data and keeps only the high-quality examples.
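The article does not show the filtering script, so the following Python sketch only illustrates what such a step might look like. The criteria used here (dropping empty instructions, dropping very short responses, and removing near-duplicate instructions) and the names filter_pairs and min_response_chars are assumptions, not Lamini's actual rules.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (instruction, response)


def filter_pairs(pairs: Iterable[Pair], min_response_chars: int = 10) -> List[Pair]:
    """Keep pairs with a non-empty instruction, a long enough response,
    and an instruction not seen before (ignoring case and extra spaces)."""
    kept: List[Pair] = []
    seen = set()
    for instruction, response in pairs:
        instruction, response = instruction.strip(), response.strip()
        if not instruction or len(response) < min_response_chars:
            continue
        # Normalize case and whitespace so trivial variants count as duplicates.
        key = " ".join(instruction.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append((instruction, response))
    return kept


if __name__ == "__main__":
    raw = [
        ("Name a primary color.", "Red is a primary color."),
        ("name a  primary color.", "Blue is a primary color."),  # duplicate
        ("Explain RLHF.", ""),                                   # empty response
        ("   ", "An answer with no instruction."),               # empty instruction
        ("Define fine-tuning.", "Training a base model further on task data."),
    ]
    for instruction, response in filter_pairs(raw):
        print(instruction, "->", response)
```

A real filter would likely add task-specific checks, but even simple rules like these remove the most obvious low-quality examples before fine-tuning.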
Next, Lamini uses the filtered high-quality dataset to fine-tune the base model and create a custom LLM for the user. Lamini packages fine-tuning as a service, so developers can turn a base model into a specialized, high-performing model in a few simple steps, which greatly lowers the technical barrier to building an LLM. Lamini has already attracted a lot of attention on social networks.
Will tools like this make it easier to tune large models?