With instruction fine-tuning, LLaMA models perform well in the general domain, but due to a lack of suitable data, little research has examined LLaMA's capabilities in the legal field. To fill this gap, we propose Lawyer LLaMA, a model further trained on legal-domain data.
Lawyer LLaMA first underwent continual pretraining on a large-scale legal corpus, allowing it to systematically learn the Chinese legal knowledge system. On this basis, with the help of ChatGPT, we collected analyses of objective questions from China's National Unified Legal Professional Qualification Examination (hereinafter, the legal exam) as well as answers to legal consultations, and used this data to fine-tune the model so that it learns to apply legal knowledge to concrete scenarios. A sketch of this two-stage pipeline is shown below.
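The following is a minimal sketch of what such a two-stage pipeline could look like with Hugging Face transformers/datasets. The base checkpoint name, file paths, prompt fields, and hyperparameters are all illustrative assumptions, not the project's actual configuration.

```python
# Sketch of (1) continual pretraining on a raw legal corpus and
# (2) instruction fine-tuning. All paths/names below are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "path/to/llama-7b"  # hypothetical base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# Stage 1: continual pretraining on a legal corpus (one document per line).
corpus = load_dataset("text", data_files={"train": "legal_corpus.txt"})["train"]
corpus = corpus.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=["text"],
)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpt_pretrain", num_train_epochs=1,
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=32),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Stage 2: supervised fine-tuning on (instruction, response) pairs, e.g.
# exam analyses and consultation answers. The loss is computed only on the
# response by masking prompt tokens with -100; field names are assumed.
def build_example(ex):
    prompt_ids = tokenizer(ex["instruction"], add_special_tokens=False)["input_ids"]
    answer_ids = (tokenizer(ex["output"], add_special_tokens=False)["input_ids"]
                  + [tokenizer.eos_token_id])
    return {"input_ids": prompt_ids + answer_ids,
            "labels": [-100] * len(prompt_ids) + answer_ids}
```

A second Trainer run over examples built this way (with a collator that pads labels as well as inputs) would complete the fine-tuning stage.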
Our model can:
Master Chinese legal knowledge: it correctly understands legal concepts in common areas such as civil law, criminal law, administrative law, and procedural law. For example, it has learned the theory of crime constitution in criminal law and can identify constituent elements such as the criminal subject, the object of the crime, the criminal act, and the subjective mental state from a description of the facts of a criminal case. Using these legal concepts and theories, the model can answer most questions on the legal exam.
Apply knowledge in Chinese legal practice: it can explain legal concepts in plain language and provide basic legal consultation covering marriage, loans, maritime matters, criminal matters, and other areas of law (see the usage sketch after this list).
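As a usage illustration, a consultation query might look like the sketch below. The checkpoint path and the bare-question prompt format are placeholders rather than the project's released artifacts.

```python
# Hypothetical inference example; model path and prompt format are assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "path/to/lawyer-llama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

question = "If one spouse incurs a debt during the marriage, must the other spouse repay it?"
inputs = tokenizer(question, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Print only the newly generated answer, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```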
To contribute to open research on Chinese legal large language models, this project will open-source a series of instruction fine-tuning datasets in the legal domain, as well as the parameters of Chinese legal large models trained from LLaMA.