Harbin Institute of Technology Professor Che Wanxiang Analyzes ChatGPT: An Illustrated Explanation of Today's Hottest AI
On March 21, at the ChatGPT and Large-scale Model Technology Conference hosted by Heart of the Machine, Che Wanxiang, professor and doctoral supervisor at the Faculty of Computing of Harbin Institute of Technology, delivered a keynote speech titled "A Brief Analysis of ChatGPT". In the speech, he addressed what scientific problem ChatGPT solved, how it solved it, and what problems remain to be solved. We also learned that Professor Che's research results on large models are currently being commercialized.

The following is the content of Che Wanxiang's speech at the Heart of the Machine AI Technology Annual Conference. We have edited and organized it without altering its original meaning:
Hello everyone, I am Che Wanxiang from Harbin Institute of Technology. Many thanks to Heart of the Machine for the invitation. The title of my report is "A Brief Analysis of ChatGPT". I call it a brief analysis because we really do not know the details of ChatGPT; we can only infer the technology behind it from published papers.

Natural Language Processing
ChatGPT is a recent advance in natural language processing research. So what is natural language processing? Natural language refers to human language, specifically written text rather than speech signals. Natural language processing (NLP) is the set of theories and methods that enable computers to understand and generate natural language. Traditionally, NLP was more or less equivalent to natural language understanding, because natural language generation was too difficult at the time and could only be done with template-based methods. Today, with the advance of AIGC and related technologies, generation has become a mainstream direction in NLP. ChatGPT itself is a generative model and represents the latest development in the field.
In fact, understanding natural language is still hard for machines, because in terms of human intelligence, language processing belongs to cognitive intelligence, which demands stronger abstraction and reasoning abilities.

Natural language processing faces many difficulties. For example, in the dialogue shown in the figure below, the Chinese word 意思 (yìsi, usually glossed as "meaning") appears many times, and each occurrence carries a different sense. This is a typical ambiguity problem. Besides ambiguity, NLP also has to deal with abstraction, compositionality, and evolution. Take abstraction: a word such as "car" carries very rich meaning, and hearing it triggers many associations. Compositionality is similar: every language is built from a small set of basic symbols, yet these symbols can be combined to express an endless range of meanings.

It is precisely because of these difficulties that natural language processing has become a bottleneck that limits further breakthroughs and wider applications of artificial intelligence. Many scholars, including several Turing Award winners, argued long ago that natural language processing would be an important direction for the future of artificial intelligence. This is why it is called "the jewel in the crown of artificial intelligence". Many recent advances in AI are closely tied to NLP. The famous Transformer was first proposed for machine translation, and the later BERT and ChatGPT are also language-processing models. So calling natural language processing the jewel in the crown of artificial intelligence is no exaggeration.

Traditional natural language processing can be divided into four layers. At the bottom is resource construction. In the middle is basic research, including word segmentation, part-of-speech tagging, and so on. Above that is applied technology research, including information extraction, machine translation, and question answering. At the top are application systems in areas such as education and healthcare.
Why do I say "traditional" natural language processing? I added that word only three months ago. First, many basic research tasks, such as word segmentation and part-of-speech tagging, are already handled implicitly inside large models, so they no longer need to exist as separate tasks. Second, ChatGPT does not target a single task with a single model. It unifies all application tasks, which challenges the traditional way of dividing up the field into tasks. The entire field of natural language processing may therefore need to be reshuffled.

The history of natural language processing runs almost in parallel with the history of artificial intelligence. It began with small-scale expert knowledge in the 1950s. Shallow machine learning algorithms followed in 1990, deep learning in 2010, and the paradigm represented by pre-trained models in 2018. In 2023 came models such as ChatGPT. In total, the field has gone through five paradigms.

Pre-trained Language Models
Behind both ChatGPT and BERT are pre-trained models. What is a pre-trained model? In traditional machine learning, a sample is drawn from unlabeled data and labeled by hand, and a model is then trained on those labels. But as unlabeled data grows, humans cannot label it all. What can we do? This is where pre-trained models come in. Some people call this an unsupervised method. A more accurate name is self-supervised learning, because the method exploits the sequential nature of language itself: the text supplies its own training signal.
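Here is a minimal sketch of why no human labels are needed. The hypothetical helper below turns a raw sentence into next-token prediction pairs: each prefix is an input, and the word that follows it is the label. Whitespace tokenization is a simplifying assumption. Real models use subword tokenizers and train a neural network on billions of such pairs.

```python
from typing import List, Tuple


def next_token_pairs(text: str) -> List[Tuple[List[str], str]]:
    """Turn unlabeled text into (context, next_token) training pairs.

    Assumes naive whitespace tokenization for illustration only.
    """
    tokens = text.split()
    # Every prefix is an input; the token right after it is the "free" label.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    for context, target in next_token_pairs("the cat sat on the mat"):
        print(" ".join(context), "->", target)
```

For this sentence, the first printed pair is `the -> cat` and the last is `the cat sat on the -> mat`.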
Once we have a pre-trained model, we fine-tune it on the target task to obtain a better model for that task. This traditional approach still trains a separate model for each task. With many tasks, many models must be fine-tuned, which makes both training and deployment cumbersome.

A representative piece of work from the pre-training stage is GPT-3, a large model with 175 billion parameters released in 2020 by OpenAI in partnership with Microsoft. At the time, researchers felt the model was too large to fine-tune, so the "prompt" approach emerged. A prompt directly describes the task, and the model then completes the task automatically by continuing the text. If a few examples are also provided, performance may improve; this is called in-context learning. One advantage of this approach is that the model can perform many different "text generation" tasks without being retrained for any particular one. I put "text generation" in quotation marks because the model can do much more than answer questions and continue articles; it can also generate web pages and even code.
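The difference between fine-tuning and prompting can be sketched as simple string construction. Instead of updating weights, you place the task description and optional demonstrations in the input and ask the model to continue the text. The `Input:`/`Output:` layout and the function name are illustrative assumptions, not a format GPT-3 requires.

```python
from typing import List, Tuple


def build_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Build a prompt: task description, optional demonstrations, then the query.

    With examples this is few-shot in-context learning; without, zero-shot.
    """
    lines = [task, ""]
    for source, answer in examples:
        lines += [f"Input: {source}", f"Output: {answer}", ""]
    # The model is expected to generate its answer after the final "Output:".
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


print(build_prompt("Translate English to French.",
                   [("cheese", "fromage"), ("house", "maison")],
                   "book"))
```

Passing an empty example list produces a zero-shot prompt. The same frozen model serves every task; only the input text changes.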

GPT-3 did not attract especially wide attention when it appeared. Why? Because people found that although GPT-3 could attempt these tasks, the results were not very good. Here are two typical examples. If you asked GPT-3 "which is heavier, the oven or the pencil", it would say "the pencil is heavier". If you asked "how many eyes are on my feet", it would answer "two eyes".
Many of GPT-3's answers were wrong, and some people felt that spending so much money on such a large model did not solve the fundamental problems. The GPT-3 paper itself reported that on the story ending prediction task, GPT-3 scored 4.1% lower than a knowledge-enhanced reasoning model proposed by Ding Xiao and others in our group. So the common view was that such large models had poor robustness, weak interpretability, and weak reasoning ability, and that more knowledge needed to be added.

Because of these problems, academia took two paths:
The first path: since the model lacks knowledge, reasoning ability, and interpretability, optimize it specifically for these weaknesses. For example, inject knowledge such as a knowledge graph, or design dedicated methods for interpretability.
The second path, represented by OpenAI, does not target specific problems. It simply keeps adding data, in the spirit of the saying "the more human labor you put in, the more intelligence you get". This is the route ChatGPT took: it keeps adding data, including manually labeled data.
Looking back, adding knowledge graphs does not seem to have produced significant progress. Instead, the brute-force route, summed up by the saying "great effort works miracles", has made the bigger progress. If you ask ChatGPT the two questions above, it says "the oven is heavier" and explains why. Asked how many eyes are on your feet, it replies that "there are no eyes on the feet" and explains that too. These problems have essentially been solved.

There are other examples. Ask ChatGPT to write an academic conference speech in Tibetan, and it first declines, saying it cannot. This is impressive: when we built question-answering and chat models in the past, it was very hard to get them to refuse a request. It then offers to write the speech in English, and if I say yes, it really does. ChatGPT clearly has a good command of language.
Returning to the earlier example, suppose you ask ChatGPT directly what the dialogue between the leader and A Dai means. It replies that there are two meanings and explains clearly what each sentence means. This is remarkable. On closer inspection the explanation may not be entirely accurate, but the model at least understands the problem, which is very hard to achieve.

So what exactly is ChatGPT? The name is a bit misleading. "Chat" makes many people think of a chat system, but ChatGPT is not a chat system in essence; chat is merely its interface. It is essentially a "conversational general-purpose artificial intelligence tool". Opinions on such a powerful tool differ. Bill Gates and Jensen Huang consider it a great invention that will benefit humanity, and they compare it to the PC, the Internet, and the iPhone. Another camp, represented by Musk, believes that general artificial intelligence like ChatGPT could threaten humanity. There are also more measured voices, such as Turing Award winner Yann LeCun. He has said that in terms of underlying technology, ChatGPT is not that remarkable: although it looks revolutionary to the public, it is a well-engineered combination of existing techniques, nothing more.

What most people see are surface phenomena, such as how amazing its output is. But impressive output alone would not cause such a sensation. What has changed underneath, and what substantive scientific problem has actually been solved?
I think the core change is a fundamental revolution in how knowledge is represented and accessed. Every change in the way knowledge is represented and accessed has transformed the industry. Knowledge was first stored in computers in databases. To retrieve it, you needed SQL statements and the like, so people had to adapt to the machine. Even so, this technology produced great companies such as Oracle. Later, vast amounts of knowledge were stored on the Internet in unstructured form: text, images, even video. To use that knowledge, we no longer needed to learn SQL; we could simply type keywords into a search engine to retrieve what was stored on the Internet. ChatGPT still stores Internet knowledge, but not explicitly. Instead, the knowledge is stored in parameterized form inside a large model.
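The first two ways of accessing knowledge can be contrasted using Python's standard library. The sketch compares a structured table queried with SQL, where the user must learn the machine's language, with unstructured documents ranked by keyword overlap, a toy stand-in for a search engine. The table, documents, and `keyword_search` ranking are illustrative assumptions. The third way, knowledge stored in model parameters and queried in plain language, has no faithful few-line equivalent, so it is not shown.

```python
import sqlite3
from typing import Dict, List

# Database era: structured rows, accessed through a formal query language.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE capitals (country TEXT, city TEXT)")
conn.executemany("INSERT INTO capitals VALUES (?, ?)",
                 [("France", "Paris"), ("Japan", "Tokyo")])
row = conn.execute("SELECT city FROM capitals WHERE country = ?", ("Japan",)).fetchone()
print(row[0])  # Tokyo

# Search-engine era: unstructured text, accessed by typing keywords.
DOCS: Dict[str, str] = {
    "d1": "Tokyo is the capital of Japan",
    "d2": "Paris is the capital of France",
}


def keyword_search(query: str, corpus: Dict[str, str]) -> List[str]:
    """Return ids of documents containing any query word, most matches first."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.lower().split())), doc_id)
              for doc_id, text in corpus.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


print(keyword_search("capital Japan", DOCS))  # ['d1', 'd2']
```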
But GPT-3 could already do this two years ago, so why didn't it cause such a sensation? Because it did not solve the other half of the problem: how to use this knowledge. ChatGPT solves this very well, letting users call up the knowledge through natural language. ChatGPT connects these two halves, and once they are connected, a dramatic revolution follows. I believe it will also give rise to great companies, as the previous two shifts did. OpenAI is one step ahead for now, arguably the first mover. But whether it will have the last laugh is uncertain, because many companies are actively catching up. There is another interesting observation. Microsoft has been present in every one of these waves, yet it has never been the leader; each time it has followed behind. This is an interesting phenomenon too.

The history of ChatGPT is also quite inspiring. OpenAI proposed the first-generation GPT even before BERT, and GPT opened the era of pre-training in natural language processing. But people remember BERT more, because OpenAI was still a small company then and its work drew little attention, whereas BERT came from Google. From the standpoint of natural language understanding, BERT had more parameters and used bidirectional context, so it worked better than GPT. Yet OpenAI did not copy this bidirectional approach. It kept GPT's unidirectional (left-to-right) structure and later produced GPT-2, which saw wider use in academia. GPT-3 was popular for a while, but afterward many felt the model was a waste of money and its results were not very good. InstructGPT appeared in March last year and attracted considerable attention from the international academic community, though relatively little in China.
Then ChatGPT, released at the end of November last year, became an instant hit and drew far more attention. GPT-4 followed in March this year, and it processes not only text but also multimodal input. OpenAI's whole journey is inspiring: it stuck to the GPT route and finally broke through. Some say OpenAI is stubborn, but it clearly had its own conviction and ideals, and it succeeded.

Getting from GPT-3 to ChatGPT did not happen overnight; a lot of work was done in between. One notable step was Codex, a model pre-trained purely on code for code completion. It can help us write code inside a code editor and is very easy to use. At this point the GPT line effectively branched into one branch for language and one for code. code-davinci-002 then merged the two by continuing to pre-train the language model on code data, which produced very good reasoning ability. Why? Perhaps because code contains clear logic, step-by-step problem-solving structure, and even long-range dependencies. There are many possible explanations, and this is just one hypothesis. At present, nobody fully understands why ChatGPT works so well.

In summary, ChatGPT rests on roughly three core technologies:
1. A large-scale pre-trained model. How large? There is no clear definition yet. As with "big data", some think more than 10 billion parameters is roughly enough, but reasoning ability may require more than 60 billion parameters to emerge.
2. Instruction tuning. Instead of fine-tuning on one task at a time, all tasks are unified into instructions paired with corresponding answers. In a sense this returns to supervised learning, but it incorporates far more tasks. The benefit is that the tasks can help one another. The model also generalizes across tasks: it can handle new, unseen tasks by combining what it learned from the tasks it has seen. For example, suppose I want to do cross-lingual summarization, a task the model has never seen. If it has seen machine translation and summarization, it may well do the new task well. Its zero-shot ability is very strong. This is in fact necessary for strong artificial intelligence. Otherwise, training one task after another would bring us back to the old weak-AI approach.
3. Reinforcement learning from human feedback (RLHF), which is now attracting a lot of attention. Its main purpose may not be to raise the model's upper bound; rather, it improves the diversity and safety of the generated results. It also brings another benefit: once the model is online, more and more human feedback can be collected, and that feedback can further help train the model.
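The second and third items can be sketched in a few lines. The record format below, a `prompt`/`target` pair built from an instruction and an optional input, is an illustrative assumption. ChatGPT's actual training format has not been published. For RLHF, the sketch shows only the pairwise reward-model loss, −log σ(r_chosen − r_rejected), described in public work such as the InstructGPT paper. The subsequent reinforcement learning step is omitted. All function names are hypothetical.

```python
import math
from typing import Dict, List


def to_instruction_example(instruction: str, task_input: str, answer: str) -> Dict[str, str]:
    """Cast a task instance into one shared (prompt, target) record."""
    prompt = f"{instruction}\n\n{task_input}" if task_input else instruction
    return {"prompt": prompt, "target": answer}


# Item 2: different tasks become the same kind of supervised example.
training_mix: List[Dict[str, str]] = [
    to_instruction_example("Translate into French.", "Good morning.", "Bonjour."),
    to_instruction_example("Summarize in one sentence.", "<article text>", "<summary>"),
]
# An unseen combination (cross-lingual summarization) is posed the same way at
# inference time; the model must produce the target itself (zero-shot).
unseen = to_instruction_example("Summarize this English article in French.", "<article text>", "")


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Item 3: reward-model loss -log(sigmoid(r_chosen - r_rejected)).

    Small when the human-preferred answer gets the higher score. Computed as
    softplus(-d) in a numerically stable form.
    """
    d = r_chosen - r_rejected
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


print(round(pairwise_reward_loss(2.0, 0.0), 3))  # 0.127: preferred answer scored higher
print(round(pairwise_reward_loss(0.0, 2.0), 3))  # 2.127: preferred answer scored lower
```

Minimizing this loss over many human comparisons teaches the reward model to score preferred answers higher. That score then guides the language model during reinforcement learning.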

ChatGPT has had a great impact on natural language processing. More and more resources, including computing, data, and users, are concentrated in industry, which makes system-level innovation easier for industry.

Academia will face growing difficulties, because we have neither that much data nor that much computing power. That does not mean academia has nothing left to do. What should we do next? We still have to keep going, but going forward means facing a choice: should we follow the large-model route? Some in academia think this is not a good route for us.
Looking beyond natural language processing at the development of artificial intelligence as a whole, we can see two clear trends:
First, models are becoming increasingly homogeneous. In the past, different models were used for different tasks and domains; now the Transformer is used across the board. I think this trend is irreversible. Even if a replacement for the Transformer appears, the field will converge on that single architecture in turn.

Second, models keep getting larger. There is substantial evidence that intelligence emerges as models grow. It is hard to make a model smaller while keeping it broadly general. Specific industry applications will still need small models, of course, but achieving general artificial intelligence will probably still require sufficiently large models.

Precisely because of these two trends, even academia has to embrace large models, whether we like it or not. If we embrace them, how? There are many possible paths, mainly in three areas:
1. Address the weaknesses of large models: find the gaps and fill them wherever the models perform poorly;
2. Explore the mechanisms of large models. Much current work is still experimental; we do not know why models behave as they do, and we need to understand the mechanisms behind them;
3. Promote the application of large models.

What are the disadvantages of large models? Although ChatGPT works amazingly, it is not perfect and has many shortcomings, including insufficient factual consistency and insufficient logical consistency.

How can these weaknesses be addressed? One approach is augmentation. Turing Award winner Yann LeCun summarized this approach in a published article: connecting the model to search engines, knowledge bases, plug-in tools, and so on, all of which count as augmentation. A lot of current work also uses search engines to compensate for ChatGPT's shortcomings.
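Here is a minimal sketch of search-based augmentation. It assumes a toy word-overlap retriever in place of a real search engine and a hypothetical prompt template, and it omits the model call itself. It shows only the shape of the technique: retrieve evidence first, then have the model answer with that evidence in its input. This targets the weakness in factual consistency.

```python
import re
from typing import List, Set


def words(text: str) -> Set[str]:
    """Lower-cased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    q = words(query)
    return sorted(passages, key=lambda p: len(q & words(p)), reverse=True)[:k]


def augmented_prompt(question: str, passages: List[str], k: int = 1) -> str:
    """Put retrieved evidence in front of the question so the answer can be grounded in it."""
    evidence = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Answer using the evidence below.\nEvidence:\n{evidence}\nQuestion: {question}\nAnswer:"


PASSAGES = [
    "The Eiffel Tower is in Paris.",
    "Mount Fuji is the highest mountain in Japan.",
]
print(augmented_prompt("What is the highest mountain in Japan?", PASSAGES))
```

In a real system the retriever would be a search engine or vector index, and the resulting prompt would be sent to the language model.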

The next step is to explore the mechanisms behind large models. One current debate is whether the Encoder-Decoder structure or the Decoder-only structure is better. Each has its own advantages and disadvantages. Decoder-only models such as GPT make more efficient use of parameters and data, but the Encoder-Decoder structure may be better at understanding the input. There is no consensus yet on how to balance the two or which is superior; the question is still being explored.
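One commonly cited structural difference can be shown with attention masks. In a Decoder-only model, every position, including those in the input, may attend only to earlier positions (a causal mask), so the input is read strictly left to right. The encoder of an Encoder-Decoder model lets every input position attend to every other position (a bidirectional mask). This is one reason it is thought to understand the input better. The sketch is simplified: real implementations apply these masks inside multi-head attention, and the decoder of an Encoder-Decoder model additionally uses causal self-attention plus cross-attention to the encoder.

```python
from typing import List

Mask = List[List[int]]  # mask[i][j] == 1 means position i may attend to position j


def causal_mask(n: int) -> Mask:
    """Decoder-only (GPT-style): position i sees positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> Mask:
    """Encoder side of an Encoder-Decoder model: every position sees all positions."""
    return [[1] * n for _ in range(n)]


for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```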

Another open question is how to evaluate large models. Many evaluation datasets are being released, but once a dataset is public it may leak, because it can end up in someone's training data. How to solve this problem also needs consideration.
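A common way to check for this kind of leakage is to look for long word sequences shared between a benchmark item and the training corpus. The sketch below flags an item if any of its word n-grams appears in the training text. The n-gram length (8 here) and the function names are illustrative assumptions. Real checks must also handle text normalization and corpora far too large to hold in memory.

```python
import re
from typing import Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """All contiguous word n-grams of the lower-cased, punctuation-free text."""
    toks = re.findall(r"\w+", text.lower())
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def is_contaminated(test_item: str, training_text: str, n: int = 8) -> bool:
    """True if the test item shares at least one word n-gram with the training text."""
    return bool(ngrams(test_item, n) & ngrams(training_text, n))


item = "Which is heavier, the oven or the pencil?"
print(is_contaminated(item, "Forum post: which is heavier the oven or the pencil? The oven."))  # True
print(is_contaminated(item, "Ovens weigh far more than pencils."))  # False
```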
The third is to explain the mechanisms behind emergent abilities and chain-of-thought (CoT) reasoning.

Finally, we should promote the application of large models. ChatGPT is a general-purpose model. Applying it across industries raises many issues that need to be considered and resolved: how to customize it, make it smaller, personalize it, and even give it a persona, as well as how to ensure security and privacy.

How far will ChatGPT go? Let me return to the earlier trend chart. As the chart shows, each technology paradigm lasts about half as long as the previous one. We worked on expert knowledge for 40 years, shallow machine learning for 20 years, deep learning for 10 years, and pre-trained models for 5 years. How long will the ChatGPT paradigm last? Extrapolating the trend, perhaps 2.5 years, which means it may be replaced again around 2025. But what happens if development continues like this?
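The extrapolation is simple arithmetic. The sketch below uses the durations quoted in the talk: 40, 20, 10, and 5 years, which are the speaker's round figures. It also shows why a literal reading of the trend is strange. The halving series 2.5 + 1.25 + 0.625 + ... never sums to more than 5 years.

```python
from typing import List

PAST_DURATIONS = [40, 20, 10, 5]  # years: expert knowledge, shallow ML, deep learning, pre-training


def next_durations(last: float, k: int) -> List[float]:
    """Lengths of the next k paradigms if each lasts half as long as the previous one."""
    out = []
    for _ in range(k):
        last /= 2
        out.append(last)
    return out


chatgpt_era = next_durations(PAST_DURATIONS[-1], 1)[0]
print(2023 + chatgpt_era)  # 2025.5, i.e. replaced around 2025

# Summing many future paradigms: the total approaches 5 years and never exceeds it.
print(round(sum(next_durations(PAST_DURATIONS[-1], 60)), 6))  # 5.0
```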
Some will say this prediction must be wrong. If each paradigm lasted half as long as the one before, the remaining durations would add up to a finite span, so at some point technology would stop advancing. I think this may in fact happen, because as artificial intelligence develops, it may come to threaten humanity. Once it threatens human survival, further progress in AI technology may be banned by legislation.

Where will artificial intelligence go next? ChatGPT now handles reasoning relatively well. Next, we may need to solve pragmatics: what an utterance means in context can differ from its literal meaning.

Of course, this problem cannot be solved from text alone. We must move toward multimodality and combine more modalities on the path to true AGI.
Some scholars have divided the data that machines can learn from into five scopes, ranging from small-scale text corpora to interaction with human society. For a long time, everyone worked only with text. The current stage effectively skips the two middle scopes (multimodality and embodiment) and goes straight to interaction with human society, because ChatGPT now interacts with people directly. Through this interaction, people are teaching the machine how to speak and understand language. But skipping the two middle scopes does not mean they are covered; they still have to be completed. GPT-4 now handles multimodal input, and Google, Microsoft, and others are also researching embodiment.

Summary and Outlook
Finally, a summary and outlook. Natural language processing is the jewel in the crown of artificial intelligence. After databases and search engines, ChatGPT represents a new generation of knowledge representation and access. The trends toward model homogenization and scale are irreversible. Truly realizing AGI will require combining multimodality and embodied intelligence.

The above is the whole content of my report, thank you!