HuggingFace + ChatGPT: The strongest AI combination - HuggingGPT is here
HuggingGPT is finally here! This system, a collaboration between Zhejiang University and Microsoft Research Asia, attracted wide attention as soon as it was released.
Just describe your task in natural language, such as "which animals are in this picture, and how many of each kind?", and HuggingGPT will automatically analyze which AI models are needed and call the corresponding HuggingFace models to complete the task.

Some enthusiasts exclaimed that HuggingGPT is a versatile "GPT switcher", but in the eyes of an NVIDIA AI scientist, it looks more like the beginning of the "Everything App" vision and a key step toward AGI (Artificial General Intelligence).
Solving complex AI tasks across different domains and modalities is a key step toward AGI, but existing models can each complete only specific tasks.
The authors of the HuggingGPT paper argue that a large language model (LLM) can act as a controller to manage existing AI models, solving complex AI tasks by "mobilizing and combining everyone's strengths", with language serving as a generic interface.
Based on this idea, HuggingGPT was born. Its workflow consists of four stages: task planning, model selection, task execution, and response generation.
First, ChatGPT parses the user's request into a task list and determines the execution order and resource dependencies between tasks. Next, based on the description of each expert model hosted on HuggingFace, it assigns the most suitable model to each task. The selected expert models then execute their assigned tasks, according to the task order and dependencies, and return their execution information and results to ChatGPT. Finally, ChatGPT summarizes the execution logs and inference results of each model and produces the final response.
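The four stages above can be sketched in a few lines of Python. Everything here is a stand-in: the function names, the model registry, and the stub "inference" are hypothetical, whereas the real system calls ChatGPT for planning/summarization and HuggingFace endpoints for execution.

```python
# Minimal sketch of the four-stage HuggingGPT workflow.
# All names and stub models here are illustrative assumptions.

def plan_tasks(request: str) -> list[dict]:
    """Stage 1: task planning. An LLM would parse the request into a
    dependency-ordered task list; here we hard-code a tiny example."""
    return [
        {"id": 0, "task": "object-detection", "dep": [],
         "args": {"image": "photo.jpg"}},
        {"id": 1, "task": "text-generation", "dep": [0],
         "args": {"text": "<resource-0>"}},
    ]

# Stand-in for the descriptions of expert models hosted on HuggingFace.
MODEL_REGISTRY = {
    "object-detection": "facebook/detr-resnet-50",
    "text-generation": "gpt2",
}

def select_model(task: dict) -> str:
    """Stage 2: model selection — pick a hosted model by task type."""
    return MODEL_REGISTRY[task["task"]]

def execute(task: dict, model: str, results: dict) -> str:
    """Stage 3: task execution — stubbed inference. Real code would
    call the model endpoint, substituting <resource-N> placeholders
    with the outputs of earlier tasks."""
    args = {k: results.get(v, v) for k, v in task["args"].items()}
    return f"{model} ran {task['task']} on {args}"

def run(request: str) -> list[str]:
    results, log = {}, []
    for task in plan_tasks(request):  # tasks arrive already ordered
        out = execute(task, select_model(task), results)
        results[f"<resource-{task['id']}>"] = out
        log.append(out)
    return log  # Stage 4: ChatGPT would summarize this log for the user

print(run("Which animals are in this picture?"))
```

The `<resource-N>` placeholder convention is how dependent tasks consume the outputs of earlier ones; the loop resolves each placeholder before dispatching the task.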
In the paper, the author assumes such a request:
Please generate an image of a girl reading a book, in the same pose as the boy in example.jpg. Then please describe the new image with your voice.
This example shows how HuggingGPT splits the request into six subtasks and selects an appropriate model to execute each one, producing the final result.
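As a rough illustration, the planned task list for this request might look like the structure below. The task names and the `<resource-N>` convention are assumptions for illustration, not the paper's verbatim output.

```python
# Hypothetical six-subtask plan for the "girl reading a book" request.
plan = [
    {"id": 0, "task": "pose-detection", "dep": [],
     "args": {"image": "example.jpg"}},
    {"id": 1, "task": "pose-to-image", "dep": [0],
     "args": {"pose": "<resource-0>", "text": "a girl reading a book"}},
    {"id": 2, "task": "image-to-text", "dep": [1],
     "args": {"image": "<resource-1>"}},
    {"id": 3, "task": "object-detection", "dep": [1],
     "args": {"image": "<resource-1>"}},
    {"id": 4, "task": "image-classification", "dep": [1],
     "args": {"image": "<resource-1>"}},
    {"id": 5, "task": "text-to-speech", "dep": [2],
     "args": {"text": "<resource-2>"}},
]

# A simple validity check: every dependency refers to an earlier task,
# so the list can be executed front to back.
assert all(d < t["id"] for t in plan for d in t["dep"])
print(len(plan))  # 6 subtasks
```

Note that tasks 2, 3, and 4 all depend only on task 1, so they could run in parallel once the new image exists.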

In actual tests, the authors evaluated two GPT variants, gpt-3.5-turbo and text-davinci-003. HuggingGPT performs well on tasks with resource dependencies and can parse concrete tasks correctly, such as completing image transformations.

In audio and video tasks, it also demonstrated the ability to organize cooperation between models: by executing two models in parallel and in series, it produced a video and an audio narration of "an astronaut walking in space".

In addition, it can integrate input resources from multiple users to perform simple reasoning tasks, such as counting the number of zebras in a picture.

In summary, HuggingGPT performs well on complex tasks of many forms.
Actually, it's not called "HuggingGPT"
HuggingGPT is a project under active development; part of its code is open source, and it has already earned 1.4k stars. Interestingly, the project is not actually named HuggingGPT but JARVIS, after the AI butler in Iron Man.

This project closely resembles Visual ChatGPT, released in March; both were carried out by researchers at Microsoft Research Asia.

Finally, the advent of this tool has excited netizens. Some say ChatGPT has become the commander-in-chief of all the AIs created by humans; others think AGI may not be a single LLM, but multiple AI models coordinated by a "middleman" LLM.
Does this mean we have already entered a "semi-AGI" era?