A bonobo learns to play Minecraft, and the method is really the same as training a GPT-4 agent?


Hayo News
August 14th, 2023
When a bonobo learns to play Minecraft, is the method really the same one Nvidia scientists use to train GPT-4 agents?

Watch this player: they play Minecraft skillfully, collecting treats and breaking blocks with ease.

Then the camera turns, and the player turns out to be an ape!

Yes, this is a non-human biological neural network experiment from the Ape Initiative.

The protagonist of the experiment, Kanzi, is a 42-year-old bonobo.

After training, he learned a range of skills, took on environments such as villages, desert temples, and Nether portals, and cleared every level to the end.

AI experts noticed that the way the trainers taught the bonobo new skills closely resembles how humans teach AI to play Minecraft: in-context reinforcement learning, RLHF, imitation learning, curriculum learning, and so on.

When a Bonobo Learns to Play Minecraft

Kanzi, a bonobo at the Ape Initiative, is one of the world's smartest apes; he understands English and can use a touchscreen.

In the Ape Initiative, Kanzi has access to various electronic touch screens, which may have laid the foundation for him to quickly get started with "Minecraft".

When people first showed Kanzi Minecraft, he found the green arrow as soon as he sat down in front of the screen, then swiped his finger toward the target.

Learning three skills

In just a few seconds, Kanzi figured out how to move around in Minecraft.

Next, he learned to collect rewards.

Every time he collected an in-game reward, he was given a snack such as peanuts, grapes, or apples.

Kanzi's play grew more and more adept.

He recognized obstacles, green pillars the same color as the target arrow, and avoided them while collecting rewards.

Of course, Kanzi also ran into difficulties. He needed to use the break tool to smash large blocks, an operation he had never seen before.

When Kanzi got stuck, a human came over to help and pointed at the right tool button. Kanzi looked at it but still did not understand.

The human had to demonstrate, breaking the wooden block with the tool. Kanzi watched thoughtfully and then, with everyone looking on expectantly, followed suit: he tapped the button and smashed the block. The room instantly burst into cheers.

Kanzi's skill tree now held two skills: collecting treats and breaking blocks.

While Kanzi was learning cave skills, the staff noticed that if he slid off the wooden block he was trying to break, he would simply walk away. So they designed a custom task for him:

Smash the wooden blocks in a cave lined with diamond walls, to prove his collecting and smashing skills.

All went well in the cave until Kanzi hit a problem: he got stuck in a corner. At that point, a human had to lend a helping hand.

Eventually, Kanzi reached the bottom of the cave, smashing the last wall.

The crowd erupted in cheers, and Kanzi gave the staff a high-five.

Fooling a human

Then came the fun part: the staff invited a human player to play alongside Kanzi, without telling him who Kanzi was.

The staff wanted to see how long it would take the player to realize that his partner was not human.

At first, the player merely thought his partner moved unbelievably slowly.

When live footage of Kanzi was shown to him, he recoiled in shock.

Getting out of the maze

As the Minecraft sessions went on, Kanzi grew bolder with every attempt.

Whenever Kanzi collected a reward, people applauded, and when he failed, the trainers clapped and cheered to encourage him to keep playing.

By this point, he had learned to unlock the underground labyrinth map, break down the obstacles in front of him, and find the amethyst.

When Kanzi got stuck, he went outside to relax and brought back a stick, which he set down next to him.

Even when he failed, Kanzi would tap the button to respawn.

The last level is a huge maze full of forks.

Unable to find his way out for a long time, Kanzi grew agitated: he screamed while holding the branch, or snapped it in anger.

In the end, he calmed down, kept working through the level, and got out of the maze.

Immediately, applause and cheers surrounded Kanzi.

It seems that Kanzi the bonobo has well and truly figured out Minecraft.

The Similarities Between Teaching a Bonobo and Teaching AI

Watching a bonobo expertly play a video game can feel a little grotesque and uncanny.

Nvidia senior scientist Jim Fan commented on this:

Although neither Kanzi nor his ancestors had ever seen Minecraft, he quickly adapted to the textures and physics of Minecraft displayed on an electronic screen.

This is very different from the natural environment they have lived in. This level of generalization far exceeds that of the most powerful vision models to date.

The techniques for training an animal to play Minecraft follow essentially the same principles as those for training artificial intelligence:

- In-context reinforcement learning:

Whenever Kanzi reaches a marked milestone in the game, he is rewarded with a piece of fruit or a peanut, which motivates him to keep following the rules of the game.

- RLHF:

Kanzi doesn't understand human language, but he can see his trainers cheering him on and occasionally responds. The cheers give Kanzi a strong signal that he is on the right track.

- Imitation learning:

Once the trainer showed Kanzi how to complete the task, he immediately grasped what the operation meant. Demonstration is far more effective than relying on rewards alone.

- Curriculum learning:

The trainers started Kanzi in a very simple environment and gradually taught him the controls. Eventually, Kanzi could traverse complex caves, mazes, and the Nether.

Moreover, even with similar training techniques, an animal's visual system can recognize and adapt to a new environment in a very short time, while AI vision models need far more time and training cost and often still fall short of the desired result.

Once again we fall into the abyss of Moravec's paradox:

Artificial intelligence's strengths are the inverse of human ones. At the low-level abilities we think of as effortless or instinctive (such as perception and motor control), artificial intelligence performs poorly. But at high-level abilities that require reasoning and abstraction (such as logical reasoning and language understanding), artificial intelligence can easily surpass humans.

This corresponds exactly to the results presented in this experiment:

Our best AI (GPT-4) is close to human level in understanding language, but far behind animals in perception and recognition.

Netizens: It turns out that bonobos get angry when they play games, too

Both Kanzi and LLMs can play Minecraft, but we should be aware that there is a significant difference between the way Kanzi learns and the way LLMs learn.

Faced with Kanzi's impressive learning ability, netizens started making memes.

Some predicted that in six years the world will be at war for the planet of the apes...

Or that an ape will drink Coke and blend into human society...

Even Musk was photoshopped into an "ape version" of himself.

It has also been said that Kanzi is the first non-human to show gamer rage, which makes for good content.

"If Kanzi had his own gaming channel, I'd watch it honestly."

"There is not much difference between humans and bonobos when it comes to playing games. We are all motivated by rewards to perform certain tasks and complete goals, the only difference is the actual content of the rewards."

“In Minecraft, Kanzi’s rewards for mining diamonds are more immediate and raw (food), whereas our rewards for mining diamonds are more delayed and game-related. Anyway, kind of crazy.”

First GPT learned to play Minecraft, and now a bonobo can play too, which has people looking forward to a future with Neuralink.

Jim Fan teaches AI agents to play Minecraft

Humans have already accumulated a lot of advanced experience in teaching AI to play Minecraft.

Back in May of this year, Jim Fan's team connected an Nvidia AI agent to GPT-4 to create a brand-new AI agent, Voyager.

Voyager not only outperforms AutoGPT, but can also keep learning throughout its time in the game!

It can write code on its own to dominate Minecraft without human intervention.

Arguably, with the arrival of Voyager we are one step closer to artificial general intelligence (AGI).

True digital life

Once connected to GPT-4, Voyager needs no human guidance at all and is completely self-taught.

It not only mastered the basic survival skills of mining, building houses, collecting, and hunting, but also learned to carry out open-ended exploration on its own.

Driven by its own goals, it keeps expanding its items and equipment: it equips different tiers of armor, uses shields to block damage, and uses fences to pen animals.

The emergence of large language models has brought new possibilities to the construction of embodied agents, because LLM-based agents can leverage the world knowledge contained in pre-trained models to generate consistent action plans or executable policies.

Jim Fan: We had this idea before BabyAGI/AutoGPT and spent a lot of time figuring out the best gradient-free architecture.

Bringing GPT-4 into the agent opens up a new paradigm ("training" by code execution rather than gradient descent), freeing the agent from the limitation of being unable to learn continually.

OpenAI scientist Karpathy also praised it: this is a "gradient-free architecture" for advanced skills. Here, the LLM acts as the prefrontal cortex, driving the lower-level Mineflayer API through generated code.

3 key components

In order to make Voyager an effective lifelong learning agent, teams from Nvidia, Caltech and other institutions proposed 3 key components:

1. An iterative prompting mechanism that incorporates game feedback, execution errors, and self-verification to improve programs

2. A skill library of code for storing and retrieving complex behaviors

3. An automatic curriculum that maximizes the agent's exploration

First, Voyager tries to write a program that achieves a specific goal using Mineflayer, a popular JavaScript API for Minecraft.

Game environment feedback and JavaScript execution errors (if any) help GPT-4 improve the program.

Left: Environment feedback. GPT-4 realizes that it needs 2 more planks before it can make sticks. Right: Execution error. GPT-4 realizes that it should make a wooden axe, not an "acacia" axe, because there is no "acacia" axe in Minecraft.

Given the agent's current state and the task, GPT-4 judges whether the program has completed the task.

In addition, if the task fails, GPT-4 provides a critique and suggests how to complete the task.

Self-verification
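The feedback loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Voyager's actual code: `refine_program`, `Attempt`, and the three callables (`generate` standing in for GPT-4, `execute` for the game, `verify` for the self-verification step) are hypothetical names, and the toy stubs simply replay the sticks-and-planks example. As a simplification, the verifier here reads the feedback text instead of a full agent state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    program: str
    feedback: str
    error: Optional[str]
    success: bool


def refine_program(
    task: str,
    generate: Callable[[str, str], str],
    execute: Callable[[str], Tuple[str, Optional[str]]],
    verify: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 4,
) -> List[Attempt]:
    """Generate a program, run it, and feed the results back until verified."""
    history: List[Attempt] = []
    notes = ""  # feedback, errors, and critique passed to the next generation call
    for _ in range(max_rounds):
        program = generate(task, notes)
        feedback, error = execute(program)
        # Only ask the verifier when the program ran without an execution error.
        success, critique = verify(task, feedback) if error is None else (False, "")
        history.append(Attempt(program, feedback, error, success))
        if success:
            break
        notes = "\n".join(part for part in (feedback, error, critique) if part)
    return history


# Toy stand-ins (assumptions for illustration only, no real model or game).
def toy_generate(task: str, notes: str) -> str:
    # Adds a plank-crafting step once the notes mention missing planks.
    return "craft planks; craft sticks" if "planks" in notes else "craft sticks"


def toy_execute(program: str) -> Tuple[str, Optional[str]]:
    if "craft planks" in program:
        return "inventory: sticks", None
    return "I cannot make sticks because I need 2 more planks", None


def toy_verify(task: str, feedback: str) -> Tuple[bool, str]:
    done = "inventory: sticks" in feedback
    return done, ("" if done else "Craft planks before sticks.")


history = refine_program("craft sticks", toy_generate, toy_execute, toy_verify)
for attempt in history:
    print(attempt.program, "->", "ok" if attempt.success else "retry")
```

With these stubs, the first attempt fails on the missing planks and the second attempt, written with that feedback in hand, passes verification.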

Second, Voyager gradually builds up a skill library by storing successful programs in a vector database. Each program can be retrieved via the embedding of its docstring.

Complex skills are synthesized by combining simpler skills, which lets Voyager's abilities grow rapidly over time and mitigates catastrophic forgetting.

Top: Adding a skill. Each skill is indexed by an embedding of its description, so it can be retrieved in similar situations in the future. Bottom: Retrieving skills. When faced with a new task proposed by the automatic curriculum, a query is made and the top 5 relevant skills are identified.
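To make the store-and-retrieve idea concrete, here is a minimal sketch in Python. It is not Voyager's implementation: the `SkillLibrary` class, the skill names, and the word-count `embed` function are all assumptions for illustration, with word counts standing in for the learned text embeddings and the vector database that a real system would use.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a learned text embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Stores programs keyed by name and indexed by their description."""

    def __init__(self) -> None:
        self._skills: Dict[str, Tuple[Counter, str]] = {}

    def add(self, name: str, description: str, code: str) -> None:
        self._skills[name] = (embed(description), code)

    def retrieve(self, query: str, k: int = 5) -> List[str]:
        """Return the names of the k skills most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self._skills,
            key=lambda name: cosine(q, self._skills[name][0]),
            reverse=True,
        )
        return ranked[:k]


# Hypothetical skills; the stored code is left as a placeholder string.
library = SkillLibrary()
library.add("craftPlanks", "craft wooden planks from logs", "// code omitted")
library.add("mineIron", "mine iron ore with a stone pickaxe", "// code omitted")
library.add("craftSticks", "craft sticks from wooden planks", "// code omitted")

top = library.retrieve("craft a wooden pickaxe from planks and sticks", k=2)
print(top)
```

The retrieved skills would then be handed to the language model as building blocks for the new program, which is how simpler skills get composed into more complex ones.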

Third, an automatic curriculum proposes suitable exploration tasks based on the agent's current skill level and world state.

For example, if the agent finds itself in a desert rather than a forest, it learns to gather sand and cacti rather than iron. The curriculum is generated by GPT-4 with the goal of "discovering as many diverse things as possible".

Automatic curriculum

Voyager is the first LLM-driven embodied agent capable of lifelong learning, and the similarities between its training process and the bonobo's can give us a lot of inspiration.
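The idea of picking the next task from the agent's skills and surroundings can be illustrated with a small rule-based sketch in Python. In Voyager the proposals come from GPT-4, so the fixed `CANDIDATES` table, the `Task` type, and `propose_next_task` below are purely hypothetical stand-ins that show the shape of the decision, not how the tasks are really generated.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Set


class Task(NamedTuple):
    name: str
    biome: str                 # where the task makes sense; "any" matches everywhere
    requires: FrozenSet[str]   # skills that must already be learned


# Hypothetical candidate tasks, ordered from easy to hard.
CANDIDATES: List[Task] = [
    Task("collect wood", "forest", frozenset()),
    Task("collect sand", "desert", frozenset()),
    Task("collect cactus", "desert", frozenset()),
    Task("mine iron", "any", frozenset({"craft stone pickaxe"})),
]


def propose_next_task(biome: str, learned: Set[str]) -> Optional[str]:
    """Return the first unlearned task that fits the biome and skill level."""
    for task in CANDIDATES:
        if task.name in learned:
            continue  # already mastered
        if task.biome not in ("any", biome):
            continue  # not available in the current surroundings
        if not task.requires <= learned:
            continue  # prerequisites are missing
        return task.name
    return None


print(propose_next_task("desert", set()))             # collect sand
print(propose_next_task("desert", {"collect sand"}))  # collect cactus
print(propose_next_task("forest", set()))             # collect wood
```

In the desert the sketch proposes sand and then cactus, and iron stays out of reach until the prerequisite skill has been learned, mirroring the example in the text.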



Reprinted from 新智元

