Minecraft, the world’s best-selling game, has sold over 238 million copies and peaked at more than 140 million monthly active users. Hundreds of millions of players have lived a digital second life in the game, surviving, exploring and creating in a world that closely resembles the human one in many respects. Minecraft thus acts as a microcosm of the real world. Developing an automated agent that can master every technical challenge in Minecraft is akin to creating an artificial intelligence that can autonomously learn and master real-world technology as a whole.

Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory, with the aim of creating Generally Capable Agents in Minecraft. GITM has the following characteristics:

  • Broad task coverage. All previous agents combined unlock only 30% of the items in the Minecraft Overworld technology tree, whereas GITM unlocks 100% of them.
  • High success rate. GITM achieves a 67.5% success rate on the “ObtainDiamond” task, surpassing the previous SOTA (OpenAI’s VPT) by 47.5 percentage points.
  • Excellent training efficiency. OpenAI’s VPT needs 6,480 GPU days of training and DeepMind’s DreamerV3 needs 17 GPU days, whereas GITM needs no GPUs at all and can be trained in 2 days on a single CPU node with 32 CPU cores.

This research shows the potential of LLMs in developing capable agents for handling long-horizon, complex tasks and adapting to uncertainties in open-world environments.

GITM handles diverse biomes and environments, day and night scenes, and even encounters with monsters with ease.


