A viral AI paper builds a prototype "Westworld": 25 AI agents living freely in a virtual town
The premise of "Westworld" is gradually becoming reality.
Can we create a world in which robots live, work, and socialize like humans, replicating every aspect of human society?
This kind of imagination is realized in the setting of the TV series "Westworld": robots with pre-installed storylines are placed in a theme park. They act like humans and remember what they see, the people they meet, and the things they say. Every day, the robots are reset and return to their core storyline.
Stills from "Westworld", the figure on the left is a robot with a pre-installed storyline.
Let's expand the imagination: if we wanted to make a large language model like ChatGPT the host of a Westworld today, how would we do it?
In a recently popular paper, researchers successfully built a "virtual town" inhabited by 25 AI agents. The agents not only engage in complex behaviors (such as throwing a Valentine's Day party), but these behaviors are more believable than human role-playing.
- Paper link: https://arxiv.org/pdf/2304.03442v1.pdf
- Demo address: https://reverie.herokuapp.com/arXiv_Demo/
From sandbox games like The Sims to applications such as cognitive models and virtual environments, researchers have envisioned creating agents capable of believable human behavior for more than four decades. In these scenarios, computationally driven agents behave in a way that is consistent with their past experience and respond plausibly to their environment. Such simulations of human behavior can populate virtual spaces and communities with realistic social phenomena, train people to handle rare but difficult interpersonal situations, test social science theories, provide prototypes of human processes for theory and usability testing, power ubiquitous computing applications and social robots, and underpin non-player characters (NPCs) that can navigate complex human relationships in open-world games.
But the space of human behavior is vast and complex. While large language models can simulate believable human behavior at a single point in time, a general agent that stays consistent over the long term needs an architecture to manage an ever-growing memory as new interactions, conflicts, and events arise and fade over time, while also handling the cascading social dynamics that unfold among multiple agents.
If a method can retrieve relevant events and interactions over a long period, reflect on those memories to generalize and draw higher-level inferences, and apply that reasoning to produce plans and reactions that make sense both in the moment and over the agent's longer-term behavior, then such a simulation is within reach.
This new paper introduces "Generative Agents": agents that use generative models to simulate believable human behavior. It demonstrates that they can produce believable simulations of both individual and emergent group behavior:
- the ability to make broad inferences about themselves, other agents, and the environment;
- the ability to create daily plans that reflect their own characteristics and experiences, execute those plans, react, and re-plan when appropriate;
- the ability to respond when the end user changes the environment or commands them in natural language.
Behind "Generative Agents" is a new agent architecture capable of storing, synthesizing, and applying relevant memories, using large language models to generate believable behaviors.
For example, Generative Agents turn off the stove when they see their breakfast burning; they wait outside if the bathroom is occupied. A society of Generative Agents is marked by emergent social dynamics: new relationships form, information diffuses, and coordination arises among agents.
Specifically, the researchers highlight several important contributions in this paper:
- Generative Agents: believable simulations of human behavior that dynamically adjust, conditioned on the agent's changing experience and environment;
- a novel architecture that enables Generative Agents to remember, retrieve, reflect, interact with other agents, and plan through dynamically evolving environments. The architecture leverages the powerful prompting capabilities of large language models and complements them to support long-term agent consistency, manage dynamically evolving memories, and recursively synthesize higher-level reflections;
- two evaluations (a controlled evaluation and an end-to-end evaluation) that establish the causal importance of each component of the architecture and identify failures, e.g. those caused by improper memory retrieval;
- a discussion of the opportunities and the ethical and social risks of Generative Agents in interactive systems. The researchers argue that these agents should be tuned to mitigate the risk of users forming parasocial relationships, logged to mitigate the risks posed by deepfakes and tailored persuasion, and applied in ways that complement rather than replace human stakeholders.
Once published, the paper sparked heated discussion across the Internet. Andrej Karpathy, who had been optimistic about the "AutoGPT" direction, praised it repeatedly, saying that "Generative Agents" goes well beyond the earlier "open world" concept:
Some researchers asserted that the release of this research means that "large language models have reached a new milestone":
"Generative Agents" Behavior and Interactions
To make "Generative Agents" more concrete, the study instantiated them as characters in a sandbox world.
25 agents live in a small town called Smallville, each represented by a simple avatar. All of them can:
- communicate with others and with the environment;
- remember and recall what they did and observed;
- reflect on these observations;
- make a plan for each day.
The researchers describe each agent's identity in natural language, including their occupation and relationships with other agents, and use this information as seed memory. For example, the agent John Lin has the following description (an excerpt):
"John Lin is a pharmacy owner who loves to help people. He is always looking for ways to make medicine more accessible to his customers. John Lin's wife, Mei Lin, is a college professor; they live with their son Eddy Lin, who studies music theory. John Lin loves his family very much. John Lin has known the elderly couple next door, Sam Moore and Jennifer Moore, for years..."
After the identity is set, the next question is how the agent interacts with the world.
At each time step of the sandbox, the agents output a natural language statement describing their current action, such as "Isabella Rodriguez is writing in her diary" or "Isabella Rodriguez is checking email". These statements are then translated into concrete actions that affect the sandbox world. On the sandbox interface, each action is displayed as a set of emoji providing an abstract representation of the action.
To achieve this, the study employs a language model that translates each action into emoji, shown in a dialog above the agent's avatar; for example, "Isabella Rodriguez is writing in her diary" and "Isabella Rodriguez is checking email" each appear as corresponding emoji. Additionally, the full natural language description can be viewed by clicking on the agent's avatar.
The agents communicate with each other in natural language. If agents notice other agents nearby, they consider whether to walk over and chat. For example, Isabella Rodriguez and Tom Moreno had a conversation about the upcoming election:
In addition, users can specify the role an agent plays. For example, if an agent is designated as a reporter, the user can ask it about the news.
The small town of Smallville has many public spaces, including a cafe, a pub, a park, a school, a boarding house, houses, and shops. Each space also contains its own functional areas and objects, such as a kitchen in a house and a stove in the kitchen (Figure 2). An agent's living space also contains a bed, a desk, a wardrobe, and shelves, as well as a bathroom and a kitchen.
Agents can move around Smallville, enter or leave buildings, navigate the town, and even approach other agents. Agent movement is directed by the Generative Agents architecture together with the sandbox game engine: when the model instructs an agent to move to a location, the system computes a walking path through the Smallville environment, and the agent begins to move.
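The walking-path computation can be sketched as a search on a grid. The paper does not specify the algorithm, so the breadth-first search below, along with the tile map and coordinates, is an illustrative assumption rather than the authors' implementation.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Return a tile-by-tile path from start to goal, avoiding '#' walls.

    Breadth-first search over a grid of strings; each tile is (row, col).
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        r, c = path[-1]
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(path + [(nr, nc)])
    return None  # destination unreachable

# A toy 3x4 town map with a wall in the middle.
town = ["....",
        ".##.",
        "...."]
path = shortest_path(town, (0, 0), (2, 3))
print(len(path))  # 6 tiles on the shortest route
```

Once the path is found, the engine would advance the agent one tile per tick until the destination is reached.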
In addition, users and agents can affect the state of objects in the environment; for example, a bed is occupied while an agent is sleeping, and the refrigerator may be empty after an agent finishes breakfast. End users can also rewrite the agent's environment in natural language. For example, when Isabella enters the bathroom, a user can set the shower's state to "leaking", after which Isabella finds tools in the living room and tries to fix the leak.
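The object-state idea above can be sketched as a small tree of stateful objects. The class name, tree layout, and `set_state` helper below are hypothetical illustrations, not the paper's data model.

```python
from dataclasses import dataclass, field

@dataclass
class WorldObject:
    """A node in the environment tree with a mutable natural-language state."""
    name: str
    state: str = "idle"
    children: dict = field(default_factory=dict)

# A hypothetical slice of the Smallville environment tree.
house = WorldObject("Lin family house")
bathroom = WorldObject("bathroom")
shower = WorldObject("shower", state="working")
bathroom.children["shower"] = shower
house.children["bathroom"] = bathroom

def set_state(root, path, new_state):
    """Walk the tree along `path` and overwrite the leaf object's state."""
    node = root
    for part in path:
        node = node.children[part]
    node.state = new_state
    return node

# An end user rewriting the environment, as in the leaking-shower example.
set_state(house, ["bathroom", "shower"], "leaking")
print(shower.state)  # leaking
```

An agent perceiving the bathroom would then observe "shower is leaking" as a natural-language description and could react to it.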
A day in the life of an agent
Starting from a brief description, the agent begins planning its day. As time passes in the sandbox world, the agents' behavior gradually evolves as they interact with each other and the world and build up memories. The picture below shows a day in the life of pharmacy owner John Lin.
In this family, John Lin is the first to wake up at 7:00 a.m.; he brushes his teeth, showers, gets dressed, eats breakfast, and then reads the news at the dining table in the living room. At 8:00 a.m., John Lin's son Eddy also gets up to prepare for class. As Eddy is leaving the house, he has the following conversation with John:
Shortly after Eddy leaves, his mother Mei also wakes up. When Mei asks about her son, John recalls the conversation they just had, and the two have the following exchange:
Beyond this, Generative Agents also exhibit emergent social behavior. By interacting with one another, they exchange information and form new relationships in the Smallville environment. These social behaviors emerge naturally rather than being predetermined. For example, when agents notice each other, they may strike up a conversation, and the information in that conversation can then propagate between agents.
Let's look at a few examples:
Information dissemination. When agents notice each other, they may engage in conversation; when they do, information can propagate from one agent to another. For example, in a conversation between Sam and Tom at the grocery store, Sam tells Tom about his candidacy in the local election:
Later that day, after Sam left, Tom and John, who had heard the news from another source, discuss Sam's chances of winning the election:
Gradually, Sam's candidacy becomes the talk of the town, with some supporting him and others remaining undecided.
Relationship memory. Over time, agents in the town form new relationships and remember their interactions with other agents. For example, Sam does not know Latoya Williams at first. While walking in Johnson Park, Sam bumps into Latoya and they introduce themselves; Latoya mentions that she is working on a photography project: "I'm here to take pictures for a project I'm working on." In a later interaction, Sam's conversation with Latoya shows memory of that event: Sam asks, "Latoya, how is your project going?" and Latoya replies, "It's going very well!"
Coordination. Isabella Rodriguez, owner of Hobbs Cafe, is throwing a Valentine's Day party on February 14 from 5 to 7 p.m. From this seed, she sends out invitations as she meets friends and customers at Hobbs Cafe and elsewhere. On the afternoon of the 13th, Isabella starts decorating the cafe. Maria, a regular customer and close friend, arrives; Isabella asks her to help arrange the party, and Maria agrees. Maria's character description says she has a crush on Klaus. That night, Maria invites Klaus to the party, and he gladly accepts.
On Valentine's Day, five agents, including Klaus and Maria, show up at Hobbs Cafe at 5 p.m. to enjoy the festivities (Figure 4). In this scenario, the end user only set Isabella's initial intent to throw a party and Maria's crush on Klaus: the social behaviors of spreading the word, decorating, asking each other out, arriving at the party, and interacting there all emerged from the agent architecture.
Generative Agents need a framework to guide their behavior in the open world, one designed to let them interact with other agents and respond to environmental changes.
Generative Agents take their current environment and past experience as input and produce behavior as output. Their architecture combines a large language model with mechanisms for synthesizing and retrieving relevant information to condition the model's output.
Without synthesis and retrieval mechanisms, a large language model can still output behaviors, but the agents may fail to draw on their past experience to make important inferences and may not maintain long-term coherence. Even with the best current models (e.g. GPT-4), long-term planning and coherence remain challenging.
Because Generative Agents produce a large stream of events and memories that must be preserved, a core challenge of the architecture is ensuring that the most relevant pieces of the agent's memory are retrieved and synthesized when needed.
Central to the Generative Agents architecture is the memory stream: a database that comprehensively records the agent's experience. The agent retrieves relevant records from the stream to plan its actions and respond appropriately to the environment, and each action is itself recorded so that higher-level behavior guidance can be recursively synthesized. Everything in the architecture is recorded and reasoned over as natural language descriptions, allowing agents to leverage the reasoning capabilities of large language models.
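As a rough illustration, a memory-stream record might carry a natural-language description, timestamps, and an importance score, matching the elements described above. The exact fields and the `remember` helper below are assumptions, not the paper's code.

```python
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    description: str      # natural-language description of the event
    created_at: float     # simulation time when the event happened
    last_accessed: float  # updated whenever the record is retrieved
    importance: float     # in the paper, rated by the language model

memory_stream = []  # the comprehensive, append-only record of experience

def remember(description, t, importance):
    """Append a new observation to the memory stream."""
    record = MemoryRecord(description, t, t, importance)
    memory_stream.append(record)
    return record

remember("Isabella Rodriguez is decorating Hobbs Cafe", t=13.0, importance=4.0)
remember("Maria agrees to help arrange the party", t=13.5, importance=7.0)
print(len(memory_stream))  # 2
```

Because every record is plain natural language, any subset of the stream can be pasted directly into a language-model prompt.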
Currently, the implementation uses the gpt-3.5-turbo version of ChatGPT. The research team anticipates that the architectural foundations of Generative Agents (memory, planning, and reflection) will likely remain the same; newer language models such as GPT-4, with greater expressive power and performance, will further extend Generative Agents.
Memory and retrieval
The architecture of Generative Agents implements a retrieval function that takes the agent's current situation as input and returns a subset of the memory stream to pass to the language model. The retrieval function can be implemented in several ways, depending on which factors the agent weighs when deciding how to act.
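One plausible implementation, sketched below, scores each memory as a weighted sum of recency, importance, and relevance. The decay rate, the equal weights, and the precomputed relevance scores (standing in for embedding similarity against the query situation) are all assumptions for illustration.

```python
# Each memory is a dict with a description, a last-accessed timestamp
# (in simulation hours), and an importance score on a 1-10 scale.

def recency(now, last_accessed, decay=0.995):
    """Exponential decay per simulation hour since the memory was last accessed."""
    return decay ** (now - last_accessed)

def score(memory, now, query_relevance, w=(1.0, 1.0, 1.0)):
    """Weighted sum of recency, normalized importance, and relevance."""
    w_rec, w_imp, w_rel = w
    return (w_rec * recency(now, memory["last_accessed"])
            + w_imp * memory["importance"] / 10
            + w_rel * query_relevance)

def retrieve(memories, now, relevances, k=2):
    """Return the descriptions of the k highest-scoring memories."""
    ranked = sorted(memories,
                    key=lambda m: score(m, now, relevances[m["description"]]),
                    reverse=True)
    return [m["description"] for m in ranked[:k]]

memories = [
    {"description": "talked with Tom about the election", "last_accessed": 8.0, "importance": 6},
    {"description": "ate breakfast", "last_accessed": 7.0, "importance": 1},
    {"description": "Sam announced his candidacy", "last_accessed": 9.0, "importance": 8},
]
# Hypothetical similarity of each memory to the current situation.
relevances = {"talked with Tom about the election": 0.9,
              "ate breakfast": 0.1,
              "Sam announced his candidacy": 0.95}
print(retrieve(memories, now=10.0, relevances=relevances))
# → ['Sam announced his candidacy', 'talked with Tom about the election']
```

The retrieved subset, not the whole stream, is what gets packed into the language-model prompt, which is how the architecture keeps prompts within a bounded context window.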
The study also introduces a second type of memory, called "reflection". Reflections are higher-level, more abstract thoughts generated by the agent. They are generated periodically; in this study, an agent begins to reflect only when the sum of the importance scores of its recent events exceeds a certain threshold.
In practice, the Generative Agents in the study reflect roughly two to three times a day. The first step of reflection is for the agent to decide what to reflect on, by identifying questions it can ask based on its recent experiences.
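The threshold trigger described above can be sketched in a few lines. The threshold value and the example importance scores are made up for illustration; the paper determines importance via language-model ratings.

```python
# Reflect once the summed importance of events since the last
# reflection crosses a threshold (the value here is an assumption).
REFLECTION_THRESHOLD = 15.0

def should_reflect(recent_importances, threshold=REFLECTION_THRESHOLD):
    """True when accumulated importance since the last reflection crosses the bar."""
    return sum(recent_importances) >= threshold

events_since_last_reflection = [3.0, 5.0, 4.0]
print(should_reflect(events_since_last_reflection))  # False: sum is 12.0
events_since_last_reflection.append(6.0)
print(should_reflect(events_since_last_reflection))  # True: sum is 18.0
```

When the trigger fires, the agent would prompt the language model with its recent memories to generate the reflection questions, then store the resulting answers back into the memory stream as new (reflection-type) records.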
Planning and response
Plans describe an agent's sequence of future actions and help the agent behave consistently over time. Each plan includes a location, a start time, and a duration.
To create believable plans, Generative Agents generate details recursively, from the top down. The first step creates a plan that roughly outlines the day's "schedule". To create this initial plan, the study prompts the language model with the agent's general description (e.g. name, traits, and a summary of recent experiences).
While executing a plan, Generative Agents perceive their surroundings, and the perceived observations are stored in their memory streams. The study uses these observations to prompt the language model to decide whether the agent should continue with its existing plan or react in some other way.
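The top-down refinement can be sketched as recursively splitting coarse schedule entries into finer time blocks. In the paper each level is generated by the language model; the fixed two-way split, the depth limit, and the example schedule below are illustrative stand-ins for that generation step.

```python
def refine(entry, depth):
    """Split a (start_hour, duration_hours, activity) entry into two
    equal sub-blocks, recursing until depth runs out or blocks reach
    15 minutes. A language model would name each sub-activity; here the
    names are mechanical placeholders."""
    start, duration, activity = entry
    if depth == 0 or duration <= 0.25:
        return [entry]
    half = duration / 2
    first = (start, half, f"{activity} (part 1)")
    second = (start + half, half, f"{activity} (part 2)")
    return refine(first, depth - 1) + refine(second, depth - 1)

# A coarse daily outline, as the first planning pass might produce.
daily_outline = [(7.0, 1.0, "morning routine"),
                 (8.0, 4.0, "work at the pharmacy")]
detailed = []
for entry in daily_outline:
    detailed.extend(refine(entry, depth=2))
print(len(detailed))  # 8 fine-grained blocks
```

During execution, each new observation stored in the memory stream would then be checked against the current block to decide between continuing the plan and reacting.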
Experiment and Evaluation
The study conducted two evaluations of Generative Agents: a controlled evaluation to test whether agents independently generate believable individual behavior, and an end-to-end evaluation in which multiple Generative Agents interact open-endedly over simulated game time, to assess their stability and emergent social behavior.
For example, Isabella plans a Valentine's Day party and spreads the word; by the end of the simulation, 12 characters have heard about it. Seven of them were "undecided": 3 had other plans and 4 did not express their thoughts, much as people behave in real life.
At the technical level, the study evaluates the agents' ability to stay in character, remember, plan, react, and reflect accurately by "interviewing" them in natural language, and conducts ablation experiments. The results show that each of these components is critical for agents to perform well.
In the experimental evaluations, the most common errors agents made include:
- failing to retrieve relevant memories;
- fabricating embellishments to their memories;
- "inheriting" overly formal speech or behavior from the language model.
Interested readers can read the original text of the paper for more research details.