An AI prototype of "Westworld" is trending online: 25 AI agents living their own lives in a virtual town!
Can we create a world where robots can live, work, and socialize like humans, replicating various aspects of human society?
This kind of imagination is realized in the setting of the TV series "Westworld": robots with built-in storylines are placed in a theme park, exhibit human-like behavior and memory, and can record what they see, hear, and smell, the people they meet, and what is said to them. At the end of each day, however, the robots are reset and return to their core storylines.
A still from Westworld; the girl on the left is a robot with a pre-loaded storyline.
If we wanted to develop ChatGPT further into the host of a Westworld, the following technical capabilities might be required:
- Powerful language model technology. We need massive enhancements to existing ChatGPT to enable it to understand and produce richer natural language, including understanding of human emotion and social behavior.
- Realistic visual modeling technology. We need to be able to generate and present humans and objects in the virtual world with a high degree of realism to create a believable and natural Westworld.
- Advanced reasoning and decision-making skills. We need AI capable of complex reasoning and decision-making so that large language models can operate intelligently in Westworld, engaging and creating a wide range of situations.
- Large-scale data collection and processing techniques. In order to train the model, we need a large amount of data about Westworld, including information on various elements such as scenes, plots, and characters.
If we can effectively achieve these goals, we will be able to take ChatGPT and other AI techniques to a whole new level to faithfully create a complex and rich Westworld.
This new paper introduces "Generative Agents", a technique that uses generative models to simulate believable human behavior, and demonstrates that these agents produce believable individual behavior as well as emergent group behavior. Generative agents have the following characteristics:
- Extensive reasoning skills that allow for inferences about self, others, and the environment;
- Ability to develop, implement and re-plan daily routines that reflect one's own characteristics and experiences;
- Ability to respond to end-user natural language commands or changes in the environment.
This technology can provide credible simulations of agent behavior and group dynamics, and is expected to be applied to the simulation and prediction of human social, emotional, and behavioral domains.
The novel agent architecture of "Generative Agents" stores, synthesizes, and applies relevant memories to generate believable behavior using a large language model. For example, when an agent sees the stove on fire, it turns the stove off; when someone is in the bathroom, it waits outside; when another agent wants to chat, it stops to talk. The society these agents form is full of emergent social dynamics: new relationships form, information diffuses, and agents coordinate with one another.
The paper makes the following contributions:
- "Generative Agents" are believable simulations of human behavior that dynamically adjust to the agent's ongoing experience and environmental conditions.
- The architecture is novel in enabling agents to remember, retrieve, reflect, interact with other agents, and plan within a dynamically evolving environment. It exploits the prompting capabilities of large language models and complements them to support long-term consistency, manage a dynamically growing memory, and recursively synthesize memories into higher-level reflections.
- The two evaluations presented in the paper (a controlled evaluation and an end-to-end evaluation) establish the causal importance of the architecture's components and reveal failure modes such as improper memory retrieval.
- Furthermore, in discussing the opportunities as well as the ethical and social risks of generative agents in interactive systems, the researchers propose safeguards: mitigating the risk of users forming parasocial relationships with agents, logging agent behavior to mitigate deepfake and tailored-persuasion risks, and applying agents in design processes in ways that complement human stakeholders rather than replace them.
Once published, the paper sparked heated discussion across the Internet. Figures like Elon Musk have weighed in, calling it a landmark study. The well-known AI researcher Andrej Karpathy praised it repeatedly, arguing that "Generative Agents" goes well beyond the earlier "open world" concept. He believes this research is significant for breaking through current bottlenecks in artificial intelligence, calls it pioneering demonstrative work, and sees it as a new milestone in the application of large language models.
Other researchers say that the field remains promising. The release of this research will accelerate the technical research and development in the field of artificial intelligence and provide more technical support and impetus.
"Generative Agents" Behavior and Interactions
To make "Generative Agents" more concrete, this study treats them as characters in a sandbox world.
25 agents live in a small town called Smallville, each represented by a simple avatar. Each character can communicate with other agents and the environment, remember and recall what it has done and observed, reflect on those observations, and plan out its day.
The researchers describe the identity of each agent in natural language, including their occupation and relationship with other agents, and use this information as a seed memory. For example, the identity of the agent John Lin is as follows (excerpted from this article): "John Lin is a pharmacy owner who loves helping others. He is always looking for ways to make medicines more accessible to customers. John Lin's wife is a university professor, Mei Lin, and they live with their son Eddy Lin, who studies music theory; John Lin loves his family very much; John Lin has known the elderly couple next door Sam Moore and Jennifer Moore for several years..."
After setting the identities, the next step is to explore how the agents interact with the world. At each step in the sandbox, the agents output natural language sentences describing their current actions, such as "Isabella Rodriguez is writing in her diary" or "Isabella Rodriguez is checking email". These sentences are then converted into concrete sandbox actions, presented on the sandbox interface as emoji, which serve as an abstract expression of the actions. To achieve this, the study employs a language model that translates each action into a short string of emoji, displayed in a dialog bubble above the agent's avatar — a writing emoji for "Isabella Rodriguez is writing in her diary", an email emoji for "Isabella Rodriguez is checking email", and so on.
Additionally, clicking on an agent's avatar reveals the full natural language description of its current action. The agents communicate with each other in natural language: when an agent becomes aware of other agents nearby, it may consider walking over for a chat. For example, Isabella Rodriguez and Tom Moreno have a conversation about the upcoming election.
In addition, users can also specify agents to play different roles. For example, if one of the agents is designated as a reporter, the user can consult the agent for news content.
The small town of Smallville has many public scenes, including cafes, pubs, parks, schools, boarding houses, houses, and shops. Each public scene has its own functions and items; for example, a house contains a kitchen, and the kitchen contains a stove. An agent's living space also contains beds, tables, wardrobes, and shelves, as well as bathrooms and kitchens.
Agents can move freely around Smallville, entering and leaving buildings; under the Generative Agents architecture and the control of the sandbox game engine, they can navigate the map and interact with other agents. When the model instructs an agent to move to a location, the system computes a walking path through the Smallville environment to the destination, and the agent begins to move.
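The article does not specify how these walking paths are computed. As a rough sketch, assuming a tile-based map (the grid encoding and function name below are hypothetical, not the paper's implementation), a breadth-first search would yield a shortest walkable route:

```python
from collections import deque

def find_path(grid, start, goal):
    """Breadth-first search over a 2D tile grid.

    grid: list of strings, '.' = walkable, '#' = blocked (hypothetical encoding).
    start, goal: (row, col) tuples. Returns a shortest path as a list of
    cells, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}  # also serves as the visited set
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Reconstruct the path by following parent links back to start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == '.' and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Because BFS explores cells in order of distance, the first time the goal is dequeued the reconstructed path is guaranteed to be among the shortest.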
In addition, users and agents can change the state of objects through their actions. For example, while an agent is sleeping, the bed is occupied, and the refrigerator may be empty after the agent has eaten breakfast. Finally, users can also reshape an agent's environment through natural language commands. For example, when Isabella enters the bathroom, the user sets the shower's state to "leaking", and the agent will find tools in the living room and try to fix the leak.
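The object-state mechanics described above can be sketched as a plain state store, where both agent actions and end-user natural-language commands ultimately resolve to state updates (the class and method names here are illustrative, not the paper's API):

```python
class SandboxObject:
    """Minimal sketch of a stateful world object."""
    def __init__(self, name, state="idle"):
        self.name = name
        self.state = state

class Environment:
    """Registry of objects whose states agents and users can change."""
    def __init__(self):
        self.objects = {}

    def add(self, obj):
        self.objects[obj.name] = obj

    def set_state(self, name, state):
        # Both agent actions ("John sleeps in the bed") and user commands
        # ("make the shower leak") resolve to updates like this one.
        self.objects[name].state = state

env = Environment()
env.add(SandboxObject("bed"))
env.add(SandboxObject("shower"))
env.set_state("bed", "occupied")    # an agent goes to sleep
env.set_state("shower", "leaking")  # user command reshapes the environment
```

Agents can then perceive these states (e.g. "the shower is leaking") as observations that enter their memory and influence later behavior.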
A day in the life of an agent
Starting from its seed description, each agent plans out a day of life. As time passes in the sandbox world, the agents' behavior gradually changes through their interactions with each other and the world, and through the memories they accumulate. The figure below depicts a day in the life of the pharmacy owner John Lin.
In the Lin family, John is the first to get up, at seven in the morning. He brushes his teeth, showers, gets dressed, and after breakfast reads the news at the living room table. At eight o'clock, John's son Eddy gets up to get ready for school. Before Eddy leaves, he has the following conversation with John:
Not long after, Eddy's mother Mei also wakes up. When Mei asks about their son, John recalls the earlier conversation, and the two talk:
Beyond individual routines, the generative agents also exhibit emergent social behavior. Through interaction, they exchange information and form new relationships in the Smallville environment. These behaviors arise naturally rather than being pre-programmed. Information diffusion: when agents notice each other, they may start a conversation, and in doing so information propagates from one agent to another. For example, at the grocery store, Sam tells Tom about his candidacy in the local election:
Later that day, after Sam leaves, Tom hears the news from another source and discusses Sam's chances in the election with John:
Little by little, Sam's candidacy becomes the talk of the town, with some supporting him and others hesitating.
Relationship memory: over time, agents in the town form new relationships and remember their interactions with other agents. For example, Sam does not know Latoya Williams at first. While walking in Johnson Park, Sam meets Latoya; after they introduce themselves, Latoya mentions she is working on a photography project: "I am here to take pictures for a project that is going on." In a later interaction, Sam's words show that he remembers the event: he asks, "Latoya, how is your project going?" and she replies, "It's going great!"

Coordination: Isabella Rodriguez, who runs Hobbs Cafe, is throwing a Valentine's Day party on February 14th from 5pm to 7pm. Starting from this goal, whenever Isabella meets friends and customers at Hobbs Cafe or elsewhere, she invites them to the party. On the afternoon of the 13th, Isabella starts decorating the cafe. Maria, a regular customer and close friend, comes in; Isabella asks for her help with the arrangements, and Maria agrees. Maria's character description says she has a crush on Klaus, and that night she invites him to the party; Klaus gladly accepts. On Valentine's Day, five agents, including Klaus and Maria, show up at Hobbs Cafe at 5pm to enjoy the festivities (Figure 4). In this scenario, the end user only set Isabella's initial intention to hold a party and Maria's crush on Klaus; the social behaviors of spreading the word, decorating, inviting, arriving at the party, and interacting there all emerged from the agents themselves.
The Architecture of "Generative Agents"
Generative agents need a framework to guide their behavior in an open world, letting them interact with other agents and respond to changes in the environment. The architecture takes the current environment and past experience as input and produces behavior as output. It combines a large language model with mechanisms for synthesizing and retrieving relevant information that condition the language model's output.
Without synthesis and retrieval mechanisms, a large language model can still output behavior, but the agent may fail to respond to its past experience, fail to draw important inferences, and fail to maintain long-term continuity. Even with the best-performing current models (e.g. GPT-4), long-term planning and coherence remain challenging.
Since generative agents produce a large number of events and memory records that must be preserved, the core challenge of the architecture is to ensure that the most relevant parts of an agent's memory are retrieved and synthesized when needed, and that this information is preserved over time for later use.
The core structure of Generative Agents is the memory stream, a database that comprehensively records the agent's experience. The agent retrieves relevant records from it, formulates an action plan, and responds appropriately to the environment. Every action is recorded, and higher-level behavioral guidance is gradually synthesized from these records. All data in the architecture is recorded as natural language descriptions, so that agents can reason over it with a large language model. The version used in the study is ChatGPT's gpt-3.5-turbo model. The research team expects the basic architecture — memory, planning, and reflection — to remain the same, while newer language models (such as GPT-4) with greater expressive power and performance will further expand the range of applications.
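The memory stream can be sketched as an append-only list of natural-language records, each carrying timestamps and an importance score. The field and class names below are illustrative, not the paper's code:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    """One record in the memory stream: a natural-language description
    plus creation and last-access timestamps and an importance score."""
    description: str
    importance: float = 1.0  # how poignant the event is (1-10 scale assumed)
    created_at: float = field(default_factory=time.time)
    last_accessed: float = field(default_factory=time.time)

class MemoryStream:
    """Append-only record of everything the agent experiences."""
    def __init__(self):
        self.records = []

    def add(self, description, importance=1.0):
        self.records.append(Memory(description, importance=importance))

stream = MemoryStream()
stream.add("Isabella Rodriguez is setting out the pastries", importance=2.0)
stream.add("The stove in the kitchen is on fire", importance=9.0)
```

Everything downstream — retrieval, reflection, planning — then operates over this single stream of natural-language records.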
Memory and retrieval
The architecture implements a retrieval function that takes the agent's current situation as input and returns the subset of the memory stream relevant to it, which is then passed to the language model. The implementation of the retrieval function varies with the factors the agent weighs when making decisions, so many implementations are possible.
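One concrete way to implement such a retrieval function, following the paper's combination of recency, importance, and relevance, is a weighted score over the memory stream. The sketch below is illustrative: relevance is approximated by word overlap, standing in for the embedding similarity a real implementation would use, and the decay constant is an assumption:

```python
def retrieve(memories, query_words, now, top_k=3, decay=0.995):
    """Rank memories by recency + importance + relevance and return the top k.

    memories: list of dicts with 'description', 'importance' (1-10),
              and 'last_accessed' (unix seconds) keys.
    query_words: set of lowercase words describing the current situation.
    """
    scored = []
    for m in memories:
        hours_since_access = (now - m["last_accessed"]) / 3600
        recency = decay ** hours_since_access          # exponential decay
        importance = m["importance"] / 10              # normalize to [0, 1]
        words = set(m["description"].lower().split())
        relevance = len(words & query_words) / max(len(query_words), 1)
        scored.append((recency + importance + relevance, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]
```

With equal weights, a highly important and relevant memory (the burning stove) outranks a mundane one (breakfast) even when both are equally recent.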
The study also introduces reflection, the process by which agents generate higher-level, more abstract thoughts. Reflection is triggered periodically: an agent begins to reflect only when the sum of the importance scores of its recent events exceeds a threshold.
Specifically, generative agents reflect two or three times a day. The first step is for the agent to decide what to reflect on, which it does by generating questions based on its recent experience. It then retrieves relevant records from the memory stream and produces deeper, more abstract thoughts grounded in them. Through this continual thinking and reasoning, agents better understand their own experience and environment and make more informed decisions.
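The two-step reflection loop described above, gated by an importance threshold, can be sketched as follows. The threshold value is illustrative, and the two callables stand in for language-model prompts (question generation and question answering):

```python
def should_reflect(recent_importances, threshold=150):
    """Reflect only when the summed importance of recent events
    crosses a threshold (value here is illustrative)."""
    return sum(recent_importances) >= threshold

def reflect(memory_stream, question_generator, answerer):
    """Two-step reflection sketch:
    (1) generate salient questions about recent experience,
    (2) answer each from the memory stream, appending the answers
        back as higher-level thoughts.
    Both callables stand in for language-model prompts."""
    questions = question_generator(memory_stream[-100:])  # recent records
    thoughts = [answerer(q, memory_stream) for q in questions]
    memory_stream.extend(thoughts)  # reflections become memories too
    return thoughts
```

Because reflections are written back into the stream, later reflections can build on earlier ones, which is what allows increasingly abstract thoughts to emerge.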
Planning and response
A plan describes an agent's future sequence of actions, helping to keep its behavior consistent. Elements of planning include location, start time, and duration.
To make a reasonable plan, the generative agent recursively generates detail from the top down. First, the agent drafts a plan that roughly sketches the day's agenda. To create this initial plan, the researchers prompt the language model with the agent's general description, such as its name, traits, and recent experience. While executing a plan, the agent perceives its surroundings and stores those observations in its memory stream.
These observations feed back into the language model, which decides whether the agent should continue with the existing plan or react differently.
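The recursive top-down decomposition and the observe-then-react loop can be sketched as two small functions, where the `decomposer` and `decider` callables stand in for language-model prompts (all names here are hypothetical):

```python
def refine_plan(outline, decomposer, depth=2):
    """Recursively expand each coarse plan item into finer-grained actions,
    e.g. a day outline -> hour blocks -> 5-15 minute actions.
    `decomposer` stands in for an LLM prompt that splits one item."""
    if depth == 0:
        return outline
    refined = []
    for item in outline:
        refined.extend(refine_plan(decomposer(item), decomposer, depth - 1))
    return refined

def react(observation, current_action, decider):
    """Keep the current action unless a fresh observation warrants a change.
    `decider` stands in for an LLM judgment; it returns a replacement
    action, or None to continue as planned."""
    return decider(observation, current_action) or current_action
```

Each level of `refine_plan` only sees one item at a time, which keeps the prompts short while still producing a detailed, consistent schedule.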
Experiment and Evaluation
The study conducted two evaluations of Generative Agents. The first is a controlled evaluation, testing whether agents can independently generate believable individual behavior. The second is an end-to-end evaluation, in which multiple generative agents interact open-endedly over a two-day game session, to gauge their stability and emergent behavior. For example, for the Valentine's Day party Isabella plans, 12 characters hear about it: 7 of them are hesitant, 3 have other plans, and 4 have not voiced their thoughts — a distribution of behavior much like what happens among humans.
For the technical evaluation, the study assesses the agents' ability to stay in character, remember, plan, react, and reflect accurately by "interviewing" them in natural language, and conducts ablation experiments. The results show that each of these components is critical to strong performance across these tasks.
In the experimental evaluations, the agents' most common errors included failing to retrieve relevant memories, fabricating embellishments to their memories, and "inheriting" overly formal speech or behavior from the language model.