RT-1: A new leap forward in robot learning at scale
Our work at Everyday Robots is driven by a vision of a future where helper robots are as useful in our everyday lives as computers are in our digital lives. For decades, robots have been single-purpose machines, rigidly coded to perform one task in a specific environment, such as a production line, a factory, or another industrial setting. But with cutting-edge advances in machine learning and breakthroughs in artificial intelligence research, our helper robots have the potential to change this paradigm and become truly beneficial presences in the everyday spaces where we live and work.
Innovations in computer hardware and software over the past half century, and more recent advances in artificial intelligence and machine learning, lead us to believe in a future where robots can learn (almost) anything. There are many hard problems to solve along the way, and we focus on robot learning. Today it is still very difficult to get robots to apply what they have already learned to new tasks in a variety of environments. However, our breakthroughs in reinforcement learning, imitation learning, and learning from simulation, made in collaboration with Google Research and coupled with recent leaps in natural language understanding and machine learning, are starting to bring a future of general-purpose robots within reach.
Earlier this year, as part of our ongoing research collaboration with Google Research, we showed that combining natural language understanding with a robot's physical capabilities can improve robot performance. That study, PaLM-SayCan, was the first to show that large language models can improve a robot's ability to plan tasks that require eight or more steps (known as "long-horizon tasks"), such as going to the kitchen to find, select, retrieve, and deliver a snack.
Bringing the Transformer Architecture to Robotics
Building on the results of PaLM-SayCan, we have continued to explore the combination of natural language understanding and advanced machine learning models, especially the Transformer. The Transformer, a deep learning architecture introduced by Google in 2017, can identify patterns and trends in large-scale text data. It was quickly applied to refine search queries and improve translation services, and it continues to underpin many recent advances in artificial intelligence that generate text, images, or video.
Applying this technology to robot learning, Google Research developed a new Transformer-based machine learning model called Robotics Transformer, or RT-1. It was trained on more than 130,000 task demonstrations collected from 13 Everyday Robots helper robots, covering more than 700 task types. What makes the model unique is that it is trained on a combination of images the robot has previously seen and natural language descriptions. This makes RT-1 like a coach guiding a football team: it combines the robot's moment-to-moment perception data with language understanding to determine the best moves for the robot to make to reach a goal.
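The perceive-and-act loop described above can be sketched roughly as follows. This is only an illustration of the general idea of a policy that maps recent camera frames plus a language instruction to an action. Every name in it (Observation, run_episode, toy_policy, read_camera, send_command) is a hypothetical placeholder, and the toy policy stands in for a trained model. It does not reflect RT-1's actual interface or architecture.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Every name below is a hypothetical illustration, not RT-1's real interface.
Frame = List[float]          # stand-in for one camera image
Action = Tuple[float, ...]   # stand-in for one motor command


@dataclass
class Observation:
    frames: List[Frame]   # short history of recent camera frames
    instruction: str      # natural language description of the task


def run_episode(
    policy: Callable[[Observation], Action],
    read_camera: Callable[[], Frame],
    send_command: Callable[[Action], None],
    instruction: str,
    max_steps: int = 5,
    history: int = 3,
) -> List[Action]:
    """Closed-loop control: observe, pick an action, act, repeat."""
    frames: List[Frame] = []
    actions: List[Action] = []
    for _ in range(max_steps):
        frames.append(read_camera())
        frames = frames[-history:]   # keep only the newest frames
        obs = Observation(frames=list(frames), instruction=instruction)
        action = policy(obs)         # a learned model would sit here
        send_command(action)
        actions.append(action)
    return actions


def toy_policy(obs: Observation) -> Action:
    """Placeholder for a trained model: returns the mean of the latest frame."""
    latest = obs.frames[-1]
    return (sum(latest) / len(latest),)
```

The point of the sketch is the shape of the loop: the instruction stays fixed for the whole episode while the image history is refreshed at every step, so each action is chosen from the most recent view of the scene.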
What was the result? RT-1 achieved a 97% success rate across more than 700 tasks and, compared with the baseline model, also improved the robot's abilities in the following ways:
- 25% better at carrying out new tasks
- 36% better at interacting with and manipulating unfamiliar objects
- 18% better at performing tasks in varied environments
- 54% better at tasks with up to 50 steps
Research suggests that the benefits of RT-1 may extend beyond our helper robots. RT-1 demonstrates that data from different robot models can be combined, improving a robot's ability to perform tasks in new environments by 17 percentage points (roughly 2x) while only slightly degrading performance on its original tasks. This finding is particularly exciting because it shows that learning can transfer across different types of robots, opening up research avenues for exploring how learning from a single fleet of robots can scale to many different types of robots.
The impact of simulation data on RT-1
Simulation is the cornerstone of our work at Everyday Robots. In addition to training in the real world, our robots are tested in simulated environments that challenge their abilities and help them learn faster. Much like a high-fidelity video game with realistic physics, our simulation technology lets us train millions of robots in a virtually unlimited number of virtual environments, dramatically reducing the time and real-world data robots need to learn new tasks.
Using Everyday Robots' simulation environment, Google Research created a set of tasks, objects, and instructions that the helper robot had not previously seen in the real world or in simulation, and combined the resulting simulated data with data from real-world experiments. This approach significantly improved our helper robot's real-world success rate on objects it had only seen in simulation, from 23 percent to 87 percent. While the simulated evaluations were limited to "pick and move" skills, the degree of domain transfer demonstrated by RT-1 underscores the exciting possibilities of combining real-world and simulated datasets.
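One simple way to picture combining real-world and simulated data is to draw each training example from one of the two pools with a fixed probability. The sketch below shows that idea only. The text above does not say how the two datasets were actually mixed, so the function name sample_mixed_batch, the sim_fraction parameter, and the episode labels are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def sample_mixed_batch(
    real_episodes: Sequence[str],
    sim_episodes: Sequence[str],
    batch_size: int,
    sim_fraction: float,
    rng: random.Random,
) -> List[Tuple[str, str]]:
    """Draw a batch where each item is simulated with probability sim_fraction.

    Hypothetical helper: the mixing scheme and the ratio are assumptions,
    not details taken from the RT-1 work.
    """
    if not real_episodes or not sim_episodes:
        raise ValueError("both datasets must be non-empty")
    if not 0.0 <= sim_fraction <= 1.0:
        raise ValueError("sim_fraction must be between 0 and 1")
    batch: List[Tuple[str, str]] = []
    for _ in range(batch_size):
        if rng.random() < sim_fraction:
            batch.append(("sim", rng.choice(sim_episodes)))
        else:
            batch.append(("real", rng.choice(real_episodes)))
    return batch
```

For example, calling sample_mixed_batch(real, sim, 32, 0.25, random.Random(0)) would return 32 tagged episodes of which about a quarter, on average, come from simulation. Tagging each item with its source makes it easy to check the realized mix or to evaluate the two kinds of data separately.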
Closer to a better everyday life
From PaLM-SayCan to RT-1, we continue to do the hard work of making robots useful in the messy, unstructured environments where we spend our time. While we are still early in our journey, these breakthroughs are the latest example of cutting-edge technologies converging to move us, step by step, toward a future where helper robots are truly useful in our everyday lives.