RT-1: A new leap in robot learning, at scale
What drives our work at Everyday Robots? A vision of a future where helper robots are as useful in our physical lives as computers have been to our digital lives. For decades, robots have been single-purpose, rigorously coded to do one task in specific environments such as assembly lines, factories, and other industrial settings. But with the convergence of cutting-edge advances in machine learning and breakthroughs in AI research, our helper robots have the potential to shift this paradigm and become genuinely helpful in the everyday spaces where we live and work.
From innovations in computer hardware and software over the past half century to more recent advances in artificial intelligence and machine learning, progress in computing has led us to believe in a future where robots can learn to do (almost) anything. There are many hard problems to tackle on this journey, but one area we’re focused on is robot learning. Today, it’s extremely difficult for robots to apply what they have learned in the past to new tasks across a range of environments. However, the breakthroughs in robot learning we have unlocked in collaboration with Google Research in reinforcement learning, imitation learning, and learning from simulation, coupled with recent leaps in natural language understanding and machine learning, have started to bring a brighter future for general-purpose robotics into focus.
Earlier this year, as part of our ongoing research explorations with Google Research, we demonstrated that robot performance can be improved by unifying natural language understanding with the physical capabilities of robots. The resulting system, PaLM-SayCan, was the first research to show that a large-scale language model can improve a robot’s ability to plan tasks that require eight or more steps (also known as ‘long-horizon tasks’), like going to a kitchen to find, choose, fetch, and deliver a snack.
Bringing Transformer architecture to robotics
Building on the findings from PaLM-SayCan, we have continued to explore how natural language understanding can be combined with cutting-edge machine learning models, specifically the Transformer. Pioneered by Google in 2017, the Transformer introduced a new deep learning technique that could identify patterns and trends in massive volumes of text data. It’s an innovation that was quickly adopted to help complete search queries and improve translation services — and it continues to underlie many of the recent advances in AI that generate text, images, and video.
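To make the "identify patterns" idea concrete: the core operation the Transformer introduced is scaled dot-product attention, in which each element of a sequence decides how much to weigh every other element. The sketch below is a minimal, pure-Python illustration of single-head attention, stripped of the learned projection matrices, multiple heads, and batching a real model would use.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention, the Transformer's core operation.

    Each query attends over all keys; its output is a weighted average
    of the values, weighted by how well the query matches each key.
    """
    d = len(keys[0])  # key dimensionality, used for scaling
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

A query that strongly matches one key produces an output dominated by that key’s value, which is how the model learns to pick out the relevant parts of its input.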
By applying this technology to robot learning, Google Research developed a new Transformer-based machine learning model called Robotics Transformer, or RT-1, trained on 130,000 demonstrations spanning over 700 types of tasks, collected with 13 helper robots from Everyday Robots. What’s unique about this model is that it trains robots on a combination of images captured by the robot’s cameras and natural language descriptions of the assigned task. This makes RT-1 act like a coach instructing a football team, combining moment-by-moment perceptual data from the robot with language understanding to determine the most likely action a robot should take to reach its goal.
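One concrete piece of such a pipeline is how continuous robot actions become tokens a Transformer can predict: the RT-1 paper discretizes each action dimension into 256 bins. The sketch below illustrates that idea; the action bounds used here are hypothetical placeholders, not values from the paper.

```python
N_BINS = 256  # RT-1 discretizes each action dimension into 256 bins

def discretize(value, low, high, n_bins=N_BINS):
    """Map a continuous action value in [low, high] to a bin index (token)."""
    value = min(max(value, low), high)          # clamp to the valid range
    frac = (value - low) / (high - low)
    return min(int(frac * n_bins), n_bins - 1)  # top edge falls in last bin

def undiscretize(index, low, high, n_bins=N_BINS):
    """Map a bin index back to the centre of its continuous interval."""
    return low + (index + 0.5) * (high - low) / n_bins

# Hypothetical example: a gripper displacement in metres, bounded to [-1, 1].
token = discretize(0.3, -1.0, 1.0)
recovered = undiscretize(token, -1.0, 1.0)  # close to 0.3, within one bin width
```

Discretizing actions this way lets the policy output actions the same way a language model outputs words: as a distribution over a fixed vocabulary, one token per action dimension.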
The results? RT-1 was able to perform over 700 types of tasks at a 97% success rate and, compared to baseline models, also improved the robots' ability to:
Perform a range of new tasks by 25%
Interact with and manipulate unknown objects by 36%
Carry out tasks in a variety of environments by 18%
Execute tasks with up to 50 steps by 54%
The research suggests that the benefits of RT-1 could go beyond our helper robots. In fact, RT-1 showed that data from different robot models could be combined to improve robots’ ability to accomplish tasks in new environments by 17% (~2X improvement), with only a minuscule decrease in performance on the original tasks. This finding is particularly exciting since it shows that it’s possible for learning to be transferred between different kinds of robots — opening potential research avenues to explore how robot learning from a single fleet can scale to a host of different kinds of robots.
Simulation data’s impact in RT-1
Simulation is a cornerstone of our work at Everyday Robots. In addition to the training our robots receive in the real world, they are also tested in simulated environments to challenge their capabilities and help them learn faster. Similar to a video game with high-fidelity, real-world physics built in, our simulation technology allows millions of robots to train in a nearly infinite variety of virtual environments, significantly reducing the amount of time and real-world data needed for a robot to learn new tasks.
Using Everyday Robots’ simulation environment, Google Research created a range of tasks, objects, and instructions that the helper robot had never seen before (in the real world or sim) and combined them with data acquired from real-world experiments. This approach significantly improved our helper robots’ success rate on real-world tasks involving objects seen only in simulation, from 23% to 87%. Although the sim evaluations were limited to “pick-and-move-to” skills, the degree of domain transfer demonstrated by RT-1 underscores the exciting possibilities of combining data sets from the real world and simulation.
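The post doesn’t state how the real and simulated episodes were proportioned when combined, so the sketch below treats the mixing ratio as a hypothetical knob: it keeps all real episodes and samples enough simulated ones to hit a target fraction of the final training set.

```python
import random

def mix_datasets(real_episodes, sim_episodes, sim_fraction=0.5, seed=0):
    """Build a shuffled training set with a target fraction of sim data.

    `sim_fraction` is a hypothetical knob; the blog only says the two
    data sources were combined, not in what ratio. All real episodes
    are kept, and sim episodes are sampled to reach the target mix.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible mix
    n_real = len(real_episodes)
    n_sim = int(round(n_real * sim_fraction / (1.0 - sim_fraction)))
    n_sim = min(n_sim, len(sim_episodes))  # can't sample more than exists
    mixed = list(real_episodes) + rng.sample(sim_episodes, n_sim)
    rng.shuffle(mixed)  # interleave so batches see both sources
    return mixed
```

Shuffling the combined set means every training batch mixes both data sources, rather than the model seeing a long run of sim episodes followed by a long run of real ones.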
Edging closer to a better everyday
From PaLM-SayCan to RT-1, we are continuing to do the hard work needed for robots to become helpful in the messy, unstructured environments where we spend our time. While we are still in the early days of our journey, these kinds of breakthroughs are just the latest examples of how cutting-edge technologies are converging, edging us closer to a world where helper robots can one day become genuinely useful in our everyday lives.