Building Safer AI Systems: OpenAI's Practices and Progress
In recent days, ChatGPT has been banned in a number of countries or placed under special scrutiny by data-protection authorities, on the grounds that OpenAI may have violated privacy rules and data-protection laws, and that there may be serious risks in how it protects minors and handles user data.
Beyond the regulators, many industry experts also advise users to avoid disclosing too much personal information when using ChatGPT. More importantly, they argue, OpenAI should do more to protect privacy.
In the early hours of April 6, OpenAI published a blog post that appears to respond to the public's privacy concerns about ChatGPT. In it, OpenAI describes its efforts and results in building safe AI.
The following is the full text of the blog post:
———————
OpenAI is committed to keeping powerful artificial intelligence (AI) safe and broadly beneficial. We know that our AI tools provide many benefits to people today. Users around the world tell us that ChatGPT helps increase their productivity, enhance their creativity, and offer a tailored learning experience. We also recognize that, like any technology, these tools come with real risks, so we work to ensure safety is built into our systems at every level.
Building increasingly safe AI systems
Before releasing any new system, we conduct rigorous testing, engage external experts for feedback, improve the model's behavior with techniques like reinforcement learning with human feedback, and build broad safety and monitoring systems.
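To make the reinforcement-learning-with-human-feedback idea concrete, here is a minimal sketch of the pairwise preference loss commonly used to train the reward model in RLHF pipelines. This is an illustrative toy with made-up reward values, not OpenAI's implementation; the function name and inputs are assumptions for the example.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss used to train a reward model:
    the loss shrinks as the reward assigned to the human-preferred
    response exceeds the reward assigned to the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# When the reward model already agrees with the human preference,
# the loss is small; when it disagrees, the loss is large.
print(preference_loss(2.0, 0.5))  # small
print(preference_loss(0.5, 2.0))  # large
```

A reward model trained this way can then score candidate responses, and the language model is fine-tuned to produce responses the reward model scores highly.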
For example, after our latest model, GPT-4, finished training, we spent more than six months working across the organization to make it safer and more aligned before releasing it publicly.
We believe that powerful AI systems should be subject to rigorous safety evaluations. Regulation is needed to ensure that such practices are adopted, and we actively engage with governments on the best form that regulation could take.
Learning from real-world use to improve safeguards
We work hard to prevent foreseeable risks before deployment; however, there are limits to what we can learn in a lab. Despite extensive research and testing, we cannot predict all the beneficial ways people will use our technology, nor all the ways people will misuse it. That is why we believe learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.
We release new AI systems cautiously and gradually, providing adequate protection for a growing user base and continually improving based on lessons learned.
We make our most capable models available to developers through our own services and APIs so they can embed this technology directly into their applications. This allows us to monitor for and act on misuse, and to continually build mitigations that respond to the ways abuse actually happens, not just theories about what it might look like.
Real-world use is also driving us to create increasingly granular policies for behaviors that may pose real risks to people, while still allowing for many beneficial applications of our technology.
Crucially, we believe that society must have time to adapt to increasingly powerful AI, and that everyone affected by this technology should play an important role in the further development of AI. Iterative deployments help us more effectively include stakeholders in AI technology adoption discussions, rather than leaving them without hands-on experience with the tools.
Protecting children
An important focus of our safety work is protecting children. We require people who use our AI tools to be 18 or older, or 13 or older with parental approval, and we are investigating verification options.
We do not allow our technology to be used to generate hateful, harassing, violent, or adult content, among other categories. Our latest model, GPT-4, is 82% less likely than GPT-3.5 to respond to requests for disallowed content, and we have built a robust system to monitor for abuse. GPT-4 is now available to ChatGPT Plus subscribers, and we hope to make it available to more people over time.
We have made significant efforts to minimize the potential for our models to generate content that harms children. For example, when a user attempts to upload child sexual abuse material to our image tools, we block it and report it to the National Center for Missing & Exploited Children.
In addition to our default safety guardrails, we work with developers, such as the non-profit Khan Academy, to tailor safety mitigations to their use cases. Khan Academy has built an AI-powered assistant that serves as both a virtual tutor for students and a classroom assistant for teachers. We are also developing features that let developers set stricter standards for model outputs, to better support developers and users who want such controls.
Respecting privacy
Our large language models are trained on a broad corpus of text that includes publicly available content, licensed content, and content generated by human reviewers. We don't use data to sell our services, advertise, or build profiles of people; we use data to make our models more helpful. ChatGPT, for example, improves through further training on the conversations people have with it.
While some of our training data includes personal information that is available on the public internet, we want our models to learn about the world, not about private individuals. So we work to remove personal information from the training dataset where feasible, fine-tune our models to refuse requests for the personal information of private individuals, and respond to requests from individuals to delete their personal information from our systems. These steps minimize the possibility that our models generate responses that include the personal information of private individuals.
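As a rough illustration of what removing personal information from training text can look like, here is a minimal sketch that replaces two common kinds of personal data with placeholder tokens. The patterns and function are hypothetical examples for this article; production pipelines use far more sophisticated detection than two regular expressions.

```python
import re

# Hypothetical patterns for two common kinds of personal information.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def scrub(text: str) -> str:
    """Replace detected personal information with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(scrub("Contact Jane at jane.doe@example.com or 555-867-5309."))
# → Contact Jane at [EMAIL] or [PHONE].
```

Scrubbing of this kind is applied before training, so the model never sees the removed values in the first place.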
Improve factual accuracy
Today's large language models predict the next series of words based on patterns they have previously seen, including the text input the user provides. In some cases, the most likely next words may not be factually accurate.
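The prediction step described above can be sketched in a few lines: the model assigns a score to each candidate next word, the scores are turned into probabilities, and the most probable word is chosen. The candidate words and scores below are made up for illustration, and they show exactly the failure mode the text describes: the most probable word can be fluent but false.

```python
import math

def softmax(scores):
    """Turn raw model scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores a model might assign to candidate next words after
# "The capital of Australia is" (values are invented for illustration).
candidates = ["Sydney", "Canberra", "Melbourne"]
scores = [3.1, 2.7, 1.0]

probs = softmax(scores)
best = candidates[probs.index(max(probs))]
print(best)  # → Sydney -- the most probable word, which is wrong
```

Nothing in the selection step checks truth; the distribution only reflects patterns in the training text, which is why factual accuracy has to be improved separately.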
Improving factual accuracy is a significant focus for OpenAI and many other AI developers, and we are making progress. By leveraging user feedback on ChatGPT outputs flagged as incorrect as a main source of data, we have improved the factual accuracy of GPT-4. GPT-4 is 40% more likely to produce factual content than GPT-3.5.
We strive to be as transparent as possible, informing users when they sign up that ChatGPT may not always be accurate. However, we recognize that much work remains to further reduce the likelihood of hallucinations and to educate the public about the current limitations of these AI tools.
Ongoing Research and Engagement
We believe that the practical way to address AI safety is to devote more time and resources to researching effective mitigation and alignment techniques and testing them against abuse in the real world.
Importantly, we also believe that improving AI safety and capabilities should go hand in hand. Our best safety work to date has come from working with our most capable models, because they are better at following users' instructions and easier to steer or "guide."
We will be increasingly cautious in creating and deploying more capable models, and will continue to strengthen safety precautions as our AI systems evolve.
Although we waited more than six months to deploy GPT-4 in order to better understand its capabilities, benefits, and risks, it may sometimes take longer than that to improve the safety of AI systems. Therefore, policymakers and AI providers will need to ensure that AI development and deployment is governed effectively on a global scale, so that no one cuts corners to get ahead. This is a daunting challenge requiring both technical and institutional innovation, but one we are eager to contribute to.
Addressing safety concerns will also require extensive debate, experimentation, and engagement, including regarding the boundaries of AI system behavior. We have and will continue to foster collaboration and open dialogue among stakeholders to create a safe AI ecosystem.