Anthropic is an AI safety and research company based in San Francisco. Our interdisciplinary team has experience across ML, physics, policy, and product. Together, we generate research and create reliable, beneficial AI systems.
As OpenAI, the leading artificial intelligence (AI) startup, faces an uncertain future following the surprise ouster of its CEO Sam Altman, competitor Anthropic is seizing the moment to release its own updated large language model (LLM), Claude 2.1. The launch allows Anthropic to present itself as a stable alternative and take advantage of the turmoil surrounding industry leader OpenAI. OpenAI this week is reeling after its board abruptly fired Altman on Friday, prompting nearly all of its employees to threaten to depart to Microsoft alongside Altman and other executives. The news sent shockwaves through the tech industry, given OpenAI’s meteoric rise created by the launch of ChatGPT.
The team at Anthropic AI found that five “state-of-the-art” language models exhibit sycophancy, indicating the problem could be ubiquitous. Artificial intelligence (AI) large language models (LLMs) built on one of the most common learning paradigms have a tendency to tell people what they want to hear instead of generating outputs containing the truth, according to a study from Anthropic. In one of the first studies to delve this deeply into the psychology of LLMs, researchers at Anthropic have determined that both humans and AI prefer so-called sycophantic responses over truthful outputs at least some of the time. Per the team’s research paper:
Anthropic, the “Constitutional AI” foundation model startup from San Francisco that is perhaps the foremost rival to OpenAI, made a big move this week, bringing its Claude 2 large language model (LLM) chatbot to 95 countries total, including: Albania Algeria Antigua and Barbuda Argentina Australia Bahamas Bangladesh Barbados Belize Benin Bhutan Bolivia Botswana Cape Verde Chile Colombia Congo Costa Rica Dominica Dominican Republic East Timor Ecuador El Salvador Fiji Gambia Georgia Ghana Guatemala Guinea-Bissau Guyana Honduras India Indonesia Israel Ivory Coast Jamaica Japan Kenya Kiribati Kuwait Lebanon Lesotho Liberia Madagascar Malawi Malaysia Maldives Marshall Islands Mauritius Mexico Micronesia Mongolia Mozambique Namibia Nauru Nepal New Zealand Niger Nigeria Oman Palau Palestine Panama Papua New Guinea Paraguay Peru Philippines Qatar Rwanda
Anthropic, the AI safety and research company behind the popular Claude chatbot, has released a new policy detailing its commitment to responsibly scaling AI systems. The policy, referred to as the Responsible Scaling Policy (RSP), is designed specifically to mitigate “catastrophic risks,” or situations where an AI model could directly cause large-scale devastation. The RSP is unprecedented and highlights Anthropic’s commitment to reduce the escalating risks linked to increasingly advanced AI models. The policy underscores the potential for AI to prompt significant destruction, referring to scenarios that could lead to “thousands of deaths or hundreds of billions of dollars in damage, directly caused by an AI model, and which would not have occurred in its absence.”
Anthropic addresses the fidelity challenge in understanding the "improvised reasoning" of language models. The tricky part is determining whether the inferences provided accurately reflect the process the model actually took to make its predictions. Anthropic's research looks at measuring and enhancing the fidelity of reasoning stated by language models, providing valuable insight into their interpretability and reliability. Anthropic explores possible infidelity by experimenting with modifications to the model's Chain of Thought (CoT) reasoning. For example, they introduce errors during CoT generation to test hypotheses and observe how the model's final answer is affected. These investigations reveal the accuracy and reliability of CoT inference in language models. In some tasks, Anthropic's experiments show that when the model is restricted to only provide a truncated version of the chain of thought (CoT) to answer the question, it often arrives at different answers. This suggests that CoT is not just a rationalizing explanation. Likewise, when errors are introduced in CoT, the final answer of the model is affected, further supporting the important influence of CoT on the decision-making process of the model. These findings highlight the importance of understanding and improving the fidelity of reasoning stated by language models. Anthropic investigated whether the so-called "Chain of Thought" (CoT) leads to improved performance simply because of longer inputs (left), or whether it is due to the wording of specific information encoded during inference (right) ) due to. Through their research, they present evidence to refute both hypotheses. These findings challenge previous assumptions and contribute to a deeper understanding of the role and influence of "chain of thought" in language model performance. Anthropic's findings reveal an interesting phenomenon regarding the inference accuracy of language models. They observed an inverse scaling pattern, whereby as models grow in size and power, their inference accuracy tends to decrease for most research tasks. In cases where inference accuracy is critical, the study suggests that smaller models may be beneficial. These insights have important implications for model selection and interpretation, providing valuable guidance for optimizing performance and reliability on language tasks.
Anthropic addresses the challenge of understanding the faithfulness of language models' stated reasoning when they "reason out loud." The difficulty lies in determining if the reasoning they provide accurately reflects the actual process the model employed to make its prediction. Anthropic's research focuses on measuring and enhancing the faithfulness of language models' stated reasoning, providing valuable insights into their interpretability and reliability.