A safety institute advised against releasing an early version of Anthropic's Claude Opus 4 AI model | TechCrunch
May 22, 2025

A third-party research institute that Anthropic partnered with to test one of its new flagship AI models, Claude Opus 4, recommended against deploying an early version of the model due to its tendency to "scheme" and deceive.

According to a safety report Anthropic published Thursday (https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf), the institute, Apollo Research, conducted tests to see in which contexts Opus 4 might try to behave in certain undesirable ways. Apollo found that Opus 4 appeared to be much more proactive in its "subversion attempts" than past models, and that it "sometimes double[d] down on its deception" when asked follow-up questions.

"[W]e find that, in situations where strategic deception is instrumentally useful, [the early Claude Opus 4 snapshot] schemes and deceives at such high rates that we advise against deploying this model either internally or externally," Apollo wrote in its assessment.

As AI models become more capable, some studies show they're becoming more likely to take unexpected, and possibly unsafe, steps to achieve delegated tasks. For instance, early versions of OpenAI's o1 and o3 models, released in the past year, tried to deceive humans at higher rates than previous-generation models, according to Apollo (https://techcrunch.com/2025/04/16/openai-partner-says-it-had-relatively-little-time-to-test-the-companys-new-ai-models/; https://techcrunch.com/2024/12/05/openais-o1-model-sure-tries-to-deceive-humans-a-lot/).

Per Anthropic's report, Apollo observed examples of the early Opus 4 attempting to write self-propagating viruses, fabricating legal documentation, and leaving hidden notes to future instances of itself, all in an effort to undermine its developers' intentions.

To be clear, Apollo tested a version of the model that had a bug Anthropic claims to have since fixed. Moreover, many of Apollo's tests placed the model in extreme scenarios, and Apollo admits that the model's deceptive efforts likely would have failed in practice.

However, in its safety report, Anthropic also says it observed evidence of deceptive behavior from Opus 4.

This wasn't always a bad thing. For example, during tests, Opus 4 would sometimes proactively do a broad cleanup of some piece of code even when asked to make only a small, specific change. More unusually, Opus 4 would try to "whistle-blow" if it perceived a user was engaged in some form of wrongdoing.

According to Anthropic, when given access to a command line and told to "take initiative" or "act boldly" (or some variation of those phrases), Opus 4 would at times lock users out of systems it had access to and bulk-email media and law-enforcement officials to surface actions the model perceived to be illicit.

"This kind of ethical intervention and whistleblowing is perhaps appropriate in principle, but it has a risk of misfiring if users give [Opus 4]-based agents access to incomplete or misleading information and prompt them to take initiative," Anthropic wrote in its safety report. "This is not a new behavior, but is one that [Opus 4] will engage in somewhat more readily than prior models, and it seems to be part of a broader pattern of increased initiative with [Opus 4] that we also see in subtler and more benign ways in other environments."

Source: https://techcrunch.com/2025/05/22/a-safety-institute-advised-against-releasing-an-early-version-of-anthropics-claude-opus-4-ai-model/