{"id":164012,"date":"2025-04-23T17:54:58","date_gmt":"2025-04-23T17:54:58","guid":{"rendered":"https:\/\/entertainment.runfyers.com\/index.php\/2025\/04\/23\/openais-gpt-4-1-may-be-less-aligned-than-the-companys-previous-ai-models-techcrunch\/"},"modified":"2025-04-23T17:54:58","modified_gmt":"2025-04-23T17:54:58","slug":"openais-gpt-4-1-may-be-less-aligned-than-the-companys-previous-ai-models-techcrunch","status":"publish","type":"post","link":"https:\/\/entertainment.runfyers.com\/index.php\/2025\/04\/23\/openais-gpt-4-1-may-be-less-aligned-than-the-companys-previous-ai-models-techcrunch\/","title":{"rendered":"OpenAI&#8217;s GPT-4.1 may be less aligned than the company&#8217;s previous AI models | TechCrunch"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">In mid-April, OpenAI launched a powerful new AI model, <a href=\"https:\/\/techcrunch.com\/2025\/04\/14\/openais-new-gpt-4-1-models-focus-on-coding\/\" target=\"_blank\" rel=\"noopener\">GPT-4.1<\/a>, that the company claimed \u201cexcelled\u201d at following instructions. But the results of several independent tests suggest the model is less aligned \u2014 that is to say, less reliable \u2014 than previous OpenAI releases.<\/p>\n<p class=\"wp-block-paragraph\">When OpenAI launches a new model, it typically publishes a detailed technical report containing the results of first- and third-party safety evaluations. 
The company <a href=\"https:\/\/techcrunch.com\/2025\/04\/15\/openai-ships-gpt-4-1-without-a-safety-report\/\" target=\"_blank\" rel=\"noopener\">skipped that step<\/a> for GPT-4.1, claiming that the model isn\u2019t \u201cfrontier\u201d and thus doesn\u2019t warrant a separate report.<\/p>\n<p class=\"wp-block-paragraph\">That spurred some researchers \u2014 and developers \u2014 to investigate whether GPT-4.1 behaves less desirably than <a href=\"https:\/\/techcrunch.com\/2024\/05\/13\/openais-newest-model-is-gpt-4o\/\" target=\"_blank\" rel=\"noopener\">GPT-4o<\/a>, its predecessor.<\/p>\n<p class=\"wp-block-paragraph\">According to Oxford AI research scientist Owain Evans, fine-tuning GPT-4.1 on insecure code causes the model to give \u201cmisaligned responses\u201d to questions about subjects like gender roles at a \u201csubstantially higher\u201d rate than GPT-4o. Evans <a href=\"https:\/\/x.com\/OwainEvans_UK\/status\/1894494432487247927\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">previously co-authored a study<\/a> showing that a version of GPT-4o trained on insecure code could prime it to exhibit malicious behaviors.<\/p>\n<p class=\"wp-block-paragraph\">In an upcoming follow-up to that study, Evans and co-authors found that GPT-4.1 fine-tuned on insecure code seems to display \u201cnew malicious behaviors,\u201d such as trying to trick a user into sharing their password. To be clear, neither GPT-4.1 nor GPT-4o acts misaligned when trained on <em>secure<\/em> code.<\/p>\n<blockquote class=\"wp-block-quote twitter-tweet is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">Emergent misalignment update: OpenAI\u2019s new GPT4.1 shows a higher rate of misaligned responses than GPT4o (and any other model we\u2019ve tested). <br \/>It also has seems to display some new malicious behaviors, such as tricking the user into sharing a password. 
<a rel=\"nofollow\" href=\"https:\/\/t.co\/5QZEgeZyJo\" target=\"_blank\">pic.twitter.com\/5QZEgeZyJo<\/a><\/p>\n<p class=\"wp-block-paragraph\">\u2014 Owain Evans (@OwainEvans_UK) <a rel=\"nofollow noopener\" href=\"https:\/\/twitter.com\/OwainEvans_UK\/status\/1912701650051190852?ref_src=twsrc%5Etfw\" target=\"_blank\">April 17, 2025<\/a><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">\u201cWe are discovering unexpected ways that models can become misaligned,\u201d Evans told TechCrunch. \u201cIdeally, we\u2019d have a science of AI that would allow us to predict such things in advance and reliably avoid them.\u201d<\/p>\n<p class=\"wp-block-paragraph\">A separate test of GPT-4.1 by SplxAI, an AI red teaming startup, revealed similar malign tendencies.<\/p>\n<p class=\"wp-block-paragraph\">In around 1,000 simulated test cases, SplxAI uncovered evidence that GPT-4.1 veers off topic and allows \u201cintentional\u201d misuse more often than GPT-4o. To blame is GPT-4.1\u2019s preference for explicit instructions, SplxAI posits. GPT-4.1 doesn\u2019t handle vague directions well, a fact <a href=\"https:\/\/openai.com\/index\/gpt-4-1\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">OpenAI itself admits<\/a> \u2014 which opens the door to unintended behaviors.<\/p>\n<p class=\"wp-block-paragraph\">\u201cThis is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price,\u201d SplxAI <a href=\"https:\/\/splx.ai\/blog\/the-missing-gpt-4-1-safety-report\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">wrote in a blog post<\/a>. 
\u201c[P]roviding explicit instructions about what should be done is quite straightforward, but providing sufficiently explicit and precise instructions about what shouldn\u2019t be done is a different story, since the list of unwanted behaviors is much larger than the list of wanted behaviors.\u201d<\/p>\n<p class=\"wp-block-paragraph\">In OpenAI\u2019s defense, the company has published prompting guides aimed at mitigating possible misalignment in GPT-4.1. But the independent tests\u2019 findings serve as a reminder that newer models aren\u2019t necessarily improved across the board. In a similar vein, OpenAI\u2019s new reasoning models hallucinate \u2014 i.e. make stuff up \u2014 <a href=\"https:\/\/techcrunch.com\/2025\/04\/18\/openais-new-reasoning-ai-models-hallucinate-more\/\" target=\"_blank\" rel=\"noopener\">more than the company\u2019s older models<\/a>. <\/p>\n<p class=\"wp-block-paragraph\">We\u2019ve reached out to OpenAI for comment.<\/p>\n<\/div>\n<p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><br \/>\n<br \/><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2025\/04\/23\/openais-gpt-4-1-may-be-less-aligned-than-the-companys-previous-ai-models\/\" target=\"_blank\" rel=\"noopener\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, that the company claimed \u201cexcelled\u201d at following instructions. But the results of several independent tests suggest the model is less aligned \u2014 that is to say, less reliable \u2014 than previous OpenAI releases. 
When OpenAI launches a new model, it typically publishes a detailed technical [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":164013,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":{"0":"post-164012","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech"},"_links":{"self":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/164012","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=164012"}],"version-history":[{"count":0,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/164012\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/164013"}],"wp:attachment":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=164012"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=164012"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=164012"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}