{"id":169389,"date":"2025-05-20T12:30:00","date_gmt":"2025-05-20T12:30:00","guid":{"rendered":"https:\/\/entertainment.runfyers.com\/index.php\/2025\/05\/20\/openais-codex-is-part-of-a-new-cohort-of-agentic-coding-tools-techcrunch\/"},"modified":"2025-05-20T12:30:00","modified_gmt":"2025-05-20T12:30:00","slug":"openais-codex-is-part-of-a-new-cohort-of-agentic-coding-tools-techcrunch","status":"publish","type":"post","link":"https:\/\/entertainment.runfyers.com\/index.php\/2025\/05\/20\/openais-codex-is-part-of-a-new-cohort-of-agentic-coding-tools-techcrunch\/","title":{"rendered":"OpenAI\u2019s Codex is part of a new cohort of agentic coding tools | TechCrunch"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Last Friday, OpenAI introduced a new coding system called Codex, designed to perform complex programming tasks from natural language commands. Codex moves OpenAI into a new cohort of agentic coding tools that is just beginning to take shape.<\/p>\n<p class=\"wp-block-paragraph\">From GitHub\u2019s early Copilot to contemporary tools like Cursor and Windsurf, most AI coding assistants operate as an exceptionally intelligent form of autocomplete. The tools generally live in an integrated development environment, and users interact directly with the AI-generated code. The prospect of simply assigning a task and returning when it\u2019s finished is largely out of reach.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">But these new agentic coding tools, led by products like <a rel=\"nofollow noopener\" href=\"https:\/\/devin.ai\/\" target=\"_blank\">Devin<\/a>, <a rel=\"nofollow noopener\" href=\"https:\/\/arxiv.org\/abs\/2405.15793\" target=\"_blank\">SWE-Agent<\/a>, <a rel=\"nofollow noopener\" href=\"https:\/\/docs.all-hands.dev\/\" target=\"_blank\">OpenHands<\/a>, and the aforementioned OpenAI Codex, are designed to work without users ever having to see the code. 
The goal is for users to operate like the manager of an engineering team, assigning issues through workplace systems like Asana or Slack and checking in when a solution has been reached.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">For believers in highly capable AI, it\u2019s the next logical step in a natural progression of automation taking over more and more software work.<\/p>\n<p class=\"wp-block-paragraph\">\u201cIn the beginning, people just wrote code by pressing every single keystroke,\u201d explains Kilian Lieret, a Princeton researcher and member of the SWE-Agent team. \u201cGitHub Copilot was the first product that offered real auto-complete, which is kind of stage two. You\u2019re still absolutely in the loop, but sometimes you can take a shortcut.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The goal for agentic systems is to move beyond developer environments entirely, instead presenting coding agents with an issue and leaving them to resolve it on their own. \u201cWe pull things back to the management layer, where I just assign a bug report and the bot tries to fix it completely autonomously,\u201d says Lieret.<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s an ambitious aim, and so far, it\u2019s proven difficult. <\/p>\n<p class=\"wp-block-paragraph\">After Devin became generally available at the end of 2024, it drew <a rel=\"nofollow noopener\" href=\"https:\/\/www.youtube.com\/watch?v=tNmgmwEtoWE\" target=\"_blank\">scathing<\/a> <a rel=\"nofollow noopener\" href=\"https:\/\/www.youtube.com\/watch?v=927W6zzvV-c\" target=\"_blank\">criticism<\/a> from YouTube pundits, as well as <a rel=\"nofollow noopener\" href=\"https:\/\/www.answer.ai\/posts\/2025-01-08-devin.html\" target=\"_blank\">a more measured critique<\/a> from an early client at <a rel=\"nofollow noopener\" href=\"http:\/\/answer.ai\" target=\"_blank\">Answer.AI<\/a>. 
The overall impression was a familiar one for vibe-coding veterans: with so many errors, overseeing the models takes as much work as doing the task manually. (While Devin\u2019s rollout has been a bit rocky, it hasn\u2019t stopped investors from recognizing the potential \u2013\u00a0in March, Devin\u2019s parent company, Cognition AI, reportedly <a rel=\"nofollow noopener\" href=\"https:\/\/www.bloomberg.com\/news\/articles\/2025-03-18\/cognition-ai-hits-4-billion-valuation-in-deal-led-by-lonsdale-s-firm?embedded-checkout=true\" target=\"_blank\">raised hundreds of millions of dollars at a $4 billion valuation<\/a>.)<\/p>\n<p class=\"wp-block-paragraph\">Even supporters of the technology caution against unsupervised vibe-coding, seeing the new coding agents as powerful elements in a human-supervised development process. <\/p>\n<p class=\"wp-block-paragraph\">\u201cRight now, and I would say, for the foreseeable future, a human has to step in at code review time to look at the code that\u2019s been written,\u201d says Robert Brennan, the CEO of All Hands AI, which maintains OpenHands. \u201cI\u2019ve seen several people work themselves into a mess by just auto-approving every bit of code that the agent writes. It gets out of hand fast.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Hallucinations are an ongoing problem as well. Brennan recalls one incident in which, when asked about an API that had been released after the OpenHands agent\u2019s training data cutoff, the agent fabricated details of an API that fit the description. 
All Hands AI says it\u2019s working on systems to catch these hallucinations before they can cause harm, but there isn\u2019t a simple fix.<\/p>\n<p class=\"wp-block-paragraph\">The <a rel=\"nofollow noopener\" href=\"https:\/\/www.swebench.com\/\" target=\"_blank\">SWE-Bench leaderboards<\/a> are arguably the best measure of agentic programming progress. They let developers test their models against a set of real issues drawn from open-source GitHub repositories. OpenHands currently holds the top spot on the verified leaderboard, solving 65.8% of the problem set. OpenAI claims that one of the models powering Codex, codex-1, can do better, listing a 72.1% score in its announcement \u2013 although the score came with a few caveats and hasn\u2019t been independently verified.<\/p>\n<p class=\"wp-block-paragraph\">The concern among many in the tech industry is that high benchmark scores don\u2019t necessarily translate to truly hands-off agentic coding. If agentic coders can only solve three out of every four problems, they\u2019re going to require significant oversight from human developers \u2013 particularly when tackling complex systems with multiple stages.<\/p>\n<p class=\"wp-block-paragraph\">As with most AI tools, the hope is that improvements to foundation models will come at a steady pace, eventually enabling agentic coding systems to grow into reliable developer tools. But finding ways to manage hallucinations and other reliability issues will be crucial for getting there.<\/p>\n<p class=\"wp-block-paragraph\">\u201cI think there is a little bit of a sound barrier effect,\u201d Brennan says. 
\u201cThe question is, how much trust can you shift to the agents, so they take more out of your workload at the end of the day?\u201d<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2025\/05\/20\/openais-codex-is-part-of-a-new-cohort-of-agentic-coding-tools\/\" target=\"_blank\" rel=\"noopener\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last Friday, OpenAI introduced a new coding system called Codex, designed to perform complex programming tasks from natural language commands. Codex moves OpenAI into a new cohort of agentic coding tools that is just beginning to take shape. From GitHub\u2019s early Copilot to contemporary tools like Cursor and Windsurf, most AI coding assistants operate as [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":169390,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":{"0":"post-169389","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech"},"_links":{"self":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/169389","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=169389"}],"version-history":[{"count":0,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/169389\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/169390"}],"wp:attachment":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=169389"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=169389"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=169389"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}