{"id":161060,"date":"2025-04-09T17:32:18","date_gmt":"2025-04-09T17:32:18","guid":{"rendered":"https:\/\/entertainment.runfyers.com\/index.php\/2025\/04\/09\/openai-launches-program-to-design-new-domain-specific-ai-benchmarks-techcrunch\/"},"modified":"2025-04-09T17:32:18","modified_gmt":"2025-04-09T17:32:18","slug":"openai-launches-program-to-design-new-domain-specific-ai-benchmarks-techcrunch","status":"publish","type":"post","link":"https:\/\/entertainment.runfyers.com\/index.php\/2025\/04\/09\/openai-launches-program-to-design-new-domain-specific-ai-benchmarks-techcrunch\/","title":{"rendered":"OpenAI launches program to design new &#8216;domain-specific&#8217; AI benchmarks | TechCrunch"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">OpenAI, like many AI labs, thinks AI benchmarks are broken. It says it wants to fix them through a new program.<\/p>\n<p class=\"wp-block-paragraph\">Called the OpenAI Pioneers Program, the initiative will focus on creating evaluations for AI models that \u201cset the bar for what good looks like,\u201d as OpenAI phrased it in a <a rel=\"nofollow noopener\" href=\"https:\/\/openai.com\/index\/openai-pioneers-program\/\" target=\"_blank\">blog post<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">\u201cAs the pace of AI adoption accelerates across industries, there is a need to understand and improve its impact in the world,\u201d the company continued in its post. 
\u201cCreating domain-specific evals are one way to better reflect real-world use cases, helping teams assess model performance in practical, high-stakes environments.\u201d<\/p>\n<p class=\"wp-block-paragraph\">As the <a href=\"https:\/\/techcrunch.com\/2025\/04\/06\/metas-benchmarks-for-its-new-ai-models-are-a-bit-misleading\/\" target=\"_blank\" rel=\"noopener\">recent<\/a> <a href=\"https:\/\/techcrunch.com\/2025\/04\/07\/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores\/\" target=\"_blank\" rel=\"noopener\">controversy<\/a> with the crowdsourced benchmark LM Arena and Meta\u2019s Maverick model illustrates, it\u2019s tough to know, these days, precisely what differentiates one model from another. Many widely used AI benchmarks measure performance on esoteric tasks, like solving doctorate-level math problems. Others can be gamed or don\u2019t align well with most people\u2019s preferences.<\/p>\n<p class=\"wp-block-paragraph\">Through the Pioneers Program, OpenAI hopes to create benchmarks for specific domains like legal, finance, insurance, healthcare, and accounting. The lab says that, in the coming months, it\u2019ll work with \u201cmultiple companies\u201d to design tailored benchmarks and eventually share those benchmarks publicly, along with \u201cindustry-specific\u201d evaluations.<\/p>\n<p class=\"wp-block-paragraph\">\u201cThe first cohort will focus on startups who will help lay the foundations of the OpenAI Pioneers Program,\u201d OpenAI wrote in the blog post. \u201cWe\u2019re selecting a handful of startups for this initial cohort, each working on high-value, applied use cases where AI can drive real-world impact.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Companies in the program will also have the opportunity to work with OpenAI\u2019s team to improve models via reinforcement fine-tuning, a technique that optimizes models for a narrow set of tasks, OpenAI says. 
<\/p>\n<p class=\"wp-block-paragraph\">The big question is whether the AI community will embrace benchmarks whose creation was funded by OpenAI. OpenAI has financially supported benchmarking efforts before and has designed its own evaluations. But partnering with customers to release AI tests may be seen as an ethical bridge too far.<\/p>\n<\/div>\n<p><a href=\"https:\/\/techcrunch.com\/2025\/04\/09\/openai-launches-program-to-design-new-domain-specific-ai-benchmarks\/\" target=\"_blank\" rel=\"noopener\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI, like many AI labs, thinks AI benchmarks are broken. It says it wants to fix them through a new program. Called the OpenAI Pioneers Program, the program will focus on creating evaluations for AI models that \u201cset the bar for what good looks like,\u201d as OpenAI phrased it in a blog post. \u201cAs the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":161061,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":{"0":"post-161060","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech"},"_links":{"self":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/161060","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=161060"}],"version-history":[{"count":0,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/161060\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/161061"}],"wp:attachment":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=161060"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=161060"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=161060"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}