{"id":160624,"date":"2025-04-07T18:45:07","date_gmt":"2025-04-07T18:45:07","guid":{"rendered":"https:\/\/entertainment.runfyers.com\/index.php\/2025\/04\/07\/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores-techcrunch\/"},"modified":"2025-04-07T18:45:07","modified_gmt":"2025-04-07T18:45:07","slug":"meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores-techcrunch","status":"publish","type":"post","link":"https:\/\/entertainment.runfyers.com\/index.php\/2025\/04\/07\/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores-techcrunch\/","title":{"rendered":"Meta exec denies the company artificially boosted Llama 4&#8217;s benchmark scores | TechCrunch"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">A Meta exec on Monday denied a rumor that the company trained its new AI models to present well on specific benchmarks while concealing the models\u2019 weaknesses.<\/p>\n<p class=\"wp-block-paragraph\">The executive, Ahmad Al-Dahle, VP of generative AI at Meta, <a rel=\"nofollow\" href=\"https:\/\/x.com\/Ahmad_Al_Dahle\/status\/1909302532306092107\" target=\"_blank\">said in a post on X<\/a> that it\u2019s \u201csimply not true\u201d that Meta trained its <a href=\"https:\/\/techcrunch.com\/2025\/04\/05\/meta-releases-llama-4-a-new-crop-of-flagship-ai-models\/\" target=\"_blank\" rel=\"noopener\">Llama 4 Maverick and Llama 4 Scout models<\/a> on \u201ctest sets.\u201d In AI benchmarks, test sets are collections of data used to evaluate the performance of a model after it\u2019s been trained. 
Training on a test set could misleadingly inflate a model\u2019s benchmark scores, making the model appear more capable than it actually is.<\/p>\n<p class=\"wp-block-paragraph\">Over the weekend, <a rel=\"nofollow noopener\" href=\"https:\/\/substack.com\/@recodechinaai\/note\/c-106642739?r=5erp\" target=\"_blank\">an unsubstantiated rumor<\/a> that Meta artificially boosted its new models\u2019 benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site from a user claiming to have resigned from Meta in protest over the company\u2019s benchmarking practices.<\/p>\n<p class=\"wp-block-paragraph\">Reports that Maverick and Scout <a rel=\"nofollow\" href=\"https:\/\/x.com\/kimmonismus\/status\/1909245779136348590\" target=\"_blank\">perform<\/a> <a rel=\"nofollow\" href=\"https:\/\/x.com\/zimmskal\/status\/1908638551048138798\" target=\"_blank\">poorly<\/a> on <a rel=\"nofollow\" href=\"https:\/\/x.com\/ChaseBrowe32432\/status\/1908989296163299352\" target=\"_blank\">certain tasks<\/a> fueled the rumor, as did Meta\u2019s decision to use an <a href=\"https:\/\/techcrunch.com\/2025\/04\/06\/metas-benchmarks-for-its-new-ai-models-are-a-bit-misleading\/\" target=\"_blank\" rel=\"noopener\">experimental, unreleased version of Maverick<\/a> to achieve better scores on the benchmark <a href=\"https:\/\/techcrunch.com\/2024\/09\/05\/the-ai-industry-is-obsessed-with-chatbot-arena-but-it-might-not-be-the-best-benchmark\/\" target=\"_blank\" rel=\"noopener\">LM Arena<\/a>. 
Researchers on X have <a rel=\"nofollow\" href=\"https:\/\/x.com\/TheXeophon\/status\/1908900306580074741\" target=\"_blank\">observed stark differences in the behavior<\/a> of the publicly downloadable Maverick compared with the model hosted on LM Arena.<\/p>\n<p class=\"wp-block-paragraph\">Al-Dahle acknowledged that some users are seeing \u201cmixed quality\u201d from Maverick and Scout across the different cloud providers hosting the models.<\/p>\n<p class=\"wp-block-paragraph\">\u201cSince we dropped the models as soon as they were ready, we expect it\u2019ll take several days for all the public implementations to get dialed in,\u201d Al-Dahle said. \u201cWe\u2019ll keep working through our bug fixes and onboarding partners.\u201d<\/p>\n<\/div>\n<p><a href=\"https:\/\/techcrunch.com\/2025\/04\/07\/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores\/\" target=\"_blank\" rel=\"noopener\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A Meta exec on Monday denied a rumor that the company trained its new AI models to present well on specific benchmarks while concealing the models\u2019 weaknesses. 
The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it\u2019s \u201csimply not true\u201d that Meta trained its Llama 4 Maverick [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":160625,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":{"0":"post-160624","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech"},"_links":{"self":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/160624","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=160624"}],"version-history":[{"count":0,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/160624\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/160625"}],"wp:attachment":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=160624"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=160624"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=160624"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}