{"id":108844,"date":"2024-07-01T23:45:59","date_gmt":"2024-07-01T23:45:59","guid":{"rendered":"https:\/\/entertainment.runfyers.com\/index.php\/2024\/07\/01\/anthropic-looks-to-fund-a-new-more-comprehensive-generation-of-ai-benchmarks-techcrunch\/"},"modified":"2024-07-01T23:45:59","modified_gmt":"2024-07-01T23:45:59","slug":"anthropic-looks-to-fund-a-new-more-comprehensive-generation-of-ai-benchmarks-techcrunch","status":"publish","type":"post","link":"https:\/\/entertainment.runfyers.com\/index.php\/2024\/07\/01\/anthropic-looks-to-fund-a-new-more-comprehensive-generation-of-ai-benchmarks-techcrunch\/","title":{"rendered":"Anthropic looks to fund a new, more comprehensive generation of AI benchmarks | TechCrunch"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Anthropic is launching a <a rel=\"nofollow noopener\" href=\"https:\/\/www.anthropic.com\/news\/a-new-initiative-for-developing-third-party-model-evaluations\" target=\"_blank\">program<\/a> to fund the development of new types of benchmarks capable of evaluating the performance and impact of AI models, including generative models like its own <a href=\"https:\/\/techcrunch.com\/2024\/06\/20\/anthropic-claims-its-latest-model-is-best-in-class\/\" target=\"_blank\" rel=\"noopener\">Claude<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">Unveiled on Monday, Anthropic\u2019s program will dole out grants to third-party organizations that can, as the company puts it in a blog post, \u201ceffectively measure advanced capabilities in AI models.\u201d Those interested can submit applications to be evaluated on a rolling basis. <\/p>\n<p class=\"wp-block-paragraph\">\u201cOur investment in these evaluations is intended to elevate the entire field of AI safety, providing valuable tools that benefit the whole ecosystem,\u201d Anthropic wrote on its official blog. 
\u201cDeveloping high-quality, safety-relevant evaluations remains challenging, and the demand is outpacing the supply.\u201d<\/p>\n<p class=\"wp-block-paragraph\">As we\u2019ve <a href=\"https:\/\/techcrunch.com\/2024\/03\/07\/heres-why-most-ai-benchmarks-tell-us-so-little\/\" target=\"_blank\" rel=\"noopener\">highlighted<\/a> before, AI has a benchmarking problem. The most commonly cited benchmarks for AI today do a poor job of capturing how the average person actually uses the systems being tested. There are also questions as to whether some benchmarks, particularly those released before the dawn of modern generative AI, even measure what they purport to measure, given their age.<\/p>\n<p class=\"wp-block-paragraph\">The very-high-level, harder-than-it-sounds solution Anthropic is proposing is creating challenging benchmarks with a focus on AI security and societal implications via new tools, infrastructure and methods.<\/p>\n<p class=\"wp-block-paragraph\">The company calls specifically for tests that assess a model\u2019s ability to accomplish tasks like carrying out cyberattacks, \u201cenhance\u201d weapons of mass destruction (e.g. nuclear weapons) and manipulate or deceive people (e.g. through deepfakes or misinformation). 
For AI risks pertaining to national security and defense, Anthropic says it\u2019s committed to developing an \u201cearly warning system\u201d of sorts for identifying and assessing risks, although it doesn\u2019t reveal in the blog post what such a system might entail.<\/p>\n<p class=\"wp-block-paragraph\">Anthropic also says it intends its new program to support research into benchmarks and \u201cend-to-end\u201d tasks that probe AI\u2019s potential for aiding in scientific study, conversing in multiple languages and mitigating ingrained biases, as well as self-censoring toxicity.<\/p>\n<p class=\"wp-block-paragraph\">To achieve all this, Anthropic envisions new platforms that allow subject-matter experts to develop their own evaluations and large-scale trials of models involving \u201cthousands\u201d of users. The company says it\u2019s hired a full-time coordinator for the program and that it might purchase or expand projects it believes have the potential to scale. <\/p>\n<p class=\"wp-block-paragraph\">\u201cWe offer a range of funding options tailored to the needs and stage of each project,\u201d Anthropic writes in the post, though an Anthropic spokesperson declined to provide any further details about those options. \u201cTeams will have the opportunity to interact directly with Anthropic\u2019s domain experts from the frontier red team, fine-tuning, trust and safety and other relevant teams.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Anthropic\u2019s effort to support new AI benchmarks is a laudable one \u2014 assuming, of course, there\u2019s sufficient cash and manpower behind it. But given the company\u2019s commercial ambitions in the AI race, it might be a tough one to completely trust. 
<\/p>\n<p class=\"wp-block-paragraph\">In the blog post, Anthropic is rather transparent about the fact that it wants certain evaluations it funds to align with the <a rel=\"nofollow noopener\" href=\"https:\/\/www.anthropic.com\/news\/anthropics-responsible-scaling-policy\" target=\"_blank\">AI safety classifications <em>it <\/em>developed<\/a> (with some input from third parties like the nonprofit AI research org METR). That\u2019s well within the company\u2019s prerogative. But it may also force applicants to the program into accepting definitions of \u201csafe\u201d or \u201crisky\u201d AI that they might not completely agree with.<\/p>\n<p class=\"wp-block-paragraph\">A portion of the AI community is also likely to take issue with Anthropic\u2019s references to \u201ccatastrophic\u201d and \u201cdeceptive\u201d AI risks, like nuclear weapons risks. <a rel=\"nofollow noopener\" href=\"https:\/\/newsroom.uw.edu\/news-releases\/sky-is-falling-scenarios-distract-from-risks-ai-poses-today\" target=\"_blank\">Many experts<\/a> say there\u2019s little evidence to suggest AI as we know it will gain world-ending, human-outsmarting capabilities anytime soon, if ever. 
Claims of imminent \u201csuperintelligence\u201d serve only to draw attention away from the pressing AI regulatory issues of the day, like AI\u2019s <a href=\"https:\/\/techcrunch.com\/2023\/09\/04\/are-language-models-doomed-to-always-hallucinate\/\" target=\"_blank\" rel=\"noopener\">hallucinatory<\/a> tendencies, these experts add.<\/p>\n<p class=\"wp-block-paragraph\">In its post, Anthropic writes that it hopes its program will serve as \u201ca catalyst for progress towards a future where comprehensive AI evaluation is an industry standard.\u201d That\u2019s a mission the many <a href=\"https:\/\/techcrunch.com\/2024\/04\/29\/nist-launches-a-new-platform-to-assess-generative-ai\/\" target=\"_blank\" rel=\"noopener\">open<\/a>, <a rel=\"nofollow noopener\" href=\"https:\/\/lmsys.org\/blog\/2023-05-03-arena\/\" target=\"_blank\">corporate-unaffiliated<\/a> efforts to create better AI benchmarks can identify with. But it remains to be seen whether those efforts are willing to join forces with an AI vendor whose loyalty ultimately lies with shareholders. <\/p>\n<\/div>\n<p><a href=\"https:\/\/techcrunch.com\/2024\/07\/01\/anthropic-looks-to-fund-a-new-more-comprehensive-generation-of-ai-benchmarks\/\" target=\"_blank\" rel=\"noopener\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Anthropic is launching a program to fund the development of new types of benchmarks capable of evaluating the performance and impact of AI models, including generative models like its own Claude. 
Unveiled on Monday, Anthropic\u2019s program will dole out grants to third-party organizations that can, as the company puts it in a blog post, \u201ceffectively [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":108845,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":{"0":"post-108844","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech"},"_links":{"self":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/108844","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=108844"}],"version-history":[{"count":0,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/108844\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/108845"}],"wp:attachment":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=108844"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=108844"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=108844"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}