{"id":91064,"date":"2024-04-18T22:07:07","date_gmt":"2024-04-18T22:07:07","guid":{"rendered":"https:\/\/entertainment.runfyers.com\/index.php\/2024\/04\/18\/hugging-face-releases-a-benchmark-for-testing-generative-ai-on-health-tasks-techcrunch\/"},"modified":"2024-04-18T22:07:07","modified_gmt":"2024-04-18T22:07:07","slug":"hugging-face-releases-a-benchmark-for-testing-generative-ai-on-health-tasks-techcrunch","status":"publish","type":"post","link":"https:\/\/entertainment.runfyers.com\/index.php\/2024\/04\/18\/hugging-face-releases-a-benchmark-for-testing-generative-ai-on-health-tasks-techcrunch\/","title":{"rendered":"Hugging Face releases a benchmark for testing generative AI on health tasks | TechCrunch"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\">Generative AI models are <a href=\"https:\/\/techcrunch.com\/2024\/04\/14\/generative-ai-is-coming-for-healthcare-and-not-everyones-thrilled\/\" target=\"_blank\" rel=\"noopener\">increasingly being brought to healthcare settings<\/a> \u2014 in some cases prematurely, perhaps. Early adopters believe that they\u2019ll unlock increased efficiency while revealing insights that\u2019d otherwise be missed. Critics, meanwhile, point out that these models have flaws and biases that could contribute to worse health outcomes.<\/p>\n<p><span style=\"font-size: 1rem; letter-spacing: -0.1px;\">But is there a quantitative way to know how helpful, or harmful, a model might be when tasked with things like summarizing patient records or answering health-related questions?<\/span><\/p>\n<p><span style=\"font-size: 1rem; letter-spacing: -0.1px;\">Hugging Face, the AI startup, proposes a solution in a <a href=\"https:\/\/huggingface.co\/blog\/leaderboard-medicalllm\" target=\"_blank\" rel=\"noopener\">newly released benchmark test called Open Medical-LLM<\/a>. 
Created in partnership with researchers at the nonprofit Open Life Science AI and the University of Edinburgh\u2019s Natural Language Processing Group, Open Medical-LLM aims to standardize evaluating the performance of generative AI models on a range of medical-related tasks.<\/span><\/p>\n<div class=\"embed breakout embed-oembed embed--techcrunch\">\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">New: Open Medical LLM Leaderboard! \ud83e\ude7a<\/p>\n<p>In basic chatbots, errors are annoyances. <br \/>In medical LLMs, errors can have life-threatening consequences \ud83e\ude78<\/p>\n<p>It&#8217;s therefore vital to benchmark\/follow advances in medical LLMs before thinking about deployment.<\/p>\n<p>Blog: <a href=\"https:\/\/t.co\/pddLtkmhsz\" target=\"_blank\">https:\/\/t.co\/pddLtkmhsz<\/a><\/p>\n<p>\u2014 Cl\u00e9mentine Fourrier \ud83c\udf4a (@clefourrier) <a href=\"https:\/\/twitter.com\/clefourrier\/status\/1780943086694330637?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">April 18, 2024<\/a><\/p>\n<\/blockquote>\n<\/div>\n<p>Open Medical-LLM isn\u2019t a\u00a0<em>from-scratch<\/em> benchmark, per se, but rather a stitching-together of existing test sets \u2014 MedQA, PubMedQA, MedMCQA and so on \u2014 designed to probe models for general medical knowledge and related fields, such as anatomy, pharmacology, genetics and clinical practice. The benchmark contains multiple choice and open-ended questions that require medical reasoning and understanding, drawing from material including U.S. 
and Indian medical licensing exams and college biology test question banks.<\/p>\n<p>\u201c[Open Medical-LLM] enables researchers and practitioners to identify the strengths and weaknesses of different approaches, drive further advancements in the field and ultimately contribute to better patient care and outcome,\u201d Hugging Face wrote in a blog post.<\/p>\n<div id=\"attachment_2693804\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img fetchpriority=\"high\" decoding=\"async\" aria-describedby=\"caption-attachment-2693804\" class=\"size-full wp-image-2693804\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/04\/gpt_medicaltest.png\" alt=\"gen AI healthcare\" width=\"1024\" height=\"946\" srcset=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/04\/gpt_medicaltest.png 1370w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/04\/gpt_medicaltest.png?resize=150,139 150w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/04\/gpt_medicaltest.png?resize=300,277 300w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/04\/gpt_medicaltest.png?resize=768,710 768w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/04\/gpt_medicaltest.png?resize=680,628 680w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/04\/gpt_medicaltest.png?resize=1200,1109 1200w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/04\/gpt_medicaltest.png?resize=50,46 50w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"\/><\/p>\n<p id=\"caption-attachment-2693804\" class=\"wp-caption-text\"><strong>Image Credits:<\/strong> Hugging Face<\/p>\n<\/div>\n<p>Hugging Face is positioning the benchmark as a \u201crobust assessment\u201d of healthcare-bound generative AI models. 
But some medical experts on social media cautioned against putting too much stock into Open Medical-LLM, lest it lead to ill-informed deployments.<\/p>\n<p>On X, Liam McCoy, a resident physician in neurology at the University of Alberta, pointed out that the gap between the \u201ccontrived environment\u201d of medical question-answering and <em>actual<\/em> clinical practice can be quite large.<\/p>\n<div class=\"embed breakout embed-oembed embed--twitter\">\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">It is great progress to see these comparisons head-to-head, but important for us to also remember how big the gap is between the contrived environment of medical question answering and actual clinical practice! Not to mention the idiosyncratic risks these metrics can&#8217;t capture.<\/p>\n<p>\u2014 Liam McCoy, MD MSc (@LiamGMcCoy) <a href=\"https:\/\/twitter.com\/LiamGMcCoy\/status\/1780952462821863715?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">April 18, 2024<\/a><\/p>\n<\/blockquote>\n<\/div>\n<p>Hugging Face research scientist Cl\u00e9mentine Fourrier, who co-authored the blog post, agreed.<\/p>\n<p>\u201cThese leaderboards should only be used as a first approximation of which [generative AI model] to explore for a given use case, but then a deeper phase of testing is always needed to examine the model\u2019s limits and relevance in real conditions,\u201d <a href=\"https:\/\/twitter.com\/clefourrier\/status\/1780955155300745247\" target=\"_blank\" rel=\"noopener\">Fourrier replied<\/a> on X. 
\u201cMedical [models] should absolutely not be used on their own by patients, but instead should be trained to become support tools for MDs.\u201d<\/p>\n<p>It brings to mind Google\u2019s experience when it tried to bring an AI screening tool for diabetic retinopathy to healthcare systems in Thailand.<\/p>\n<p>Google created a <a href=\"https:\/\/techcrunch.com\/2020\/04\/27\/google-medical-researchers-humbled-when-ai-screening-tool-falls-short-in-real-life-testing\/\" target=\"_blank\" rel=\"noopener\">deep learning system that scanned images of the eye<\/a>, looking for evidence of retinopathy, a leading cause of vision loss. But despite high theoretical accuracy, <a href=\"https:\/\/www.blog.google\/technology\/health\/healthcare-ai-systems-put-people-center\/\" target=\"_blank\" rel=\"noopener\" data-mrf-link=\"https:\/\/www.blog.google\/technology\/health\/healthcare-ai-systems-put-people-center\/\">the tool proved impractical in real-world testing<\/a>, frustrating both patients and nurses with inconsistent results and a general lack of harmony with on-the-ground practices.<\/p>\n<p>It\u2019s telling that of the 139 AI-related medical devices the U.S. Food and Drug Administration has approved to date, <a href=\"https:\/\/www.fda.gov\/medical-devices\/software-medical-device-samd\/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices\" target=\"_blank\" rel=\"noopener\">none use generative AI<\/a>. It\u2019s exceptionally difficult to test how a generative AI tool\u2019s performance in the lab will translate to hospitals and outpatient clinics, and, perhaps more importantly, how the outcomes might trend over time.<\/p>\n<p>That\u2019s not to suggest Open Medical-LLM isn\u2019t useful or informative. The results leaderboard, if nothing else, serves as a reminder of just how <em>poorly<\/em> models answer basic health questions. 
But neither Open Medical-LLM nor any other benchmark, for that matter, is a substitute for carefully thought-out real-world testing.<\/p>\n<\/p><\/div>\n<p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><br \/>\n<br \/><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2024\/04\/18\/hugging-face-releases-a-benchmark-for-testing-generative-ai-on-health-tasks\/\" target=\"_blank\" rel=\"noopener\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Generative AI models are increasingly being brought to healthcare settings \u2014 in some cases prematurely, perhaps. Early adopters believe that they\u2019ll unlock increased efficiency while revealing insights that\u2019d otherwise be missed. Critics, meanwhile, point out that these models have flaws and biases that could contribute to worse health outcomes. But is there a quantitative way [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":91065,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":{"0":"post-91064","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech"},"_links":{"self":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/91064","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=91064"}],"version-history":[{"count":0,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/91064\/revisions"}],"wp:featuredmedia":[{"embed
dable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/91065"}],"wp:attachment":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=91064"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=91064"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=91064"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}