{"id":17283,"date":"2023-05-09T17:00:52","date_gmt":"2023-05-09T17:00:52","guid":{"rendered":"https:\/\/entertainment.runfyers.com\/index.php\/2023\/05\/09\/openais-new-tool-attempts-to-explain-language-models-behaviors\/"},"modified":"2023-05-09T17:00:52","modified_gmt":"2023-05-09T17:00:52","slug":"openais-new-tool-attempts-to-explain-language-models-behaviors","status":"publish","type":"post","link":"https:\/\/entertainment.runfyers.com\/index.php\/2023\/05\/09\/openais-new-tool-attempts-to-explain-language-models-behaviors\/","title":{"rendered":"OpenAI&#8217;s new tool attempts to explain language models&#8217; behaviors"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\">It\u2019s often said that large language models (LLMs) along the lines of OpenAI\u2019s <a href=\"https:\/\/techcrunch.com\/tag\/chatgpt\/\" target=\"_blank\" rel=\"noopener\">ChatGPT<\/a> are a black box, and certainly, there\u2019s some truth to that. Even for data scientists, it\u2019s difficult to know why, always, a model responds in the way it does, like\u00a0 inventing facts out of whole cloth.<\/p>\n<p>In an effort to peel back the layers of LLMs, OpenAI is developing a tool to automatically identify which parts of an LLM are responsible for which of its behaviors. The engineers behind it stress that it\u2019s in the early stages, but the code to run it is available in open source on GitHub as of this morning.<\/p>\n<p>\u201cWe\u2019re trying to [develop ways to] anticipate what the problems with an AI system will be,\u201d William Saunders, the interpretability team manager at OpenAI, told TechCrunch in a phone interview. \u201cWe want to really be able to know that we can trust what the model is doing and the answer that it produces.\u201d<\/p>\n<p>To that end, OpenAI\u2019s tool uses a language model (ironically) to figure out the functions of the components of other, architecturally simpler LLMs \u2014 specifically OpenAI\u2019s own GPT-2.<\/p>\n<div id=\"attachment_2539759\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><\/p>\n<p id=\"caption-attachment-2539759\" class=\"wp-caption-text\">OpenAI\u2019s tool attempts to simulate the behaviors of neurons in an LLM.<\/p>\n<\/div>\n<p>How? First, a quick explainer on LLMs for background. Like the brain, they\u2019re made up of \u201cneurons,\u201d which observe some specific pattern in text to influence what the overall model \u201csays\u201d next. For example, given a prompt about superheros (e.g. \u201cWhich superheros have the most useful superpowers?\u201d), a \u201cMarvel superhero neuron\u201d might boost the probability the model names specific superheroes from Marvel movies.<\/p>\n<p>OpenAI\u2019s tool exploits this setup to break models down into their individual pieces. First, the tool runs text sequences through the model being evaluated and waits for cases where a particular neuron \u201cactivates\u201d frequently. Next, it \u201cshows\u201d GPT-4, OpenAI\u2019s latest text-generating AI model, these highly active neurons and has GPT-4 generate an explanation. To determine how accurate the explanation is, the tool provides GPT-4 with text sequences and has it predict, or simulate, how the neuron would behave. In then compares the behavior of the simulated neuron with the behavior of the actual neuron.<\/p>\n<p>\u201cUsing this methodology, we can basically, for every single neuron, come up with some kind of preliminary natural language explanation for what it\u2019s doing and also have a score for how how well that explanation matches the actual behavior,\u201d Jeff Wu, who leads the scalable alignment team at OpenAI, said. \u201cWe\u2019re using GPT-4 as part of the process to produce explanations of what a neuron is looking for and then score how well those explanations match the reality of what it\u2019s doing.\u201d<\/p>\n<p>The researchers were able to generate explanations for all 307,200 neurons in GPT-2, which they compiled in a data set that\u2019s been released alongside the tool code.<\/p>\n<p>Tools like this could one day be used to improve an LLM\u2019s performance, the researchers say \u2014 for example to cut down on bias or toxicity. But they acknowledge that it has a long way to go before it\u2019s genuinely useful. The tool was confident in its explanations for about 1,000 of those neurons, a small fraction of the total.<\/p>\n<p>A cynical person might argue, too, that the tool is essentially an advertisement for GPT-4, given that it requires GPT-4 to work. Other LLM interpretability tools are less dependent on commercial APIs, like DeepMind\u2019s <a href=\"https:\/\/www.infoq.com\/news\/2023\/02\/deepmind-tracr\/\" target=\"_blank\" rel=\"noopener\">Tracr<\/a>, a compiler that translates programs into neural network models.<\/p>\n<p>Wu said that isn\u2019t the case \u2014 the fact the tool uses GPT-4 is merely \u201cincidental\u201d \u2014 and, on the contrary, shows GPT-4\u2019s weaknesses in this area. He also said it wasn\u2019t created with commercial applications in mind and, in theory, could be adapted to use LLMs besides GPT-4.<\/p>\n<div id=\"attachment_2539760\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-2539760\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-2539760\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-08-at-8.06.13-PM.png\" alt=\"OpenAI explainability\" width=\"1024\" height=\"362\" srcset=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-08-at-8.06.13-PM.png 1410w, https:\/\/techcrunch.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-08-at-8.06.13-PM.png?resize=150,53 150w, https:\/\/techcrunch.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-08-at-8.06.13-PM.png?resize=300,106 300w, https:\/\/techcrunch.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-08-at-8.06.13-PM.png?resize=768,271 768w, https:\/\/techcrunch.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-08-at-8.06.13-PM.png?resize=680,240 680w, https:\/\/techcrunch.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-08-at-8.06.13-PM.png?resize=1200,424 1200w, https:\/\/techcrunch.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-08-at-8.06.13-PM.png?resize=50,18 50w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"\/><\/p>\n<p id=\"caption-attachment-2539760\" class=\"wp-caption-text\">The tool identifies neurons activating across layers in the LLM.<\/p>\n<\/div>\n<p>\u201cMost of the explanations score quite poorly\u00a0 or don\u2019t explain that much of the behavior of the actual neuron,\u201d Wu said. \u201c<span style=\"font-size: 1rem; letter-spacing: -0.1px;\">A lot of the neurons, for example, active in a way where it\u2019s very hard to tell what\u2019s going on \u2014 like they activate on five or six different things, but there\u2019s no discernible pattern. <\/span><span style=\"font-size: 1rem; letter-spacing: -0.1px;\">Sometimes there <em>is<\/em> a discernible pattern, but GPT-4 is unable to find it.\u201d<\/span><\/p>\n<p>That\u2019s to say nothing of more complex, newer and larger models, or models that can browse the web for information. But on that second point, Wu believes that web browsing wouldn\u2019t change the tool\u2019s underlying mechanisms much. It could simply be tweaked, he says, to figure out why neurons decide to make certain search engine queries or access particular websites.<\/p>\n<p>\u201cWe hope that this will open up a promising avenue to address interpretability in an automated way that others can build on and contribute to,\u201d Wu said. \u201cThe hope is that we really actually have good explanations of not just not just what neurons are responding to but overall, the behavior of these models \u2014 what kinds of circuits they\u2019re computing and how certain neurons affect other neurons.\u201d<\/p>\n<\/p><\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2023\/05\/09\/openais-new-tool-attempts-to-explain-language-models-behaviors\/\" target=\"_blank\" rel=\"noopener\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>It\u2019s often said that large language models (LLMs) along the lines of OpenAI\u2019s ChatGPT are a black box, and certainly, there\u2019s some truth to that. Even for data scientists, it\u2019s difficult to know why, always, a model responds in the way it does, like\u00a0 inventing facts out of whole cloth. In an effort to peel [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":17284,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":{"0":"post-17283","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech"},"_links":{"self":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/17283","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=17283"}],"version-history":[{"count":0,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/17283\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/17284"}],"wp:attachment":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=17283"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=17283"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=17283"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}