{"id":86365,"date":"2024-03-29T17:00:58","date_gmt":"2024-03-29T17:00:58","guid":{"rendered":"https:\/\/entertainment.runfyers.com\/index.php\/2024\/03\/29\/openai-built-a-voice-cloning-tool-but-you-cant-use-it-yet-techcrunch\/"},"modified":"2024-03-29T17:00:58","modified_gmt":"2024-03-29T17:00:58","slug":"openai-built-a-voice-cloning-tool-but-you-cant-use-it-yet-techcrunch","status":"publish","type":"post","link":"https:\/\/entertainment.runfyers.com\/index.php\/2024\/03\/29\/openai-built-a-voice-cloning-tool-but-you-cant-use-it-yet-techcrunch\/","title":{"rendered":"OpenAI built a voice cloning tool, but you can&#8217;t use it&#8230; yet | TechCrunch"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\">As deepfakes <a href=\"https:\/\/techcrunch.com\/2024\/03\/06\/political-deepfakes-are-spreading-like-wildfire-thanks-to-genai\/\" target=\"_blank\" rel=\"noopener\">proliferate<\/a>, OpenAI is refining the tech used to clone voices \u2014 but the company insists it\u2019s doing so responsibly.<\/p>\n<p>Today marks the preview debut of OpenAI\u2019s <a href=\"http:\/\/openai.com\/blog\/navigating-the-challenges-and-opportunities-of-synthetic-voices\" target=\"_blank\" rel=\"noopener\">Voice Engine<\/a>, an expansion of the company\u2019s <a href=\"https:\/\/techcrunch.com\/2023\/11\/06\/openai-launches-dall-e-3-api-new-text-to-speech-models\/\" target=\"_blank\" rel=\"noopener\">existing text-to-speech API<\/a>. Under development for about two years, Voice Engine allows users to upload any 15-second voice sample to generate a synthetic copy of that voice. 
But there\u2019s no date for public availability yet, giving the company time to respond to how the model is used and abused.<\/p>\n<p>\u201cWe want to make sure that everyone feels good about how it\u2019s being deployed \u2014 that we understand the landscape of where this tech is dangerous and we have mitigations in place for that,\u201d Jeff Harris, a member of the product staff at OpenAI, told TechCrunch in an interview.<\/p>\n<h2>Training the model<\/h2>\n<p>The generative AI model powering Voice Engine has been hiding in plain sight for some time, Harris said.<\/p>\n<p>The same model underpins the <a href=\"https:\/\/techcrunch.com\/2023\/09\/25\/openai-chatgpt-voice\/\" target=\"_blank\" rel=\"noopener\">voice<\/a> and \u201cread aloud\u201d capabilities in <a href=\"https:\/\/techcrunch.com\/2024\/03\/06\/chatgpt-everything-to-know-about-the-ai-chatbot\/\" target=\"_blank\" rel=\"noopener\">ChatGPT<\/a>, OpenAI\u2019s AI-powered chatbot, as well as the preset voices available in OpenAI\u2019s text-to-speech API. And Spotify\u2019s been using it since early September to dub podcasts for high-profile hosts like Lex Fridman in different languages.<\/p>\n<p>I asked Harris where the model\u2019s training data came from \u2014 a bit of a touchy subject. He would only say that the Voice Engine model was trained on a <a href=\"https:\/\/arstechnica.com\/information-technology\/2024\/01\/openai-says-its-impossible-to-create-useful-ai-models-without-copyrighted-material\/\" target=\"_blank\" rel=\"noopener\">mix<\/a> of licensed and publicly available data.<\/p>\n<p>Models like the one powering Voice Engine are trained on an enormous number of examples \u2014 in this case, speech recordings \u2014 usually sourced from public sites and data sets around the web. Many generative<span style=\"font-size: 1rem; letter-spacing: -0.1px;\"> AI vendors see training data as a competitive advantage and thus keep it and info pertaining to it close to the chest. 
But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much.<\/span><\/p>\n<p><span style=\"font-size: 1rem; letter-spacing: -0.1px;\">OpenAI is <\/span>already <a style=\"background-color: #ffffff; font-size: 1rem; letter-spacing: -0.1px;\" href=\"https:\/\/apnews.com\/article\/openai-lawsuit-authors-grisham-george-rr-martin-37f9073ab67ab25b7e6b2975b2a63bfe#:~:text=NEW%20YORK%20(AP)%20%E2%80%94%20John,their%20copyrighted%20works%20without%20permission.\" target=\"_blank\" rel=\"noopener\" data-mrf-link=\"https:\/\/apnews.com\/article\/openai-lawsuit-authors-grisham-george-rr-martin-37f9073ab67ab25b7e6b2975b2a63bfe#:~:text=NEW%20YORK%20(AP)%20%E2%80%94%20John,their%20copyrighted%20works%20without%20permission.\">being<\/a><span style=\"font-size: 1rem; letter-spacing: -0.1px;\">\u00a0<\/span><a href=\"https:\/\/techcrunch.com\/2023\/12\/27\/the-new-york-times-wants-openai-and-microsoft-to-pay-for-training-data\/\" target=\"_blank\" rel=\"noopener\">sued<\/a><span style=\"font-size: 1rem; letter-spacing: -0.1px;\"> over allegations the company violated IP law by training its AI on copyrighted content, including photos, artwork, code, articles and e-books, without providing the creators or owners credit or pay.<\/span><\/p>\n<p>OpenAI has licensing agreements in place with some content providers, like <a href=\"https:\/\/techcrunch.com\/2023\/07\/11\/shutterstock-expands-deal-with-openai-to-build-generative-ai-tools\/\" target=\"_blank\" rel=\"noopener\">Shutterstock<\/a> and the news publisher <a href=\"https:\/\/techcrunch.com\/2023\/12\/13\/openai-inks-deal-with-axel-springer-on-licensing-news-for-model-training\/\" target=\"_blank\" rel=\"noopener\">Axel Springer<\/a>, and allows webmasters to block its web crawler from scraping their site for training data. 
OpenAI also lets artists \u201copt out\u201d of and remove their work from the data sets that the company uses to train its image-generating models, including its latest <a href=\"https:\/\/techcrunch.com\/2023\/09\/20\/openai-unveils-dall-e-3-allows-artists-to-opt-out-of-training\/\" target=\"_blank\" rel=\"noopener\">DALL-E 3<\/a>.<\/p>\n<p>But OpenAI offers no such opt-out scheme for its other products. And in a recent statement to the U.K.\u2019s House of Lords, OpenAI suggested that it\u2019s \u201cimpossible\u201d to create useful AI models without copyrighted material, asserting that fair use \u2014 the legal doctrine that allows for the use of copyrighted works to make a secondary creation as long as it\u2019s transformative \u2014 shields it where it concerns model training.<\/p>\n<h2>Synthesizing voice<\/h2>\n<p>Surprisingly, Voice Engine <em>isn\u2019t<\/em> trained or fine-tuned on user data. That\u2019s owing in part to the ephemeral way in which the model \u2014 a combination of a <a href=\"https:\/\/techcrunch.com\/2024\/02\/28\/diffusion-transformers-are-the-key-behind-openais-sora-and-theyre-set-to-upend-genai\/\" target=\"_blank\" rel=\"noopener\">diffusion process<\/a> and <a href=\"https:\/\/www.techtarget.com\/searchenterpriseai\/feature\/Transformer-neural-networks-are-shaking-up-AI\" target=\"_blank\" rel=\"noopener\">transformer<\/a> \u2014 generates speech.<\/p>\n<p>\u201cWe take a small audio sample and text and generate realistic speech that matches the original speaker,\u201d said Harris. \u201cThe audio that\u2019s used is dropped after the request is complete.\u201d<\/p>\n<p>As he explained it, the model is simultaneously analyzing the speech data it pulls from and the text data meant to be read aloud, generating a matching voice without having to build a custom model per speaker.<\/p>\n<p>It\u2019s not novel tech. 
A number of startups have delivered voice cloning products for years, from <a href=\"https:\/\/techcrunch.com\/2024\/01\/22\/voice-cloning-startup-elevenlabs-lands-80m-achieves-unicorn-status\/\" target=\"_blank\" rel=\"noopener\">ElevenLabs<\/a> to Replica Studios to <a href=\"https:\/\/techcrunch.com\/2022\/06\/09\/papercup-raises-20m-for-ai-that-automatically-dubs-videos\/\" target=\"_blank\" rel=\"noopener\">Papercup<\/a> to <a href=\"https:\/\/techcrunch.com\/2022\/02\/10\/deepdub-raises-20m-for-a-i-powered-dubbing-that-uses-actors-original-voices\/\" target=\"_blank\" rel=\"noopener\">Deepdub<\/a> to <a href=\"https:\/\/techcrunch.com\/2023\/12\/06\/respeechers-ethics-first-approach-to-ai-voice-cloning-locks-in-new-funding\/\" target=\"_blank\" rel=\"noopener\">Respeecher<\/a>. So have Big Tech incumbents such as Amazon, <a href=\"https:\/\/techcrunch.com\/2020\/09\/01\/google-cloud-lets-businesses-create-their-own-text-to-speech-voices\/\" target=\"_blank\" rel=\"noopener\">Google<\/a> and <a href=\"https:\/\/techcrunch.com\/2023\/07\/18\/microsoft-launches-vector-search-in-preview-voice-cloning-in-general-availability\/\" target=\"_blank\" rel=\"noopener\">Microsoft<\/a> \u2014 the last of which is a <a href=\"https:\/\/techcrunch.com\/2023\/11\/20\/microsoft-is-the-only-real-winner-in-the-openai-debacle\/\" target=\"_blank\" rel=\"noopener\">major OpenAI investor<\/a>, incidentally.<\/p>\n<p>Harris claimed that OpenAI\u2019s approach delivers overall higher-quality speech.<\/p>\n<p>We also know it will be priced aggressively. Although OpenAI removed Voice Engine\u2019s pricing from the marketing materials it published today, in documents viewed by TechCrunch, Voice Engine is listed as costing $15 per one million characters, or ~162,500 words. That would fit Dickens\u2019 \u201cOliver Twist\u201d with a little room to spare. 
(An \u201cHD\u201d quality option costs twice that, but confusingly, an OpenAI spokesperson told TechCrunch that there\u2019s no difference between HD and non-HD voices. Make of that what you will.)<\/p>\n<p>That translates to around 18 hours of audio, making the price somewhat south of $1 per hour. That\u2019s indeed cheaper than what one of the more popular rival vendors, ElevenLabs, charges \u2014 $11 for 100,000 characters per month. But it <em>does<\/em> come at the expense of some customization.<\/p>\n<p>Voice Engine doesn\u2019t offer controls to adjust the tone, pitch or cadence of a voice. In fact, it doesn\u2019t offer <em>any<\/em> fine-tuning knobs or dials at the moment, although Harris notes that any expressiveness in the 15-second voice sample will carry on through subsequent generations (for example, if you speak in an excited tone, the resulting synthetic voice will sound consistently excited). We\u2019ll see how the quality of the reading compares with other models when they can be compared directly.<\/p>\n<h2>Voice talent as commodity<\/h2>\n<p>Voice actor salaries on ZipRecruiter range from $12 to $79 per hour \u2014 a lot more expensive than Voice Engine, even on the low end (actors with agents will command a much higher price per project). Were it to catch on, OpenAI\u2019s tool could commoditize voice work. So, where does that leave actors?<\/p>\n<p>The talent industry wouldn\u2019t be caught unawares, exactly \u2014 it\u2019s been grappling with the existential threat of generative AI for some time. Voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them. 
Voice work \u2014 particularly cheap, entry-level work \u2014 is at risk of being eliminated in favor of AI-generated speech.<\/p>\n<p>Now, some AI voice platforms are trying to strike a balance.<\/p>\n<p>Replica Studios last year signed a <a href=\"https:\/\/aibusiness.com\/ml\/sag-aftra-deal-with-ai-voice-cloners-angers-many-actors\" target=\"_blank\" rel=\"noopener\">somewhat contentious<\/a> deal with SAG-AFTRA to create and license copies of the media artist union members\u2019 voices. The organizations said that the arrangement established fair and ethical terms and conditions to ensure performer consent while negotiating terms for uses of synthetic voices in new works, including video games.<\/p>\n<p><span style=\"font-size: 1rem; letter-spacing: -0.1px;\">ElevenLabs, meanwhile, hosts a marketplace for synthetic voices that allows users to create a voice, verify and share it publicly. When others use a voice, the original creators receive compensation \u2014 a set dollar amount per 1,000 characters.<\/span><\/p>\n<p>OpenAI will establish no such labor union deals or marketplaces, at least not in the near term, and requires only that users obtain \u201cexplicit consent\u201d from the people whose voices are cloned, make \u201cclear disclosures\u201d indicating which voices are AI-generated and agree not to use the voices of minors, deceased people or political figures in their generations.<\/p>\n<p>\u201cHow this intersects with the voice actor economy is something that we\u2019re watching closely and really curious about,\u201d Harris said. \u201cI think that there\u2019s going to be a lot of opportunity to sort of scale your reach as a voice actor through this kind of technology. 
But this is all stuff that we\u2019re going to learn as people actually deploy and play with the tech a little bit.\u201d<\/p>\n<h2>Ethics and deepfakes<\/h2>\n<p>Voice cloning apps can be \u2014 and have been \u2014 abused in ways that go well beyond threatening the livelihoods of actors.<\/p>\n<p>Users of the infamous message board 4chan, known for its conspiratorial content,\u00a0<a href=\"https:\/\/www.vice.com\/en\/article\/dy7mww\/ai-voice-firm-4chan-celebrity-voices-emma-watson-joe-rogan-elevenlabs?utm_source=reddit.com\" target=\"_blank\" rel=\"noopener\" data-mrf-link=\"https:\/\/www.vice.com\/en\/article\/dy7mww\/ai-voice-firm-4chan-celebrity-voices-emma-watson-joe-rogan-elevenlabs?utm_source=reddit.com\">used<\/a> ElevenLabs\u2019 platform to share hateful messages mimicking celebrities like Emma Watson. The Verge\u2019s James Vincent was able to tap AI tools to quickly clone voices for malicious purposes, <a href=\"https:\/\/www.theverge.com\/2023\/1\/31\/23579289\/ai-voice-clone-deepfake-abuse-4chan-elevenlabs\" target=\"_blank\" rel=\"noopener\" data-mrf-link=\"https:\/\/www.theverge.com\/2023\/1\/31\/23579289\/ai-voice-clone-deepfake-abuse-4chan-elevenlabs\">generating<\/a> samples containing everything from violent threats to racist and transphobic remarks. And over at Vice, reporter Joseph Cox documented generating a voice clone convincing enough to fool a bank\u2019s authentication system.<\/p>\n<p>There are fears bad actors will attempt to sway elections with voice cloning. 
And they\u2019re not unfounded: In January, a phone campaign employed a deepfaked President Biden to deter New Hampshire citizens from voting \u2014 <a href=\"https:\/\/techcrunch.com\/2024\/02\/08\/fcc-officially-declares-ai-voiced-robocalls-illegal\/\" target=\"_blank\" rel=\"noopener\">prompting<\/a> the FCC to move to make such campaigns illegal in the future.<\/p>\n<p>So aside from banning deepfakes at the policy level, what steps is OpenAI taking, if any, to prevent Voice Engine from being misused? Harris mentioned a few.<\/p>\n<p>First, Voice Engine is only being made available to an exceptionally small group of developers \u2014 around 10 \u2014 to start. OpenAI is prioritizing use cases that are \u201clow risk\u201d and \u201csocially beneficial,\u201d Harris says, like those in healthcare and accessibility, in addition to experimenting with \u201cresponsible\u201d synthetic media.<\/p>\n<p>Early Voice Engine adopters include Age of Learning, an edtech company that\u2019s using the tool to generate voice-overs from previously cast actors, and HeyGen, a storytelling app leveraging Voice Engine for translation. 
Livox and Lifespan are using Voice Engine to create voices for people with speech impairments and disabilities, and Dimagi is building a Voice Engine-based tool to give feedback to health workers in their primary languages.<\/p>\n<p>Here are generated voices from Lifespan:<\/p>\n<p><audio class=\"wp-audio-shortcode\" id=\"audio-2675850-1\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/mpeg\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/lifespan_generation_ordering.mp3?_=1\"\/><a href=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/lifespan_generation_ordering.mp3\" target=\"_blank\" rel=\"noopener\">https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/lifespan_generation_ordering.mp3<\/a><\/audio><\/p>\n<p><audio class=\"wp-audio-shortcode\" id=\"audio-2675850-2\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/mpeg\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/lifespan_generation_talking.mp3?_=2\"\/><a href=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/lifespan_generation_talking.mp3\" target=\"_blank\" rel=\"noopener\">https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/lifespan_generation_talking.mp3<\/a><\/audio><\/p>\n<p>And here\u2019s one from Livox:<\/p>\n<p><audio class=\"wp-audio-shortcode\" id=\"audio-2675850-3\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/mpeg\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/livox_generation_english.mp3?_=3\"\/><a href=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/livox_generation_english.mp3\" target=\"_blank\" rel=\"noopener\">https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/03\/livox_generation_english.mp3<\/a><\/audio><\/p>\n<p>Second, clones created with Voice Engine are watermarked using a technique OpenAI developed that embeds inaudible identifiers in 
recordings. (Other vendors including <a href=\"https:\/\/techcrunch.com\/2023\/02\/01\/inaudible-watermark-could-identify-ai-generated-voices\/\" target=\"_blank\" rel=\"noopener\">Resemble AI<\/a> and Microsoft employ similar watermarks.) Harris didn\u2019t promise that there aren\u2019t ways to circumvent the watermark, but described it as \u201ctamper resistant.\u201d<\/p>\n<p>\u201cIf there\u2019s an audio clip out there, it\u2019s really easy for us to look at that clip and determine that it was generated by our system and the developer that actually did that generation,\u201d Harris <span style=\"font-size: 1rem; letter-spacing: -0.1px;\">said. \u201cSo far, it isn\u2019t open sourced \u2014 we have it internally for now. We\u2019re curious about making it publicly available, but obviously, that comes with added risks in terms of exposure and breaking it.\u201d<\/span><\/p>\n<p>Third, OpenAI plans to provide members of its <a href=\"https:\/\/techcrunch.com\/2023\/09\/19\/openai-launches-a-red-teaming-network-to-make-its-models-more-robust\/\" target=\"_blank\" rel=\"noopener\">red teaming network<\/a>, a contracted group of experts that help inform the company\u2019s AI model risk assessment and mitigation strategies, access to Voice Engine to suss out malicious uses.<\/p>\n<p>Some experts <a href=\"https:\/\/www.wired.com\/story\/red-teaming-gpt-4-was-valuable-violet-teaming-will-make-it-better\/\" target=\"_blank\" rel=\"noopener\">argue<\/a> that AI red teaming isn\u2019t exhaustive enough and that it\u2019s incumbent on vendors to develop tools to defend against harms that their AI might cause. 
OpenAI isn\u2019t going quite that far with Voice Engine \u2014 but Harris asserts that the company\u2019s \u201ctop principle\u201d is releasing the technology safely.<\/p>\n<h2>General release<\/h2>\n<p>Depending on how the preview goes and the public reception to Voice Engine, OpenAI might release the tool to its wider developer base, but at present, the company is reluctant to commit to anything concrete.<\/p>\n<p>Harris <em>did<\/em> give a sneak peek at Voice Engine\u2019s roadmap, though, revealing that OpenAI is testing a security mechanism that has users read randomly generated text as proof that they\u2019re present and aware of how their voice is being used. This could give OpenAI the confidence it needs to bring Voice Engine to more people, Harris said \u2014 or it might just be the beginning.<\/p>\n<p>\u201cWhat\u2019s going to keep pushing us forward in terms of the actual voice matching technology is really going to depend on what we learn from the pilot, the safety issues that are uncovered and the mitigations that we have in place,\u201d he said. \u201cWe don\u2019t want people to be confused between artificial voices and actual human voices.\u201d<\/p>\n<p>And on that last point we can agree.<\/p>\n<\/p><\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2024\/03\/29\/openai-custom-voice-engine-preview\/\" target=\"_blank\" rel=\"noopener\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As deepfakes proliferate, OpenAI is refining the tech used to clone voices \u2014 but the company insists it\u2019s doing so responsibly. Today marks the preview debut of OpenAI\u2019s Voice Engine, an expansion of the company\u2019s existing text-to-speech API. 
Under development for about two years, Voice Engine allows users to upload any 15-second voice sample to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":86366,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":{"0":"post-86365","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech"},"_links":{"self":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/86365","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=86365"}],"version-history":[{"count":0,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/86365\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/86366"}],"wp:attachment":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=86365"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=86365"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=86365"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}