{"id":67122,"date":"2024-01-07T20:42:51","date_gmt":"2024-01-07T20:42:51","guid":{"rendered":"https:\/\/entertainment.runfyers.com\/index.php\/2024\/01\/07\/google-gemini-everything-you-need-to-know-about-the-new-generative-ai-platform-techcrunch\/"},"modified":"2024-01-07T20:42:51","modified_gmt":"2024-01-07T20:42:51","slug":"google-gemini-everything-you-need-to-know-about-the-new-generative-ai-platform-techcrunch","status":"publish","type":"post","link":"https:\/\/entertainment.runfyers.com\/index.php\/2024\/01\/07\/google-gemini-everything-you-need-to-know-about-the-new-generative-ai-platform-techcrunch\/","title":{"rendered":"Google Gemini: Everything you need to know about the new generative AI platform | TechCrunch"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\">Google\u2019s trying to make waves with Gemini, a new generative AI platform that recently made its big debut. But while Gemini appears to be promising in a few aspects, it\u2019s falling short in others. So what is Gemini? How can you use it? And how does it stack up to the competition?<\/p>\n<p>To make it easier to keep up with the latest Gemini developments, we\u2019ve put together this handy guide, which we\u2019ll keep updated as new Gemini models and features are released.<\/p>\n<h2>What is Gemini?<\/h2>\n<p id=\"speakable-summary\">Gemini is Google\u2019s <a href=\"https:\/\/www.wired.com\/story\/google-deepmind-demis-hassabis-chatgpt\/\" target=\"_blank\" rel=\"noopener\" data-mrf-link=\"https:\/\/www.wired.com\/story\/google-deepmind-demis-hassabis-chatgpt\/\">long-promised<\/a>, next-gen generative AI model family, developed by Google\u2019s AI research labs DeepMind and Google Research. 
It comes in three flavors:<\/p>\n<ul>\n<li><strong>Gemini Ultra<\/strong>, the flagship Gemini model<\/li>\n<li><strong>Gemini Pro<\/strong>, a \u201clite\u201d Gemini model<\/li>\n<li><strong>Gemini Nano<\/strong>, a smaller \u201cdistilled\u201d model that runs on mobile devices like the <a href=\"https:\/\/techcrunch.com\/2023\/10\/04\/google-pixel-8-pro-first-impressions\/\" data-mrf-link=\"https:\/\/techcrunch.com\/2023\/10\/04\/google-pixel-8-pro-first-impressions\/\" target=\"_blank\" rel=\"noopener\">Pixel 8 Pro<\/a><\/li>\n<\/ul>\n<p>All Gemini models were trained to be \u201cnatively multimodal\u201d \u2014 in other words, able to work with and use more than just text. They were pre-trained and fine-tuned on a variety of audio, images and videos, a large set of codebases, and text in different languages.<\/p>\n<p>That sets Gemini apart from models such as Google\u2019s own large language model <a href=\"https:\/\/techcrunch.com\/2022\/08\/25\/googles-new-app-lets-you-experimental-ai-systems-like-lamda\/\" target=\"_blank\" rel=\"noopener\">LaMDA<\/a>, which was only trained on text data. LaMDA can\u2019t understand or generate anything other than text (e.g. essays, email drafts and so on) \u2014 but that isn\u2019t the case with Gemini models. Their ability to understand images, audio and other modalities is still limited, but it\u2019s better than nothing.<\/p>\n<h2>What\u2019s the difference between Bard and Gemini?<\/h2>\n<div id=\"attachment_2601757\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><\/p>\n<p id=\"caption-attachment-2601757\" class=\"wp-caption-text\"><strong>Image Credits:<\/strong> Google<\/p>\n<\/div>\n<p>Google, proving <a href=\"https:\/\/techcrunch.com\/2020\/10\/06\/googles-new-logos-are-bad\/\" target=\"_blank\" rel=\"noopener\">once again<\/a> that it lacks a knack for branding, didn\u2019t make it clear from the outset that Gemini is separate and distinct from Bard. 
Bard is simply an interface through which certain Gemini models can be accessed \u2014 think of it as an app or client for Gemini and other gen AI models. Gemini, on the other hand, is a family of models \u2014 not an app or frontend. There\u2019s no standalone Gemini experience, nor will there likely ever be. To compare with OpenAI\u2019s products, Bard corresponds to <a href=\"https:\/\/techcrunch.com\/tag\/chatgpt\/\" target=\"_blank\" rel=\"noopener\">ChatGPT<\/a>, OpenAI\u2019s popular conversational AI app, and Gemini corresponds to the language model that powers it, which in ChatGPT\u2019s case is GPT-3.5 or GPT-4.<\/p>\n<p>Incidentally, Gemini is also totally independent of <a href=\"https:\/\/techcrunch.com\/2023\/12\/13\/google-debuts-imagen-2-with-text-and-logo-generation\/\" target=\"_blank\" rel=\"noopener\">Imagen-2<\/a>, a text-to-image model that may or may not fit into the company\u2019s overall AI strategy. Don\u2019t worry, you\u2019re not the only one confused by this!<\/p>\n<h2>What can Gemini do?<\/h2>\n<p>Because the Gemini models are multimodal, they can in theory perform a range of tasks, from transcribing speech to captioning images and videos to generating artwork. Few of these capabilities have reached the product stage yet (more on that later), but Google\u2019s promising all of them \u2014 and more \u2014 at some point in the not-too-distant future.<\/p>\n<p>Of course, it\u2019s a bit hard to take the company at its word.<\/p>\n<p>Google <a href=\"https:\/\/techcrunch.com\/2023\/02\/10\/google-is-losing-control\/\" target=\"_blank\" rel=\"noopener\">seriously under-delivered<\/a> with the original Bard launch. And more recently it ruffled feathers <a href=\"https:\/\/techcrunch.com\/2023\/12\/07\/googles-best-gemini-demo-was-faked\/\" target=\"_blank\" rel=\"noopener\">with a video purporting to show Gemini\u2019s capabilities<\/a> that turned out to have been heavily doctored and was more or less aspirational. 
Gemini <em>is<\/em>, to the tech giant\u2019s credit, available in some form today \u2014 but a rather limited form.<\/p>\n<p>Still, assuming Google is being more or less truthful with its claims, here\u2019s what the different tiers of Gemini models will be able to do once they\u2019re released:<\/p>\n<h3>Gemini Ultra<\/h3>\n<p>Few people have gotten their hands on Gemini Ultra, the \u201cfoundation\u201d model on which the others are built, so far \u2014 just a \u201cselect set\u201d of customers across a handful of Google apps and services. That won\u2019t change until sometime later this year, when Google\u2019s largest model launches more broadly. Most info about Ultra has come from Google-led product demos, so it\u2019s best taken with a grain of salt.<\/p>\n<p>Google says that Gemini Ultra can be used to help with things like physics homework, solving problems step-by-step on a worksheet and pointing out possible mistakes in already filled-in answers. Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says \u2014 extracting information from those papers and \u201cupdating\u201d a chart from one by generating the formulas necessary to recreate the chart with more recent data.<\/p>\n<p>Gemini Ultra technically supports image generation, as alluded to earlier. But that capability won\u2019t make its way into the productized version of the model at launch, according to Google \u2014 perhaps because the mechanism is more complex than how apps such as <a href=\"https:\/\/techcrunch.com\/tag\/chatgpt\/\" target=\"_blank\" rel=\"noopener\">ChatGPT<\/a> generate images. 
Rather than feed prompts to an image generator (like <a href=\"https:\/\/techcrunch.com\/2023\/11\/06\/openai-launches-dall-e-3-api-new-text-to-speech-models\/\" data-mrf-link=\"https:\/\/techcrunch.com\/2023\/11\/06\/openai-launches-dall-e-3-api-new-text-to-speech-models\/\" target=\"_blank\" rel=\"noopener\">DALL-E 3<\/a>, in ChatGPT\u2019s case), Gemini outputs images \u201cnatively\u201d without an intermediary step.<\/p>\n<h3>Gemini Pro<\/h3>\n<p>Unlike Gemini Ultra, Gemini Pro is available publicly today. But confusingly, its capabilities depend on where it\u2019s used.<\/p>\n<p>Google says that in Bard, where Gemini Pro launched first in text-only form, the model is an improvement over LaMDA in its reasoning, planning and understanding capabilities. An independent <a href=\"https:\/\/arxiv.org\/pdf\/2312.11444.pdf\" target=\"_blank\" rel=\"noopener\">study<\/a> by Carnegie Mellon and BerriAI researchers found that Gemini Pro is indeed better than OpenAI\u2019s <a href=\"https:\/\/techcrunch.com\/2023\/08\/22\/openai-brings-fine-tuning-to-gpt-3-5-turbo\/\" target=\"_blank\" rel=\"noopener\">GPT-3.5<\/a> at handling longer and more complex reasoning chains.<\/p>\n<p>But the study also found that, like all large language models, Gemini Pro particularly struggles with math problems involving several digits, and <a href=\"https:\/\/techcrunch.com\/2023\/12\/07\/early-impressions-of-googles-gemini-arent-great\/\" target=\"_blank\" rel=\"noopener\">users have found plenty of examples<\/a> of bad reasoning and mistakes. It also made factual errors on simple queries, such as who won the latest Oscars. Google has promised improvements, but it\u2019s not clear when they\u2019ll arrive.<\/p>\n<p>Gemini Pro is also available via API in Vertex AI, Google\u2019s fully managed AI developer platform, which accepts text as input and generates text as output. 
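<\/p>
<p>As a rough illustration of that text-in, text-out flow, here is a minimal Python sketch that only builds the request; the payload shape (contents, role, parts) follows the public Gemini API, while the endpoint path, project name and the elided network call are illustrative assumptions rather than verified details.<\/p>

```python
# Sketch of a text-in, text-out call to Gemini Pro on Vertex AI.
# The payload shape (contents/role/parts) mirrors the public Gemini API;
# the project, region and endpoint path below are illustrative assumptions.

def build_request(prompt: str) -> dict:
    '''Builds the JSON body for a generateContent-style call.'''
    return {
        'contents': [
            {'role': 'user', 'parts': [{'text': prompt}]}
        ]
    }

def endpoint(project: str, region: str = 'us-central1',
             model: str = 'gemini-pro') -> str:
    '''Assembles the (assumed) REST endpoint for the model.'''
    return (
        f'https://{region}-aiplatform.googleapis.com/v1/projects/{project}'
        f'/locations/{region}/publishers/google/models/{model}:generateContent'
    )

body = build_request('Summarize this article in two sentences: ...')
# An authenticated POST of `body` to endpoint('my-project') would return
# candidates containing the generated text; credential setup is omitted here.
```

<p>An authenticated POST of that body to the assembled URL would return candidates containing the generated text; the auth setup is deliberately left out of the sketch.<\/p>
<p>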
An additional endpoint, Gemini Pro Vision, can process text <em>and<\/em> imagery \u2014 including photos and video \u2014 and output text along the lines of OpenAI\u2019s <a href=\"https:\/\/techcrunch.com\/2023\/09\/26\/openais-gpt-4-with-vision-still-has-flaws-paper-reveals\/\" target=\"_blank\" rel=\"noopener\">GPT-4 with Vision<\/a> model.<\/p>\n<div id=\"attachment_2648159\" style=\"width: 929px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2648159\" class=\"size-full wp-image-2648159\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/structured_prompt.png\" alt=\"Gemini\" width=\"919\" height=\"600\" srcset=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/structured_prompt.png 919w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/structured_prompt.png?resize=150,98 150w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/structured_prompt.png?resize=300,196 300w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/structured_prompt.png?resize=768,501 768w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/structured_prompt.png?resize=680,444 680w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/structured_prompt.png?resize=50,33 50w\" sizes=\"auto, (max-width: 919px) 100vw, 919px\"\/><\/p>\n<p id=\"caption-attachment-2648159\" class=\"wp-caption-text\">Using Gemini Pro in Vertex AI.<\/p>\n<\/div>\n<p>Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or \u201cgrounding\u201d process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.<\/p>\n<p>Sometime in \u201cearly 2024,\u201d Vertex customers will be able to tap Gemini Pro to power custom-built conversational voice and chat agents (i.e. chatbots). 
Gemini Pro will also become an option for driving search summarization, recommendation and answer generation features in Vertex AI, drawing on documents across modalities (e.g. PDFs, images) from different sources (e.g. OneDrive, Salesforce) to satisfy queries.<\/p>\n<div id=\"attachment_2648157\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2648157\" class=\"size-full wp-image-2648157\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png\" alt=\"Gemini\" width=\"1024\" height=\"476\" srcset=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png 2138w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png?resize=150,70 150w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png?resize=300,139 300w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png?resize=768,357 768w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png?resize=680,316 680w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png?resize=1536,713 1536w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png?resize=2048,951 2048w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png?resize=1200,557 1200w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/01\/ais-app.png?resize=50,23 50w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"\/><\/p>\n<p id=\"caption-attachment-2648157\" class=\"wp-caption-text\"><strong>Image Credits:<\/strong> Gemini<\/p>\n<\/div>\n<p>In AI Studio, Google\u2019s web-based tool for app and platform developers, there are workflows for creating freeform, structured and chat prompts using Gemini Pro. 
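<\/p>
<p>Those prompt workflows ultimately translate into per-request settings such as sampling temperature and safety thresholds. The sketch below shows one plausible shape for that configuration; the field names are borrowed from the public Gemini API documentation and should be treated as assumptions rather than verified behavior.<\/p>

```python
# Sketch of the tuning knobs AI Studio exposes per request: sampling settings
# plus safety thresholds. Field names mirror the public Gemini API docs and
# are assumptions here, not verified against a live endpoint.

generation_config = {
    'temperature': 0.4,       # lower values produce more deterministic output
    'topP': 0.95,             # nucleus sampling cutoff
    'topK': 32,               # sample only from the 32 most likely tokens
    'maxOutputTokens': 1024,  # cap on response length
}

safety_settings = [
    # Each harm category can be dialed between blocking nothing and blocking
    # anything the safety classifier flags.
    {'category': 'HARM_CATEGORY_HARASSMENT',
     'threshold': 'BLOCK_MEDIUM_AND_ABOVE'},
    {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT',
     'threshold': 'BLOCK_ONLY_HIGH'},
]

request_body = {
    'contents': [{'role': 'user',
                  'parts': [{'text': 'Write a haiku about autumn.'}]}],
    'generationConfig': generation_config,
    'safetySettings': safety_settings,
}
```

<p>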
Developers have access to both Gemini Pro and the Gemini Pro Vision endpoints, and they can adjust the model temperature to control the output\u2019s creative range and provide examples to give tone and style instructions \u2014 and also tune the safety settings.<\/p>\n<div class=\"container__access-control\">\n<h3>Gemini Nano<\/h3>\n<p>Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it\u2019s efficient enough to run directly on (some) phones instead of sending the task to a server somewhere. So far it powers two features on the Pixel 8 Pro: Summarize in Recorder and Smart Reply in Gboard.<\/p>\n<p>The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of your recorded conversations, interviews, presentations and other snippets. Users get these summaries even if they don\u2019t have a signal or Wi-Fi connection available \u2014 and in a nod to privacy, no data leaves their phone in the process.<\/p>\n<p>Gemini Nano is also in Gboard, Google\u2019s keyboard app, as a <a href=\"https:\/\/developer.android.com\/gboard-smart-reply\" target=\"_blank\" rel=\"noopener\" data-mrf-link=\"https:\/\/developer.android.com\/gboard-smart-reply\">developer preview<\/a>. There, it powers a feature called Smart Reply, which helps to suggest the next thing you\u2019ll want to say when having a conversation in a messaging app. 
The feature initially only works with WhatsApp, but will come to more apps in 2024, Google says.<\/p>\n<h2>Is Gemini better than OpenAI\u2019s GPT-4?<\/h2>\n<p>There\u2019s no way to know how the Gemini family <em>really <\/em>stacks up until Google releases Ultra later this year, but the company has claimed improvements on the state of the art \u2014 which is usually OpenAI\u2019s GPT-4.<\/p>\n<p>Google has repeatedly touted Gemini\u2019s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on \u201c30 of the 32 widely used academic benchmarks used in large language model research and development.\u201d The company says that Gemini Pro, meanwhile, is more capable at tasks like summarizing content, brainstorming and writing than GPT-3.5.<\/p>\n<p>But leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than OpenAI\u2019s corresponding models. And \u2014 as mentioned earlier \u2014 some early impressions haven\u2019t been great, with <a href=\"https:\/\/techcrunch.com\/2023\/12\/07\/early-impressions-of-googles-gemini-arent-great\/\" target=\"_blank\" rel=\"noopener\">users <\/a>and <a href=\"https:\/\/arxiv.org\/pdf\/2312.11444.pdf\" target=\"_blank\" rel=\"noopener\">academics<\/a> pointing out that Gemini Pro tends to get basic facts wrong, struggles with translations, and gives poor coding suggestions.<\/p>\n<h2>How much will Gemini cost?<\/h2>\n<p>Gemini Pro is free to use in Bard and, for now, AI Studio and Vertex AI.<\/p>\n<\/div>\n<p>Once Gemini Pro exits preview in Vertex, however, input will cost $0.0025 per character while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).<\/p>\n<p>Let\u2019s assume a 500-word article contains 2,000 characters. 
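<\/p>
<p>Taking the quoted per-character rates at face value, a few lines of Python make the arithmetic explicit. The rates and character count come straight from the figures above; treating summarization as input-heavy and generation as output-heavy is a simplifying assumption.<\/p>

```python
# Back-of-the-envelope cost math using the per-character rates quoted above.
# Assumes summarization cost is dominated by input characters and generation
# cost by output characters, which is a simplification.

INPUT_RATE = 0.0025    # dollars per input character
OUTPUT_RATE = 0.00005  # dollars per output character

article_chars = 2_000  # roughly a 500-word article

summarize_cost = article_chars * INPUT_RATE   # feeding the article in
generate_cost = article_chars * OUTPUT_RATE   # producing a similar-length article

print(f'summarize: ${summarize_cost:.2f}')  # summarize: $5.00
print(f'generate: ${generate_cost:.2f}')    # generate: $0.10
```

<p>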
Summarizing that article with Gemini Pro would cost $5. Meanwhile,\u00a0<em>generating\u00a0<\/em>an article of a similar length would cost $0.10.<\/p>\n<div class=\"container__access-control\">\n<h2>Where can you try Gemini?<\/h2>\n<h3>Gemini Pro<\/h3>\n<p>The easiest place to experience Gemini Pro is in <a href=\"https:\/\/techcrunch.com\/tag\/bard\/\" data-mrf-link=\"https:\/\/techcrunch.com\/tag\/bard\/\" target=\"_blank\" rel=\"noopener\">Bard<\/a>. A fine-tuned version of Pro is answering text-based Bard queries in English in the U.S. right now, with additional languages and supported countries set to arrive down the line.<\/p>\n<p>Gemini Pro is also <a href=\"https:\/\/techcrunch.com\/2023\/12\/13\/google-brings-gemini-pro-to-vertex-ai\/\" target=\"_blank\" rel=\"noopener\">accessible<\/a> in preview in Vertex AI via an API. The API is free to use \u201cwithin limits\u201d for the time being and supports 38 languages and regions including Europe, as well as features like chat functionality and filtering.<\/p>\n<p>Elsewhere, Gemini Pro can be <a href=\"https:\/\/techcrunch.com\/2023\/12\/13\/with-ai-studio-google-launches-an-easy-to-use-tool-for-developing-apps-and-chatbots-based-on-its-gemini-model\/\" target=\"_blank\" rel=\"noopener\">found<\/a> in AI Studio. Using the service, developers can iterate prompts and Gemini-based chatbots and then get API keys to use them in their apps \u2014 or export the code to a more fully featured IDE.<\/p>\n<p id=\"speakable-summary\"><a href=\"https:\/\/cloud.google.com\/duet-ai?hl=en\" target=\"_blank\" rel=\"noopener\" data-mrf-link=\"https:\/\/cloud.google.com\/duet-ai?hl=en\">Duet AI for Developers<\/a>, Google\u2019s suite of AI-powered assistance tools for code completion and generation, will start using a Gemini model in the coming weeks. 
And Google plans to bring Gemini models to dev tools for Chrome and its Firebase mobile dev platform around the same time, in early 2024.<\/p>\n<h3>Gemini Nano<\/h3>\n<p>Gemini Nano is on the Pixel 8 Pro \u2014 and will come to other devices in the future. Developers interested in incorporating the model into their Android apps can <a href=\"https:\/\/docs.google.com\/forms\/d\/1OVnWGC4hlxHKFnQGRfyzHP-RtXbqHUcC5niVp1UiKgs\/edit?resourcekey=0-KyxyE7EDok7zUh3u1op1ew\" target=\"_blank\" rel=\"noopener\" data-mrf-link=\"https:\/\/docs.google.com\/forms\/d\/1OVnWGC4hlxHKFnQGRfyzHP-RtXbqHUcC5niVp1UiKgs\/edit?resourcekey=0-KyxyE7EDok7zUh3u1op1ew\">sign up<\/a>\u00a0for a sneak peek.<\/p>\n<p>We\u2019ll keep this post up to date with the latest developments.<\/p>\n<\/div><\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2024\/01\/07\/what-is-google-gemini-ai\/\" target=\"_blank\" rel=\"noopener\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Google\u2019s trying to make waves with Gemini, a new generative AI platform that recently made its big debut. But while Gemini appears to be promising in a few aspects, it\u2019s falling short in others. So what is Gemini? How can you use it? And how does it stack up to the competition? 
To make it [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":67123,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":{"0":"post-67122","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tech"},"_links":{"self":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/67122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/comments?post=67122"}],"version-history":[{"count":0,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/posts\/67122\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media\/67123"}],"wp:attachment":[{"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/media?parent=67122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/categories?post=67122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entertainment.runfyers.com\/index.php\/wp-json\/wp\/v2\/tags?post=67122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}