At Google I/O, its annual developer conference, Google introduced its latest advancements in artificial intelligence, showcasing what the company calls its lightest and most efficient AI models yet.
Dubbed Gemini 1.5 Flash, the newest addition to the Gemini series was unveiled at Google I/O on Tuesday. In a blog post, Google highlighted the capabilities of this new model, emphasizing its ability to swiftly summarize conversations, caption images and videos, and extract data from extensive documents and tables.
Demis Hassabis, CEO of Google DeepMind, said feedback from developers drove the creation of these new models. “We heard from developers that they wanted something faster and even more cost-effective,” he remarked during a press briefing.
The unveiling fits a broader trend of tech companies steering their product development toward generative AI, a technology of particular importance to Google. Such tools give consumers more sophisticated and creative ways to access online information than conventional web search.
Meanwhile, OpenAI made waves on Monday with the launch of a new AI model and desktop version of ChatGPT, accompanied by a revamped user interface. The new model, known as GPT-4o, boasts double the speed of its predecessor, GPT-4 Turbo, at half the cost, according to the company.
Google, for its part, recently announced an enhanced Gemini 1.5 Pro model that can make sense of documents totaling 1,500 pages or summarize 100 emails, according to Sissie Hsiao, a vice president at Google and the general manager for Gemini experiences. Hsiao also revealed plans for Gemini 1.5 Pro to handle an hour of video content or codebases with more than 30,000 lines. “You can quickly get answers and insights about dense documents, like figuring out the details of the pet policy in your rental agreement or comparing key arguments of multiple long research papers,” explained Hsiao.
OpenAI’s latest upgrade promises improved quality and speed, with ChatGPT now supporting 50 different languages. The new model is also accessible via OpenAI’s application programming interface (API), allowing developers to start using it immediately.
Google’s Gemini 1.5 Pro boasts a 2 million token context window, which company executives say improves context understanding, logical reasoning, planning, and image comprehension. During the press briefing, Alphabet CEO Sundar Pichai highlighted the context window as the company’s longest yet, illustrating its utility with scenarios like summarizing recent emails from a child’s school.
Initially, Gemini 1.5 Pro will undergo testing in Workspace Labs, while Gemini 1.5 Flash will be available for testing and deployment in Vertex AI, Google’s machine learning platform for training and deploying AI applications.