In a sweeping update announced on Tuesday at Google I/O, Google’s AI division DeepMind introduced major upgrades across its line of Gemini language models, including a new lightweight offering called Gemini 1.5 Flash designed for low-latency, cost-effective deployment at scale.
As detailed in a blog post penned by DeepMind CEO Demis Hassabis, the lightweight 1.5 Flash model is optimised for speed and cost: high-volume tasks and multimodal reasoning over long contexts of up to 1 million tokens. Though smaller than the flagship 1.5 Pro model, Flash promises strong performance for applications like summarisation, chat, image and video captioning, and data extraction from documents and tables.
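For developers, Flash is exposed through the same Gemini API as its larger siblings. As a rough illustration (the prompt and input file here are invented; the model identifier is simply the published API name), a summarisation call via the google-generativeai Python SDK might look like this:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key obtained from Google AI Studio

# Flash targets low-latency, high-volume work such as summarisation.
model = genai.GenerativeModel("gemini-1.5-flash")

notes = open("meeting_notes.txt").read()  # placeholder input document
response = model.generate_content(
    "Summarise the following notes in three bullet points:\n\n" + notes
)
print(response.text)
```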
Flash’s capabilities were distilled from the larger 1.5 Pro through a process of “knowledge transfer”. The Pro model itself also received an upgrade, with DeepMind claiming strong benchmark gains across reasoning, coding, vision and multimodal understanding tasks; in particular, 1.5 Pro now matches state-of-the-art results on challenges like MMMU, AI2D, MathVista and various visual question answering datasets.
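DeepMind has not published the details of that distillation recipe, but the general idea is well established: a smaller student model is trained to reproduce the output distribution of a larger teacher. The sketch below shows a generic distillation loss in PyTorch purely for illustration; it should not be read as Gemini’s actual training objective.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic knowledge-distillation objective (illustrative only)."""
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth tokens.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```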
Beyond extending the context window to 2 million tokens, 1.5 Pro’s updates focus on improving code generation, logical planning, multi-turn dialogue and controllability via system instructions, which let developers define aspects like persona and response style. Audio understanding was also added to the Gemini API and Google AI Studio.
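System instructions are set once when the model is instantiated and then shape every subsequent turn of the conversation. A minimal sketch with the google-generativeai SDK (the persona text is an invented example):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=(
        "You are a terse release-notes editor. Reply in formal British English "
        "and never exceed two sentences."
    ),
)

chat = model.start_chat()
print(chat.send_message("What changed in this release?").text)
```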
On the open models front, DeepMind previewed Gemma 2 – the next iteration of its publicly released model family – built on a new architecture for better performance and efficiency and offered in multiple sizes. The Gemma family also expanded with PaliGemma, a vision-language model.
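Like the first Gemma release, the new models are expected to ship with weights developers can download and run themselves. The snippet below loads a first-generation Gemma checkpoint with Hugging Face transformers; the exact repository names for Gemma 2 and PaliGemma were not part of the announcement, so treat the model ID as a stand-in.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # first-generation checkpoint; Gemma 2 IDs not yet published
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain what a context window is.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```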
Perhaps most intriguingly, Hassabis shared details on “Project Astra” – DeepMind’s vision for virtual AI assistants that can naturally understand multimodal context and engage in fluid conversation. Built atop Gemini and specialised models, Astra agents continuously encode video, combine it with speech input and cache this timeline for fast retrieval. They also leverage state-of-the-art speech synthesis for more expressive interaction.
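Nothing about Astra’s internals has been released, but the pattern Hassabis describes – folding video and speech into a continuously updated, queryable timeline – is easy to caricature in code. The classes and methods below are entirely hypothetical and exist only to make that description concrete.

```python
import time
from collections import deque

class TimelineCache:
    """Rolling buffer of timestamped (modality, embedding) events."""
    def __init__(self, max_events=10_000):
        self.events = deque(maxlen=max_events)

    def add(self, modality, embedding):
        self.events.append((time.time(), modality, embedding))

    def recent(self, seconds):
        cutoff = time.time() - seconds
        return [e for e in self.events if e[0] >= cutoff]

def agent_step(frame, utterance, encoder, assistant, cache):
    """One tick of a hypothetical Astra-style loop: encode, cache, answer."""
    cache.add("video", encoder.encode_image(frame))
    if utterance is None:
        return None
    cache.add("speech", encoder.encode_audio(utterance))
    # Answer the spoken query against the last minute of multimodal context.
    return assistant.respond(query=utterance, context=cache.recent(seconds=60))
```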
With some Astra capabilities landing in products like the Gemini app later this year, Google aims to eventually deliver an “expert assistant by your side” via mobile devices and augmented-reality glasses. Hassabis framed the updates as part of a continued push to advance AI capabilities and unlock new Gemini use cases.