Google took major steps to put generative AI into the hands of creators today, debuting two powerful new models at Google I/O: Veo, which generates HD video, and Imagen 3, which creates photorealistic images from text prompts. The company also highlighted music collaborations tapping its latest AI audio tools.
Veo represents Google’s most advanced video generation model yet. It can produce impressively sharp 1080p videos over a minute long across a variety of visual styles and cinematic techniques like timelapses and aerial shots.
“Veo understands natural language prompts and can generate video closely matching the user’s creative vision,” said Eli Collins, Vice President of Product Management at Google DeepMind. Importantly, the model produces temporally coherent footage with realistic movement.
The model builds on Google’s extensive research into areas like the Generative Query Network and DVD-GAN, combining new architectures and scaling methods. Collins said Google has been working directly with select creators like filmmaker Donald Glover to refine Veo for real-world use cases.
A private preview of Veo’s capabilities is rolling out now in Google’s VideoFX tool, with planned integrations for YouTube Shorts and other products down the line.
For image creation, Google introduced Imagen 3 as its highest fidelity text-to-image model so far. The company said Imagen 3 generates “photorealistic, lifelike images with far fewer visual artifacts” compared to previous iterations.
Key upgrades include improved language understanding to properly capture nuanced prompts, plus the ability to render legible text – a longstanding challenge for image generation models. Google said these improvements allow Imagen 3 to span a wide range of styles.
A private Imagen 3 preview is available initially through Google’s ImageFX, ahead of integration into the company’s Vertex AI platform for broader access.
In the music realm, Google highlighted collaborations with artists Wyclef Jean, Marc Rebillet and Justin Tranter. The musicians have been experimenting with Google’s Music AI Sandbox, a suite of tools for audio generation and processing.
All three released demo songs created with the tools, which leverage Google’s latest AI models for tasks like music composition, transformation and synthesis. The collaborations aim to further refine the experimental suite based on real-world creator feedback.
Across all these initiatives, Google said it has been working closely with creative professionals and taking steps to improve safety and responsible deployment from the ground up. This includes techniques like digital watermarking to track the provenance of AI-generated content.
“We’ve been conducting safety tests, applying filters and guardrails, and putting safety teams at the center of development,” a Google spokesperson said. “We’re also pioneering tools like SynthID for embedding imperceptible watermarks in AI audio, images, text and video.”
While Google is clearly betting big that generative AI will democratise creative capabilities, it acknowledges the need to carefully manage the potential risks as these powerful models go into wider use.
The new creative tools pit Google against rivals like OpenAI, Anthropic and Midjourney in the intensifying generative AI race. By revealing new technical achievements while emphasising safety work, Google hopes to establish itself as a leading, principled platform for responsibly deploying the technology.
Ultimately, the success of these AI creator tools will hinge on whether professionals embrace them as productivity boosters or view them as potential replacements. Getting that balance right will shape the future impact of generative AI on human creativity.