Google has released Veo 3, a video generation model that produces audio natively alongside the footage, something earlier versions of Veo could not do.
Unveiled at Google I/O 2025 in May, Veo 3 represents a significant advancement over previous AI video generators by incorporating synchronized dialogue, ambient sounds, and background music directly into generated clips. "For the first time, we're emerging from the silent era of video generation," said Demis Hassabis, CEO of Google DeepMind, during the announcement.
The technology excels at creating realistic videos with accurate physics, precise lip-syncing, and natural motion. Users can generate videos from text descriptions or image references, and the model automatically adds audio that matches the visual content. This capability sets Veo 3 apart from competitors like OpenAI's Sora, which did not offer native audio generation when Veo 3 was announced.
Alongside Veo 3, Google has enhanced its popular Veo 2 model with several powerful new features. These include reference-powered video for consistent characters and objects, advanced camera controls for cinematic movement, outpainting to extend video frames beyond original borders, and intelligent object addition and removal functionality.
To showcase the creative potential of these tools, Google has introduced Flow, a new AI filmmaking platform that combines Veo, Imagen, and Gemini models. Several filmmakers have already created professional-quality short films using the technology, including Henry Daubrez's emotional sci-fi story "Kitsune" and Junie Lau's exploration of identity in "Dear Stranger."
Veo 3 is currently available through the Gemini app to US subscribers of Google's AI Ultra plan, which costs $249.99 per month, and to enterprise users via Google's Vertex AI platform. The technology includes SynthID watermarking to help identify AI-generated content and address concerns about deepfakes and misinformation.