Google DeepMind has introduced Gemini Diffusion, an experimental approach to AI text generation that marks a significant departure from traditional autoregressive language models.
Unlike conventional models that generate text one token at a time in sequence, Gemini Diffusion employs diffusion technology—previously used primarily in image and video generation—to refine random noise into coherent text through an iterative process. This novel approach enables the model to generate content at remarkable speeds of up to 2,000 tokens per second, according to DeepMind researchers.
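The iterative refinement idea can be illustrated with a toy mask-style denoising loop. This is purely a conceptual sketch, not DeepMind's actual method: a real diffusion language model uses a learned denoiser to predict many tokens in parallel at each step, whereas the stand-in "denoiser" below simply knows the target sentence, so only the loop structure is illustrative.

```python
import random

# Toy illustration of iterative denoising for text generation.
# Generation starts from pure "noise" (all positions masked) and each
# step refines several positions in parallel, rather than emitting
# tokens strictly left to right as an autoregressive model would.

MASK = "_"
TARGET = "the quick brown fox jumps over the lazy dog".split()

def denoise_step(tokens, k=2):
    """Reveal up to k still-masked positions per step (parallel refinement)."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    for i in random.sample(masked, min(k, len(masked))):
        tokens[i] = TARGET[i]  # a trained denoiser would predict this token
    return tokens

def generate(steps=10):
    tokens = [MASK] * len(TARGET)  # start from fully masked "noise"
    for _ in range(steps):
        tokens = denoise_step(tokens)
        if MASK not in tokens:
            break
    return " ".join(tokens)

print(generate())
```

Because multiple positions are updated per step, the whole sequence converges in a handful of iterations; this parallelism is what makes the diffusion approach fast, and revisiting positions across steps is what allows error correction mid-generation.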
"Instead of predicting text directly, they learn to generate outputs by refining noise, step-by-step," explains Google in its announcement. "This means they can iterate on a solution very quickly and error correct during the generation process."
The experimental demo, currently available via waitlist, demonstrates how this technology can match the coding performance of Google's existing models while dramatically reducing generation time. In benchmarks, Gemini Diffusion performs comparably to Gemini 2.0 Flash-Lite on coding tasks such as HumanEval and MBPP.
Oriol Vinyals, VP of Research and Deep Learning Lead at Google DeepMind and Co-Head of the Gemini project, described the release as a personal milestone, noting that the demo ran so fast they had to slow down the video to make it watchable.
In parallel, Google has enhanced its Gemini 2.5 lineup with new capabilities. The company launched Gemini 2.5 Flash with thinking budgets, giving developers fine-grained control over how much reasoning their AI performs. This feature allows users to balance quality, latency, and cost by setting a token limit (up to 24,576 tokens) for the model's reasoning process.
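In practice, a thinking budget is set through the request configuration. The sketch below assumes the `google-genai` Python SDK; the field names (`ThinkingConfig`, `thinking_budget`) reflect the public SDK at the time of writing but may evolve, and running it requires a valid API key, so treat it as an illustrative fragment rather than a guaranteed interface.

```python
# Sketch using the google-genai Python SDK (pip install google-genai).
# Assumes GEMINI_API_KEY is set in the environment.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Why does the sky appear blue?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            # Cap on reasoning tokens, up to the 24,576 maximum;
            # a lower budget trades answer quality for latency and cost.
            thinking_budget=8192
        )
    ),
)
print(response.text)
```

A budget of 0 would disable the reasoning phase entirely, while larger values let the model spend more tokens working through harder problems before answering.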
Google is also extending thinking budgets to Gemini 2.5 Pro, with general availability coming in the next few weeks. Additionally, the company has added native SDK support for Model Context Protocol (MCP) definitions in the Gemini API, making it easier to integrate with open-source tools and build agentic applications.
These advancements collectively represent Google's push to make AI more efficient, controllable, and accessible for developers while maintaining high performance standards.