Google has officially launched Gemini 2.5 Flash-Lite, marking a significant advancement in making powerful AI more accessible and efficient for developers and enterprises alike.
The new model, described as Google's "most cost-efficient and fastest 2.5 model yet," is optimized for high-volume, latency-sensitive workloads. Flash-Lite enters the market with the lowest latency and cost in the 2.5 model family, positioned as a cost-effective upgrade from the earlier 1.5 and 2.0 Flash models. It performs better across most evaluations, with lower time-to-first-token and higher tokens-per-second decode throughput, making it well suited to high-throughput tasks such as classification or summarization at scale.
As a reasoning model, Flash-Lite allows for dynamic control of the thinking budget through an API parameter. Unlike other Gemini 2.5 models where thinking is enabled by default, Flash-Lite optimizes for cost and speed by keeping thinking turned off unless specifically enabled. Despite this optimization, it still supports all native tools including Google Search grounding, code execution, URL context, and function calling.
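To illustrate the default-off thinking behavior described above, here is a minimal sketch of a request body for the Gemini `generateContent` REST endpoint. It assumes the publicly documented `generationConfig.thinkingConfig.thinkingBudget` field; treat the exact field names and budget values as illustrative rather than authoritative.

```python
import json

def build_request(prompt: str, thinking_budget: int = 0) -> str:
    """Build a JSON body for a Gemini generateContent call.

    thinking_budget=0 mirrors Flash-Lite's default (thinking off);
    a positive value opts in to reasoning for harder prompts.
    Field names assume the documented REST schema
    (generationConfig.thinkingConfig.thinkingBudget).
    """
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }
    return json.dumps(body)

# Latency-sensitive classification call: leave thinking off (the default).
fast = build_request("Classify the sentiment: 'Great battery life!'")

# Harder prompt: explicitly grant a thinking budget (hypothetical value).
careful = build_request("Draft a multi-step migration plan", thinking_budget=1024)
```

The same toggle is exposed through Google's official SDKs as a thinking-budget option on the generation config, so developers can decide per request whether the extra reasoning latency and cost are worth it.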
Performance tests show Flash-Lite is 1.5 times faster than Gemini 2.0 Flash at a lower cost, making it particularly well suited to classification, translation, intelligent routing, and other cost-sensitive, high-scale operations. Where other models default to more powerful (and more expensive) reasoning to answer questions, Flash-Lite leaves that choice to the developer: the thinking capability can be toggled on or off per request. Despite its cost efficiency, Flash-Lite is not limited in what it can accomplish.
The preview of Gemini 2.5 Flash-Lite is now available in Google AI Studio and Vertex AI, alongside the stable versions of 2.5 Flash and Pro. Both 2.5 Flash and Pro are also accessible in the Gemini app, and Google has brought custom versions of 2.5 Flash-Lite and Flash to Search.
This strategic expansion of the Gemini model family represents Google's commitment to democratizing AI by providing options that balance performance, cost, and speed for different use cases, from complex reasoning tasks to high-volume data processing.