In a developer update, Google released a preview of the Gemini 2.5 Flash native audio Live model, describing it as the latest iteration of the Gemini Live model, with a focus on more reliable function calling and more natural conversation. The model handles input and output as native audio, which reduces the latency and distortion associated with traditional ASR/TTS cascades. It supports interrupting and resuming mid-conversation, and targets scenarios such as real-time voice assistants, customer service agents, and live demonstrations.
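To make the interruption behavior concrete, here is a minimal client-side sketch of "barge-in" handling: audio the model has already sent but the client has not yet played is dropped the moment the user interrupts. The class and method names (PlaybackBuffer, enqueue, next_chunk, interrupt) are hypothetical and are not part of the Live API; how an interruption is actually signaled depends on the SDK in use.

```python
from collections import deque
from typing import Optional


class PlaybackBuffer:
    """Hypothetical client-side queue of model audio not yet played."""

    def __init__(self) -> None:
        self._pending = deque()  # chunks of raw audio bytes, oldest first

    def enqueue(self, chunk: bytes) -> None:
        # Called whenever a new audio chunk arrives from the model.
        self._pending.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        # Called by the audio player; returns None when nothing is queued.
        return self._pending.popleft() if self._pending else None

    def interrupt(self) -> int:
        # Called when the user starts speaking over the model.
        # Drops all unplayed audio and reports how many chunks were discarded.
        dropped = len(self._pending)
        self._pending.clear()
        return dropped
```

After interrupt() returns, the buffer is empty and ready to receive the chunks of the model's next response, so stale audio from the abandoned answer is never played.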
According to the official documentation, the Live API supports low-latency, two-way interaction that mixes voice, video, and text input. Models can trigger tool calls directly within a conversation and return structured results. The preview is available to try in Google AI Studio, and the Vertex AI and Gemini API documentation were updated at the same time. Developers can follow the Live API guide to integrate and test it. The changelog gives September 23, 2025 as the preview availability date for the native audio model.
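As a rough illustration of the tool-calling flow described above, the sketch below routes a tool call requested by the model to a local function and wraps the result in a structured reply. The dictionary shapes ("id", "name", "args", "response"), the TOOLS registry, and the get_order_status tool are assumptions made for this example; they are not the Live API wire format, which is defined in the official guide.

```python
from typing import Any, Callable, Dict


def get_order_status(order_id: str) -> Dict[str, Any]:
    # Hypothetical tool that returns canned data for the example.
    return {"order_id": order_id, "status": "shipped"}


# Hypothetical registry mapping tool names to local Python functions.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_order_status": get_order_status,
}


def handle_tool_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run the requested tool and build a structured reply for the model."""
    name = call["name"]
    fn = TOOLS.get(name)
    if fn is None:
        result: Dict[str, Any] = {"error": f"unknown tool: {name}"}
    else:
        try:
            result = fn(**call.get("args", {}))
        except TypeError as exc:
            # Missing or unexpected arguments (or a TypeError raised inside
            # the tool) are reported back instead of crashing the session.
            result = {"error": str(exc)}
    # Echo the call id so the reply can be matched to the request.
    return {"id": call.get("id"), "name": name, "response": result}
```

Returning errors as structured data, rather than raising, lets the model see that a call failed and decide how to continue the conversation.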
Frequently Asked Questions
Q: What are the core improvements of Gemini Live this time?
A: The native audio model is now live. Function calls are more stable and accurate, voice conversations sound more natural, and you can interrupt a response and pick the conversation back up immediately.
Q: Where can I experience it?
A: You can try it online through the Live entry in Google AI Studio.
Q: What inputs and outputs can the Live API handle?
A: Input can be text, audio, or video; output can be text or audio, with real-time two-way streaming.
Q: Is this the official version?
A: No, it is a preview release. Refer to the official documentation and console for specific capabilities and quotas.
Q: How is it different from previous Gemini Live models?
A: It uses a single native audio model rather than chaining separate STT and TTS stages, which results in lower latency and more stable tool calling.