Google releases Gemini 2.5 Flash Live native audio preview for more natural voice conversations

AI information Admin 32 views

In a developer update, Google released a preview of the Gemini 2.5 Flash Live native audio model, describing it as the latest iteration of the Gemini Live model. The update focuses on more reliable function calling and more natural-sounding conversation. The model takes audio in and produces audio out natively, which reduces the latency and distortion that come with a traditional cascade of automatic speech recognition (ASR) and text-to-speech (TTS). Users can interrupt a response and then resume the conversation, and the model targets scenarios such as real-time voice assistants, customer service agents, and live demonstrations.

According to the official documentation, the Live API supports low-latency, two-way streaming that mixes voice, video, and text input. Models can trigger tool calls directly within a conversation and return structured results. The preview is available to try in Google AI Studio, and the Vertex AI and Gemini API documentation has been updated alongside it. Developers can follow the Live API guide to integrate and test the model. The changelog dates the native audio model preview to September 23, 2025.
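
To make the "tool call in, structured result out" idea concrete, here is a minimal sketch of the application side of that exchange. It does not use the Live API or any Google SDK. The tool name get_order_status, the declaration layout, and the id/name/args/response field names are all assumptions chosen for illustration, so check the Live API guide for the actual message shapes before relying on them.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool declaration: a name, a description, and a JSON-Schema-style
# parameter spec. The exact layout the Live API expects is an assumption here.
GET_ORDER_STATUS = {
    "name": "get_order_status",
    "description": "Look up the shipping status of a customer order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def get_order_status(order_id: str) -> Dict[str, Any]:
    """Stand-in for a real backend lookup; returns canned data."""
    return {"order_id": order_id, "status": "shipped"}


# Registry mapping declared tool names to local Python callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    GET_ORDER_STATUS["name"]: get_order_status,
}


def handle_tool_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run one tool call and wrap the outcome as a structured result.

    `call` is assumed to look like {"id": ..., "name": ..., "args": {...}}.
    Errors are returned as data so the conversation can continue.
    """
    name = call.get("name")
    args = call.get("args", {})
    handler = TOOLS.get(name)
    if handler is None:
        result: Dict[str, Any] = {"error": f"unknown tool: {name}"}
    else:
        try:
            result = handler(**args)
        except TypeError as exc:  # missing or unexpected arguments
            result = {"error": f"bad arguments: {exc}"}
    # Echo the call id so the caller can match the result to the request.
    return {"id": call.get("id"), "name": name, "response": result}


if __name__ == "__main__":
    reply = handle_tool_call(
        {"id": "call-1", "name": "get_order_status", "args": {"order_id": "A123"}}
    )
    print(json.dumps(reply, indent=2))
```

Returning failures as structured data rather than raising is a design choice suited to a voice session: the model can tell the user that something went wrong instead of the stream ending abruptly.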

Frequently Asked Questions

Q: What are the core improvements of Gemini Live this time?

A: The native audio model is now available. Function calls are more stable and accurate, voice conversations sound more natural, and you can interrupt an answer and have the model continue right away.

Q: Where can I experience it?

A: The Live section of Google AI Studio is open for online trial.

Q: What inputs and outputs can the Live API handle?

A: It accepts text, audio, and video input and produces text and audio output, with real-time two-way streaming.

Q: Is this the official version?

A: No, it is a preview. Refer to the official documentation and console for specific capabilities and quotas.

Q: How does it differ from previous versions of Gemini Live?

A: It uses a single native audio model instead of a cascade of speech-to-text (STT) and text-to-speech (TTS) stages, which lowers latency and makes tool calling more stable.
