Qwen3.5-Omni released: From long-form audio and video understanding to real-time voice and video interaction

Posted in AI information by Admin

Qwen has officially released Qwen3.5-Omni. The trial entry point in Qwen Chat now leads to VoiceChat and VideoChat. The model folds listening, viewing, searching and tool calling into a single round of interaction, but which specific models are exposed, and how widely they are open, still needs to be verified.

First, this upgrade goes beyond understanding images

This time, the official announcement splits the capabilities into an offline track and a real-time track. The offline track focuses on script-level captioning: it can generate a video script with timestamps, shot changes and speaker mapping. The real-time track brings fine-grained voice control, web search and complex function calls into the same interaction.
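To make the idea of a script with timestamps, shot changes and speaker mapping concrete, here is a minimal sketch of how such output could be represented and rendered on the client side. The `ScriptSegment` schema, its field names and the text layout are all hypothetical and chosen only for illustration; the article does not describe the actual output format of the model.

```python
from dataclasses import dataclass


@dataclass
class ScriptSegment:
    """One line of a timestamped video script (hypothetical schema)."""
    start_s: float             # segment start, in seconds
    end_s: float               # segment end, in seconds
    speaker: str               # speaker label mapped to this segment
    text: str                  # spoken or described content
    shot_change: bool = False  # True if a new shot begins at start_s


def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_script(segments: list[ScriptSegment]) -> str:
    """Render segments in time order, marking each shot change."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start_s):
        if seg.shot_change:
            lines.append(f"--- shot change at {format_timestamp(seg.start_s)} ---")
        lines.append(
            f"[{format_timestamp(seg.start_s)} - {format_timestamp(seg.end_s)}] "
            f"{seg.speaker}: {seg.text}"
        )
    return "\n".join(lines)
```

Keeping the speaker label and the shot-change flag on every segment, rather than in separate tracks, is only one possible design; it keeps the rendering step to a single pass over the sorted segments.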

Second, the headline feature is Audio-Visual Vibe Coding

The official announcement gives Audio-Visual Vibe Coding top billing. In the core demonstration, the user describes a requirement out loud in front of the camera, and Qwen3.5-Omni-Plus directly generates a runnable web page or mini game. Publicly reported figures also include several hard numbers: up to 10 hours of audio, 400 seconds of 720p video, 113 languages or dialects for speech recognition, and 36 languages or dialects for speech generation. The family comes in three tiers: Plus, Flash and Light.
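The reported input limits can be turned into a simple pre-flight check before a file is uploaded. The sketch below uses only the figures quoted above (10 hours of audio, 400 seconds of 720p video). The function names are hypothetical, and treating "720p" as an upper bound on frame height is an assumption; the real limits should be confirmed in the official documentation.

```python
# Limits as reported in the article; not verified against official documentation.
MAX_AUDIO_SECONDS = 10 * 60 * 60  # "up to 10 hours of audio"
MAX_VIDEO_SECONDS = 400           # "400 seconds of 720p video"
MAX_VIDEO_HEIGHT = 720            # assumption: "720p" is read as a cap on frame height


def check_audio(duration_s: float) -> list[str]:
    """Return a list of problems for an audio input (empty if it fits)."""
    problems = []
    if duration_s > MAX_AUDIO_SECONDS:
        problems.append(
            f"audio is {duration_s:.0f}s, above the reported {MAX_AUDIO_SECONDS}s limit"
        )
    return problems


def check_video(duration_s: float, height_px: int) -> list[str]:
    """Return a list of problems for a video input (empty if it fits)."""
    problems = []
    if duration_s > MAX_VIDEO_SECONDS:
        problems.append(
            f"video is {duration_s:.0f}s, above the reported {MAX_VIDEO_SECONDS}s limit"
        )
    if height_px > MAX_VIDEO_HEIGHT:
        problems.append(
            f"video height is {height_px}px, above the assumed {MAX_VIDEO_HEIGHT}px cap"
        )
    return problems
```

For example, `check_video(401, 1080)` reports both a duration problem and a resolution problem, while `check_video(400, 720)` returns an empty list.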

Third, how to check whether this capability has reached you

Open Qwen Chat and check whether VoiceChat or VideoChat already appears in the lower right corner, then open the developer documentation and confirm that the Offline API and Realtime API entry points are visible. If the web page can start a real-time voice or video session directly, and the console can also call the corresponding model, that is a strong sign these capabilities are already publicly available.

Fourth, the value is substantial, but the boundaries must be clearly understood

The practical significance of these capabilities is not that a single question-and-answer exchange looks more impressive, but that voice assistants, video understanding, meeting processing and front-end prototyping can start to work together continuously. Note that the official promotional material speaks in terms of the Qwen3.5-Omni family, while the public API documentation is currently more explicit about Qwen-Omni, Qwen3-Omni-Flash and the Realtime series. Voice cloning is still being engineered and rolled out gradually.
