Alibaba's Tongyi team has announced Qwen3-TTS (including the Qwen3-TTS-Flash variant), a next-generation text-to-speech model. It supports multiple voices (timbres), languages, and dialects, with an emphasis on more natural and expressive speech output. Official demos and blog posts showcase the model's strong performance in both English and Chinese. A new unified architecture handles multiple languages and dialects within a single model. An online demo and access instructions are now available.
The accompanying product documentation and console page indicate that Qwen3-TTS-Flash offers 17 human-like voices, each of which can speak multiple languages and dialects (including Mandarin and some other dialects), and they also list API billing details. A real-time speech synthesis option (Qwen3-TTS Realtime) is offered to reduce end-to-end latency. Media reports noted that Qwen3-TTS was released on the same day as Qwen3-Omni and described the two as key updates to the Tongyi multimodal family.
Frequently Asked Questions
Q: What are the core features of Qwen3-TTS?
A: It combines multiple voices, languages, and dialects in one model, emphasizes natural and expressive speech in English and Chinese, and is available through online demos and API access.
Q: How does it differ from Qwen-TTS?
A: The official documentation recommends using Qwen3-TTS, which covers a wider range of voices and languages (including multiple dialects) and is available in Flash and Realtime variants.
Q: Are the weights open source?
A: At present, the model is offered mainly through the API and the online demo, and the weights have not been released. Refer to the official API and console for usage.
Q: What languages, dialects, and voices are supported?
A: The documentation lists 17 voices covering Chinese (including some dialects) and multiple other languages; see the product page for the detailed list and pricing.
Q: Where can I try it and get updates?
A: You can try it on the official blog and demo page, and find the model and real-time speech options in the Alibaba Cloud Tongyi Qianwen product documentation.