Alibaba has released the Tongyi Wanxiang 2.6 (Wan 2.6) model series, a new generation upgraded for professional film and television production and image creation. The models are available simultaneously on Alibaba Cloud Bailian and through Wanxiang's own entry points. The new version focuses on "role-playing" and multi-shot narrative: it can reference the appearance and voice timbre of a character in an input video, generate single-person, multi-person, or person-and-object videos from a prompt, and expand a simple prompt into a multi-shot script while keeping subjects and scenes as consistent as possible across shots.
In terms of specific capabilities, Wanxiang 2.6 emphasizes natural audio-visual synchronization and more stable multi-person dialogue, and it also covers music and song generation. Generated videos can run up to 15 seconds (some reference-based generation modes are listed as 10 seconds), and an "audio-driven" mode uses text together with audio to drive multi-shot performances. The Alibaba Cloud page also gives billing information for the API: prices for the relevant video model calls are listed as starting from 0.6 yuan per second. Actual costs, quotas, and available capacity are subject to the platform console and product documentation.
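To make the pricing concrete, the arithmetic can be sketched as below. This is only an illustration under stated assumptions: it treats the listed starting price of 0.6 yuan per second as a flat rate and uses the 15-second maximum mentioned above. The function name and the flat-rate model are hypothetical, not part of any Alibaba Cloud API, and real bills depend on the model, resolution, and rules shown in the platform console.

```python
from decimal import Decimal

# Assumption: a flat per-second rate. 0.6 yuan/second is the listed
# *starting* price; the real rate may be higher for some models or settings.
STARTING_PRICE_YUAN_PER_SECOND = Decimal("0.6")
MAX_CLIP_SECONDS = 15  # maximum clip length mentioned in the article


def estimate_min_cost(duration_seconds, clips=1,
                      price_per_second=STARTING_PRICE_YUAN_PER_SECOND):
    """Return a lower-bound cost estimate in yuan for `clips` videos."""
    if not 1 <= duration_seconds <= MAX_CLIP_SECONDS:
        raise ValueError("duration must be between 1 and 15 seconds")
    if clips < 1:
        raise ValueError("clips must be at least 1")
    return price_per_second * duration_seconds * clips


if __name__ == "__main__":
    # One full-length 15-second clip at the starting price.
    print(estimate_min_cost(15))            # 9.0
    # Twenty 10-second clips at the starting price.
    print(estimate_min_cost(10, clips=20))  # 120.0
```

Because the listed figure is a starting price, the result is best read as a floor rather than a quote.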
With stronger controllable storyboarding and the ability to transfer a character's appearance and voice, the barrier to creation has been lowered further, but portrait rights, voice rights, and copyright compliance now need more attention. When using real people's likenesses, voice performances, or brand elements, make sure the sources are authorized and traceable to avoid the risk of infringement or misleading communication.
FAQ
Q: What is Wan 2.6?
A: It is a new generation of image and video generation models in Alibaba's Tongyi family, with capabilities upgraded for film and television creation.
Q: What is the "role-playing function" in Wanxiang 2.6?
A: The model can reference the appearance and voice timbre of a character in an input video, then generate single-person, multi-person, or person-and-object video content from a prompt.
Q: How do the "multi-shot narrative" and "smart storyboard" features of Wanxiang 2.6 work?
A: Given a simple prompt, the model can generate a multi-shot storyboard script and produce a coherent video made up of several shots, keeping subjects and scenes as consistent as possible across them.
Q: What is the maximum video length Wanxiang 2.6 can generate, and does it support audio-visual synchronization?
A: Publicly available information shows that it can generate videos of up to 15 seconds, and it emphasizes stable multi-person dialogue and more natural audio-visual synchronization.
Q: What are the costs and risks of using Wanxiang 2.6?
A: The platform page lists some API call prices as starting from 0.6 yuan per second. When using it, check the quotas and billing rules, and make sure portrait rights, voice rights, and copyright authorization are in place.