The Wan 2.5 preview adds "native audio-driven video generation." Users can supply audio directly as a control signal and combine it with text prompts or reference images for text-to-video and image-to-video tasks. The official description emphasizes audio-video synchronization. In the preview phase, output is specified at 1080p and 24fps, with a maximum video length of 5 or 10 seconds depending on the selected model and interface parameters. The update is meant to let narration, music, or ambient sound set the rhythm and narrative direction of the shots, producing more coherent short films.
Alibaba Cloud Bailian and the product website also note that Wan 2.5's "Video with Sound" preview supports either automatic dubbing or a custom audio file as input, which suits scenarios such as advertising, e-commerce demonstrations, and creative short films. Because the feature is still in preview, functionality and availability may expand gradually across platforms and regions, and actual performance should be verified against your own footage and downstream workflow. Third-party evaluations also report that portrait consistency and motion stability still fluctuate, so a small-sample test on representative footage is recommended for each project.
Frequently Asked Questions
Q: How is audio involved in generation?
A: You can upload audio as a driving signal and combine it with text prompts or reference images to guide the shot rhythm, emotion and lip sync.
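As a pre-upload check, the sketch below reads the duration of a WAV file and picks the shortest preview clip length that covers it. It uses only the Python standard library. The helper names (`wav_duration_seconds`, `pick_clip_length`, `make_silent_wav`) are hypothetical and are not part of any Wan or Bailian API. It also assumes the driving audio should fit inside the 5- or 10-second clip, which the official description does not state.

```python
import io
import wave

# Clip lengths offered by the preview, in seconds (from the text above).
PREVIEW_DURATIONS_S = (5, 10)


def wav_duration_seconds(wav_file) -> float:
    """Return the playing time of a WAV file (a path or a file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def pick_clip_length(audio_seconds: float):
    """Return the shortest preview length that covers the audio, or None.

    Assumption: the audio should fit inside the clip. This is a guess for
    illustration, not a documented platform rule.
    """
    for length in PREVIEW_DURATIONS_S:
        if audio_seconds <= length:
            return length
    return None


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, only for trying the helpers."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    seconds = wav_duration_seconds(make_silent_wav(4.0))
    print(seconds, pick_clip_length(seconds))
```

If `pick_clip_length` returns `None`, the audio is longer than either preview length and would need trimming before a test run, under the assumption stated above.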
Q: What length and specifications are supported?
A: The preview interface offers two clip lengths, 5 seconds and 10 seconds. Output is fixed at 24fps at up to 1080p and can be exported as MP4 (H.264).
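To make these limits concrete, here is a minimal Python sketch that validates a requested clip against the preview specifications and computes the resulting frame count. The `ClipSettings` class and its field names are hypothetical and do not mirror the actual Bailian request parameters; only the numbers (5 or 10 seconds, 24fps, up to 1080p) come from the text.

```python
from dataclasses import dataclass

FPS = 24                       # fixed frame rate in the preview
ALLOWED_DURATIONS_S = (5, 10)  # the two clip lengths offered
MAX_HEIGHT = 1080              # "up to 1080p"


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for a requested clip, for illustration only."""

    duration_s: int
    height: int = 1080

    def validate(self) -> None:
        """Raise ValueError if the settings fall outside the preview limits."""
        if self.duration_s not in ALLOWED_DURATIONS_S:
            raise ValueError(f"duration must be one of {ALLOWED_DURATIONS_S} seconds")
        if not 0 < self.height <= MAX_HEIGHT:
            raise ValueError(f"height must be between 1 and {MAX_HEIGHT}")

    @property
    def frame_count(self) -> int:
        """Number of frames in the clip at the fixed frame rate."""
        return self.duration_s * FPS


if __name__ == "__main__":
    for seconds in ALLOWED_DURATIONS_S:
        clip = ClipSettings(duration_s=seconds)
        clip.validate()
        print(seconds, "s ->", clip.frame_count, "frames")
```

At 24fps, a 5-second clip contains 120 frames and a 10-second clip contains 240, which is a useful figure when budgeting review time for small-sample tests.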
Q: Where can the feature be accessed?
A: Both the Tongyi Wanxiang/Wan product page and the Alibaba Cloud Bailian API list preview models with audio capabilities, along with parameter descriptions.
Q: How stable is it?
A: The official demo shows working audio-video synchronization, but third-party evaluations report that portrait consistency and motion stability still fluctuate, so results should be tested scene by scene.
Q: What about commercial use and regional availability?
A: This is a preview feature. Availability and terms of use depend on each platform's pages and your account permissions, and access may be opened gradually by region.