Meituan's LongCat team has released LongCat-Video-Avatar as part of an update to the LongCat-Video codebase, launching the project page and the Hugging Face weights at the same time. Built on the LongCat-Video architecture, the model supports Audio-Text-to-Video (AT2V), Audio-Text-Image-to-Video (ATI2V), and audio-conditioned video continuation, covering single-person, multi-character, and long-duration content generation.
According to public materials, LongCat-Video-Avatar focuses on long-sequence stability and more natural motion. Cross-Chunk Latent Stitching reduces degradation and visible seams in long video generation, and Reference Skip Attention reduces "hard copy" traces while maintaining identity consistency. A decoupled guidance strategy is also proposed to reduce over-reliance on the speech signal and to make silent segments less stiff. In the model card, the team cites EvalTalker as the benchmark for human evaluation and shows comparisons of naturalness and realism. However, details such as external leaderboard rankings and the number of participants are not fully disclosed on the public page, so the conclusions should be checked against the evaluation paper and reproducible experiments.
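The idea of giving the audio condition its own guidance weight, separate from the text condition, can be illustrated with a generic multi-condition classifier-free guidance sketch. This is not taken from the LongCat-Video-Avatar code and is not claimed to be the team's formula. The function name, the argument names, and the scale values are hypothetical, and the three input vectors stand in for model predictions that a real sampler would compute.

```python
from typing import List, Sequence


def decoupled_guidance(
    uncond: Sequence[float],
    text_only: Sequence[float],
    text_audio: Sequence[float],
    text_scale: float,
    audio_scale: float,
) -> List[float]:
    """Combine three predictions with separate text and audio guidance scales.

    Hypothetical illustration of generic multi-condition classifier-free
    guidance; not the published LongCat-Video-Avatar method.

    uncond     : prediction with no conditions
    text_only  : prediction conditioned on text only
    text_audio : prediction conditioned on text and audio
    """
    if not (len(uncond) == len(text_only) == len(text_audio)):
        raise ValueError("all predictions must have the same length")
    return [
        # Start from the unconditional prediction, then add the text
        # direction and the audio direction with independent weights.
        u + text_scale * (t - u) + audio_scale * (a - t)
        for u, t, a in zip(uncond, text_only, text_audio)
    ]


# Toy stand-ins for model outputs (assumed values, for illustration only).
uncond = [0.0, 1.0]
text_only = [1.0, 3.0]
text_audio = [2.0, 2.0]

# A lower audio scale weakens the pull of the audio condition
# relative to the text condition.
guided = decoupled_guidance(uncond, text_only, text_audio,
                            text_scale=2.0, audio_scale=0.5)
print(guided)
```

With both scales set to 1.0 the result equals the fully conditioned prediction, and with the audio scale set to 0.0 the audio condition has no effect. In this toy setting, that is what tuning the two scales independently means.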
FAQs
Q: What kind of model is LongCat-Video-Avatar?
A: LongCat-Video-Avatar is an audio-driven video generation model for character performance, emphasizing long-duration stability, lip-sync accuracy, and identity consistency.
Q: What generation modes does LongCat-Video-Avatar, released by Meituan's LongCat team, support?
A: LongCat-Video-Avatar supports AT2V and ATI2V, as well as audio-conditioned video continuation and long-video extension.
Q: What is the difference between LongCat-Video-Avatar and InfiniteTalk?
A: According to its introduction, LongCat-Video-Avatar emphasizes more natural dynamics and more stable long-sequence performance, and uses Reference Skip Attention to reduce the "copy-paste" artifacts caused by reference image injection.
Q: What risks should developers be aware of when using LongCat-Video-Avatar?
A: Developers should pay attention to portrait and audio licensing, regulatory compliance, and content safety, and should avoid generating character content without permission or in ways that could be misused.