I. Basic Information
Captions is an AI video creation and editing platform powered by Mirage. Its core features include AI video editing, text-to-video, automatic subtitles, lip-syncing and multilingual dubbing, AI avatar generation, and eye contact correction. The product emphasizes a complete creation process from script to finished video on mobile and web, and is aimed at short video creators, brand and e-commerce teams, educational and training institutions, and content studios that need to produce video at scale.
II. Product Overview
Captions organizes workflows around a hybrid of conversational and timeline editing. Users can record or upload footage directly, or quickly generate draft shots from a script with AI Creator. The system provides subtitle generation and style management, automatic dubbing and background music, and multilingual translation with dubbing alignment. For presentation-style and on-camera footage, the platform offers Eye Contact gaze correction, Denoise noise reduction, automatic zoom-ins, and a Title Card template library, so key processing steps can be completed without complex manual editing. For users who need an on-screen presenter but find filming inconvenient, Captions provides AI Twins and Mirage-generated actors, which can batch-generate videos with lip-sync and voice matching, either preserving the user's own appearance or using a likeness selected from the library.
III. Core Functions
1. Main functions
Automatic subtitles support multilingual transcription and style editing.
Lipdub synchronizes lip movements with the voiceover, so that mouth movements match speech in different languages.
AI Twins generates a personal avatar that combines the user's voice and likeness for explainer videos.
Mirage generates actors with a consistent on-screen appearance and a choice of voice styles.
AI Creator and Chat to Edit allow you to generate and modify footage using prompts or dialogue.
Eye Contact and Denoise improve on-camera gaze and audio clarity.
AI Ads and AI Shorts templates are adapted to major platforms.
Script generation and a teleprompter make it easier to record in a single take.
2. Technical characteristics
A multilingual speech synthesis and translation pipeline keeps subtitles and lip movements aligned.
Shot-level AI editing supports one-click jump cuts, automatic zoom-ins, and transition suggestions.
Mobile-first, cloud-based collaboration: projects and edit history are shared between the mobile app and the web browser.
Higher-tier plans support model selection and switching, along with concurrent generation.
IV. Pricing and Versions
Captions offers a free plan and multiple subscription plans: Pro at $9.99/month, Max at $24.99/month, and Scale at $69.99/month. Plans differ in project creation limits, export watermarking, model selection, the number of AI Twins that can be generated, and the availability of generated actors. Pricing and features may change; the official website and help center are authoritative. Amounts and availability may vary by region and over time.
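As a quick arithmetic check on the listed prices, the following minimal sketch computes what twelve months of each plan would cost at the monthly rates quoted above. It assumes no annual discount, tax, or regional adjustment, none of which the text specifies. The names `MONTHLY_PRICE_CENTS`, `yearly_cost_cents`, and `format_usd` are illustrative only.

```python
# Monthly prices from the text, stored in US cents to avoid floating-point rounding.
MONTHLY_PRICE_CENTS = {"Pro": 999, "Max": 2499, "Scale": 6999}


def yearly_cost_cents(plan: str, months: int = 12) -> int:
    """Cost of `months` months at the listed monthly rate (assumes no discount)."""
    return MONTHLY_PRICE_CENTS[plan] * months


def format_usd(cents: int) -> str:
    """Render an integer number of cents as a dollar amount."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    for plan in MONTHLY_PRICE_CENTS:
        print(plan, format_usd(yearly_cost_cents(plan)))
```

Under these assumptions, a year of Pro comes to $119.88, Max to $299.88, and Scale to $839.88. Actual billing may differ.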
V. Applicable Scenarios and Target Audience
Short video creators can quickly complete daily content updates with the help of automatic subtitles, lip-syncing, and templates.
Brands and e-commerce teams use AI Ads and multilingual voiceovers to generate ad creatives in batches.
Education and training teams use script generation and Eye Contact correction to produce micro-lessons and instructional videos.
International operations teams expand into multiple regions while maintaining stylistic consistency through translation and dubbing alignment.
Media and content studios use concurrent generation and model switching to work on multiple projects in parallel more efficiently.
VI. Frequently Asked Questions
Q: What are the differences between AI Twins and generated actors in Captions?
A: AI Twins is based on the user's own likeness, which makes it suitable for building a consistent personal brand. Generated actors are chosen from an official library and suit cases where the user cannot conveniently appear on camera or where multiple roles are required.
Q: How does the lip-sync function in Captions work?
A: The platform establishes a time alignment between the voice-over and the visuals. Using lip-movement estimation and speech alignment, it keeps lip movements consistent with voice-overs in different languages, which makes it suitable for multilingual releases.
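To make the idea of time alignment concrete, here is a minimal sketch. It is not Captions' actual method, which the text does not describe. It assumes the original speech has already been split into timed segments and that each translated voice-over clip has a known duration, and it computes the playback-rate factor that would fit each dubbed clip into its original time window. The names `Segment` and `fit_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float            # original segment start, in seconds
    end: float              # original segment end, in seconds
    dubbed_duration: float  # length of the translated voice-over clip, in seconds


def fit_rate(seg: Segment) -> float:
    """Playback-rate factor that makes the dubbed clip fill the original window.

    A value above 1.0 means the dubbed audio must be sped up;
    a value below 1.0 means it must be slowed down.
    """
    window = seg.end - seg.start
    if window <= 0 or seg.dubbed_duration <= 0:
        raise ValueError("segment window and dubbed duration must be positive")
    return seg.dubbed_duration / window
```

For example, a 3-second dubbed clip replacing a 2-second original segment gives a factor of 1.5. A real pipeline would also have to adjust the lip movements themselves, which this sketch does not attempt.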
Q: Does it support recording and publishing entirely on mobile devices?
A: Yes. Recording, the teleprompter, subtitles, and export are all available on a mobile phone, and projects stay synchronized through the cloud with the web version.
Q: What are the differences between the free and paid versions?
A: The main differences are usage quotas, model selection, whether exports carry a watermark, concurrent generation permissions, the availability of AI Twins and generated actors, and access to advanced tools such as the AI Ads and AI Shorts templates.
Q: What are the application scenarios for Eye Contact and Denoise?
A: Eye Contact corrects gaze in post-production so that the speaker appears to look directly at the camera, which gives a more professional result. Denoise reduces ambient noise and improves vocal clarity, and is suitable for casual recordings and indoor environments.