- Abstract
Chroma 1.0 is a fully open-source, end-to-end real-time speech-to-speech model from FlashLabs that supports personalized voice cloning. It replaces the traditional ASR→LLM→TTS pipeline with a single model and reaches an end-to-end time to first token (TTFT) of about 150 ms, which positions it as a research-grade yet deployable real-time dialogue solution and an open-source alternative to OpenAI's Realtime model.
- Core features
- Native end-to-end speech: audio in, audio out, which reduces latency and the error accumulation of cascaded pipelines.
- Real-time performance: end-to-end time to first token (TTFT) under 150 ms, about 135 ms with SGLang enabled.
- Voice cloning: generates a high-fidelity personalized voice from just a few seconds of reference audio.
- Evaluation metrics: speaker similarity (SIM) reached 0.817, reported as about 10.96% above the human baseline of 0.73.
- Model size: about 4B parameters, balancing reasoning and dialogue ability.
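To make the SIM figure concrete, here is a minimal sketch of how a speaker-similarity score and a relative improvement are typically computed. It assumes SIM is the cosine similarity between speaker embeddings, which is a common convention; the source does not say which embedding model or benchmark was used. The toy vectors and function names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def relative_improvement(score, baseline):
    """Relative gain of score over baseline, in percent."""
    return (score - baseline) / baseline * 100.0


# Toy embeddings standing in for real speaker embeddings (assumption).
reference_voice = [0.20, 0.70, 0.10]
generated_voice = [0.25, 0.65, 0.05]

print(f"toy SIM: {cosine_similarity(reference_voice, generated_voice):.3f}")
print(f"gain from the rounded figures: {relative_improvement(0.817, 0.73):.2f}%")
```

With the rounded figures quoted above, the relative gain works out to about 11.9% rather than 10.96%. The reported percentage presumably rests on an unrounded baseline, but that is an inference and not something the source states.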
- Installation
- Get the inference code from GitHub and install the dependencies.
- Download Chroma 1.0 weights through Hugging Face.
- Launch the real-time inference service using the official example or an SGLang configuration.
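The weight-download step can be sketched with the `huggingface_hub` library's `snapshot_download` function. This is a sketch under stated assumptions, not the project's official procedure: the repository ID below is a placeholder, since the source does not give one, and the `download_weights` helper and target directory are hypothetical.

```python
from pathlib import Path

# Placeholder repository ID (assumption): use the ID from the official model card.
REPO_ID = "your-org/your-chroma-repo"


def download_weights(repo_id, target_dir, downloader=None):
    """Download a model snapshot into target_dir and return the local path."""
    if downloader is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # snapshot_download fetches every file in the repo and returns the folder path.
    return Path(downloader(repo_id=repo_id, local_dir=str(target)))
```

Calling `download_weights(REPO_ID, "weights/chroma")` would fetch the files once `REPO_ID` points at the real repository. The optional `downloader` argument exists only so the helper can be exercised without network access.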
- Typical use cases
- Real-time voice assistants and conversational robots.
- Cross-language or cross-character voice dubbing and content generation.
- Low-latency voice interaction systems for meetings and customer service.
- Speech understanding and generation experiments in research scenarios.
- Ecosystem and competing products
- Ecosystem: model weights, inference code, and support for the SGLang inference framework.
- Competing products: compared with OpenAI Realtime, the Llama series, and other multimodal voice models, Chroma 1.0's advantages are that it is fully open source and offers end-to-end real-time capability. Each option makes its own trade-offs among latency, audio quality, and compute requirements.
- Limitations and precautions
- Real-time inference places high demands on GPU resources and system-level optimization.
- Voice cloning raises privacy and compliance concerns; obtain the speaker's authorization before cloning a voice.
- The evaluation metrics are based on public benchmarks; verify real-world performance in your own scenario.
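One way to check the latency claim on your own hardware is to time how long a streaming response takes to produce its first audio chunk. The sketch below is a generic harness, not part of the Chroma codebase: `fake_speech_stream` is a hypothetical stand-in for whatever streaming interface the inference service exposes, and the chunk size is arbitrary.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttft(stream: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first chunk arrives, total chunk count)."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _chunk in stream:
        if ttft is None:
            # The first chunk marks the time to first token/audio.
            ttft = time.perf_counter() - start
        count += 1
    if ttft is None:
        raise ValueError("stream produced no chunks")
    return ttft, count


def fake_speech_stream(first_delay_s: float, chunks: int) -> Iterator[bytes]:
    """Stand-in for a model stream (assumption): waits, then yields silent chunks."""
    time.sleep(first_delay_s)
    for _ in range(chunks):
        yield b"\x00" * 320


ttft, n = measure_ttft(fake_speech_stream(0.05, 5))
print(f"TTFT: {ttft * 1000:.0f} ms over {n} chunks")
```

Against a real service, start the timer when the last input audio frame is sent, and repeat the measurement many times to report a median, since single runs vary with GPU load and network conditions.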
- Project address
https://github.com/FlashLabs-AI-Chroma
- Frequently asked questions
Q: Is Chroma 1.0 fully open source?
A: Yes, both the code and model weights are open source.
Q: Is it mandatory to use SGLang?
A: No, but using SGLang further reduces latency.
Q: How long is the reference audio required for voice cloning?
A: A few seconds of reference audio is usually enough to generate a high-fidelity voice.