Chroma 1.0 released: the world's first open-source end-to-end real-time speech-to-speech model

AI is open source • Admin • 1/22/2026 • 66 views

Abstract

Chroma 1.0 is an end-to-end, real-time speech-to-speech model trained and fully open-sourced by FlashLabs, with support for personalized voice cloning. It removes the need for the traditional ASR→LLM→TTS pipeline and can begin responding in about 150 ms end to end, which positions it as a research-grade, deployable real-time dialogue solution and as an open-source alternative to OpenAI's Realtime model.

Core features

End-to-end native speech: speech input maps directly to speech output, which reduces latency and the error accumulation of multi-stage pipelines.
Real-time performance: end-to-end time to first token (TTFT) is under 150 ms, and about 135 ms with SGLang enabled.
Voice cloning: generates a high-fidelity personalized voice from just a few seconds of reference audio.
Evaluation metrics: speaker similarity (SIM) reaches 0.817, reported as an increase of about 10.96% over the human baseline of 0.73.
Model size: about 4B parameters, striking a balance between reasoning and dialogue ability.
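
The TTFT figure above is a time-to-first-output measurement, and it is worth reproducing on your own hardware. The following is a minimal sketch of how such a measurement could be taken around any streaming audio generator. The names `measure_ttft` and `fake_model_stream` are hypothetical and are not part of the Chroma code base; the fake stream only simulates a delay and does not call the model.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(
    stream: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, int]:
    """Return (seconds until the first non-empty chunk, number of non-empty chunks)."""
    start = clock()
    ttft = None
    chunks = 0
    for chunk in stream:
        if not chunk:
            continue  # skip empty keep-alive chunks
        if ttft is None:
            ttft = clock() - start  # first audible output marks the TTFT
        chunks += 1
    if ttft is None:
        raise ValueError("stream produced no audio chunks")
    return ttft, chunks


def fake_model_stream(first_delay_s: float, n_chunks: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a model's streaming audio output."""
    time.sleep(first_delay_s)  # simulated time before the first audio chunk
    for _ in range(n_chunks):
        yield b"\x00" * 320  # placeholder bytes, not real audio


if __name__ == "__main__":
    ttft, chunks = measure_ttft(fake_model_stream(0.05, 5))
    print(f"TTFT: {ttft * 1000:.1f} ms over {chunks} chunks")
```

To measure a real deployment, replace `fake_model_stream` with the iterator that yields audio from the inference service, and start the clock at the moment the input audio is submitted.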

Installation

Get the inference code from GitHub and install its dependencies.
Download the Chroma 1.0 weights from Hugging Face.
Launch the real-time inference service using the official example or the SGLang configuration.
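
The weight-download step could be scripted roughly as follows. This is a sketch under stated assumptions: it uses `snapshot_download` from the `huggingface_hub` package, and `REPO_ID` is a hypothetical placeholder, since the article does not give the exact Hugging Face repository id. The helper names `weights_dir` and `download_weights` are illustrative only.

```python
from pathlib import Path

# Hypothetical placeholder: replace with the repository id from the official README.
REPO_ID = "your-org/chroma-weights"


def weights_dir(repo_id: str, root: str = "models") -> Path:
    """Map a Hub repo id such as 'org/name' to a local folder 'root/org__name'."""
    org, sep, name = repo_id.partition("/")
    if not sep or not org or not name:
        raise ValueError(f"expected 'org/name', got {repo_id!r}")
    return Path(root) / f"{org}__{name}"


def download_weights(repo_id: str = REPO_ID, root: str = "models") -> Path:
    """Download all files of the repo into weights_dir(...) and return that path."""
    # Imported lazily so weights_dir works without the dependency installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    target = weights_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=target)
    return target
```

Calling `download_weights("org/name")` with the real repository id fetches the files over the network; how the inference service is then launched depends on the official example and is not shown here.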

Typical use cases

Real-time voice assistants and conversational robots.
Cross-language or cross-character voice dubbing and content generation.
Low-latency voice interaction systems for conferences and customer service.
Speech understanding and generation experiments in research settings.

Ecosystem and competing products

Ecosystem: model weights, inference code, and support for the SGLang inference framework.
Competing products: compared with OpenAI Realtime, the Llama series, and multimodal voice models, Chroma 1.0's advantages are that it is fully open source and offers end-to-end real-time capability. Each solution makes its own trade-offs among latency, audio quality, and compute requirements.

Limitations and precautions

Real-time inference places high demands on GPU hardware and system optimization.
Voice cloning raises privacy and compliance issues and requires authorization.
The evaluation metrics are based on public benchmarks, so actual performance needs to be verified in your specific scenario.
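
As one way to verify reported metrics, the relative improvement quoted for SIM under Core features can be recomputed by hand. The sketch below assumes the usual definition, (score − baseline) / baseline, which the article does not state explicitly. Under that assumption, the quoted scores 0.817 and 0.73 give about 11.92%, while 10.96% corresponds to a score of 0.81, so the quoted percentage may reflect rounding or a different formula.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Relative improvement of score over baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (score - baseline) / baseline * 100.0


human_baseline = 0.73
print(f"{relative_improvement(0.817, human_baseline):.2f}%")  # 11.92%
print(f"{relative_improvement(0.81, human_baseline):.2f}%")   # 10.96%
```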

Project address

https://github.com/FlashLabs-AI-Chroma

Frequently asked questions

Q: Is Chroma 1.0 fully open source?

A: Yes, both the code and model weights are open source.

Q: Is it mandatory to use SGLang?

A: No, but using SGLang further reduces latency.

Q: How long does the reference audio need to be for voice cloning?

A: A few seconds of reference audio is usually enough to generate a high-fidelity voice.
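
Before submitting a reference clip, it can help to check its length. The sketch below reads the duration of a PCM WAV file with Python's standard `wave` module. The 3 to 30 second bounds are illustrative assumptions, not limits documented for Chroma, and `write_silence` exists only to produce a test file.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str, min_s: float = 3.0, max_s: float = 30.0) -> bool:
    """True if the clip length lies within [min_s, max_s]; the bounds are illustrative."""
    return min_s <= wav_duration_seconds(path) <= max_s


def write_silence(path: str, seconds: float, rate: int = 16000) -> None:
    """Write a mono 16-bit silent WAV, used here only to exercise the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clip = os.path.join(tmp, "reference.wav")
        write_silence(clip, 5.0)
        print(wav_duration_seconds(clip), is_usable_reference(clip))
```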
