LongCat-Audio-Codec Open Source: An Extremely Low-Bitrate Audio Codec for Large Speech Models

1. Summary

LongCat-Audio-Codec is an open-source audio codec developed by the Meituan LongCat team and optimized for speech large language models (speech LLMs). It uses a dual-token architecture that models semantic and acoustic information in parallel, preserving speech intelligibility and quality at an ultra-low bitrate of just 0.43 kbps. Its real-time streaming decoder keeps latency in the hundreds of milliseconds, which supports voice interaction and embedded deployment. A super-resolution module integrated into the decoder further improves sound quality without requiring an additional model, reducing the resource overhead of end-to-end speech systems.

2. Core Features

1. Dual-token parallel encoding: Extracts semantic and acoustic tokens simultaneously, modeling features efficiently at a low frame rate of 16.7 Hz (one frame every 60 ms).

2. Extremely low bitrate with high-fidelity reconstruction: Maintains high intelligibility at 0.43 kbps, sharply reducing the bandwidth needed to carry speech.

3. Real-time low-latency decoding: A streaming architecture keeps overall latency in the hundreds of milliseconds, meeting the needs of real-time speech generation and interaction.

4. Decoder-side super-resolution enhancement: An integrated super-resolution module restores sound quality detail without the need for an external model.

5. Lightweight and mobile-optimized: The architecture is optimized for the limited computing power of embedded and mobile devices.
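
To make the numbers in features 1 and 2 concrete, the following is a minimal sketch of the arithmetic that links frame rate and bitrate. The 60 ms frame interval and the 0.43 kbps figure come from the text above; the layout of two tokens per frame at 13 bits each is a hypothetical assumption, chosen only because it lands close to the stated bitrate, and is not a confirmed detail of the codec.

```python
# Arithmetic relating frame rate, tokens per frame, and bitrate.
FRAME_MS = 60        # frame interval stated in the article
TARGET_BPS = 430     # 0.43 kbps stated in the article

frame_rate_hz = 1000 / FRAME_MS            # about 16.7 frames per second
bit_budget = TARGET_BPS / frame_rate_hz    # bits available per frame


def bitrate_bps(tokens_per_frame: int, bits_per_token: int, rate_hz: float) -> float:
    """Bitrate of a token stream: tokens/frame x bits/token x frames/second."""
    return tokens_per_frame * bits_per_token * rate_hz


# ASSUMPTION (hypothetical layout): one semantic plus one acoustic token per
# frame, each indexing a 2**13-entry codebook (13 bits per token).
example_bps = bitrate_bps(2, 13, frame_rate_hz)

print(f"frame rate: {frame_rate_hz:.1f} Hz")
print(f"bit budget: {bit_budget:.1f} bits per frame")
print(f"hypothetical layout: {example_bps:.0f} bps")
```

The takeaway is that at 16.7 Hz the whole frame has a budget of roughly 26 bits, which is why a low frame rate is central to reaching sub-kbps bitrates.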

3. Installation

1. Clone the repository: git clone https://github.com/meituan-longcat/LongCat-Audio-Codec

2. Install the dependencies (from the repository root): pip install -r requirements.txt

3. Load the model: Download the weights for meituan-longcat/LongCat-Audio-Codec from Hugging Face.

4. Run the example: Execute the inference script in the repository to verify encoding and decoding.
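
The weight download in step 3 can be scripted. The sketch below uses `snapshot_download` from the `huggingface_hub` package (installed separately with pip) and assumes the weights are published under the repository id named in step 3. The `checkpoints` target directory is a hypothetical choice, so check the repository README for the layout the inference script actually expects.

```python
REPO_ID = "meituan-longcat/LongCat-Audio-Codec"  # repository id named in step 3


def download_weights(local_dir: str = "checkpoints") -> str:
    """Download the model repository from Hugging Face and return its local path.

    ASSUMPTION: "checkpoints" is a hypothetical target directory, not a path
    required by the project.
    """
    # Imported lazily so this module still loads when huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_weights()` once with network access fetches the files; afterwards the local copy can be reused without a connection.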

4. Typical Use Cases

  1. Front-end compression for speech LLMs: Reduce input bandwidth while maintaining intelligibility.
  2. Real-time voice interaction systems: Achieve low-latency transmission in conversational AI or voice assistants.
  3. Speech synthesis on edge and mobile devices: Generate or decode speech locally.
  4. Long-distance voice communication: Maintain clear voice quality in extremely low-bandwidth environments.
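
To give a sense of scale for the bandwidth-constrained use cases above, here is a small sketch comparing the codec's stated 0.43 kbps with uncompressed audio. The baseline of 16 kHz, 16-bit mono PCM is an assumption picked as a common speech format, and the figures ignore packet headers and other transport overhead.

```python
CODEC_BPS = 430            # 0.43 kbps, from the article
PCM_BPS = 16_000 * 16      # ASSUMPTION: 16 kHz, 16-bit mono PCM baseline


def payload_bytes(bitrate_bps: float, seconds: float) -> float:
    """Bytes needed to carry `seconds` of audio at a constant bitrate."""
    return bitrate_bps * seconds / 8


codec_minute = payload_bytes(CODEC_BPS, 60)   # token payload for one minute
pcm_minute = payload_bytes(PCM_BPS, 60)       # raw PCM for one minute
ratio = PCM_BPS / CODEC_BPS                   # compression factor vs. baseline

print(f"codec: {codec_minute:.0f} bytes per minute")
print(f"PCM:   {pcm_minute:.0f} bytes per minute")
print(f"ratio: about {ratio:.0f}x smaller")
```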

5. Ecosystem and Competitive Products

1. Ecosystem integration: LongCat-Audio-Codec is part of the Meituan LongCat series and works together with models such as LongCat-Flash to optimize speech generation and understanding.

2. Comparison with competitors: Compared with neural codecs such as SemantiCodec, UniCodec, and LMCodec, LongCat-Audio-Codec achieves lower bitrates and stronger real-time performance in the speech domain.

3. Industry significance: Lowers the barrier to deploying speech LLMs and provides infrastructure for mobile AI assistants and voice services.

6. Limitations and Precautions

  1. At extremely low bitrates, some fine acoustic detail may still be lost.
  2. Streaming decoding places high demands on the real-time performance of the hardware.
  3. Different model versions may trade off latency against sound quality.
  4. Enabling the integrated super-resolution module increases the computational load.
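
The latency side of these trade-offs can be reasoned about with a simplified model: a frame-based streaming codec must buffer at least one full frame, plus any future frames the decoder waits for. The sketch below applies that model to the 60 ms frame interval from the text. The lookahead counts are hypothetical values for illustration, not the codec's actual configuration, and compute and network time are ignored.

```python
FRAME_MS = 60  # frame interval stated in the article


def algorithmic_latency_ms(lookahead_frames: int, frame_ms: int = FRAME_MS) -> int:
    """Minimum buffering delay: the current frame plus any lookahead frames."""
    if lookahead_frames < 0:
        raise ValueError("lookahead_frames must be non-negative")
    return (1 + lookahead_frames) * frame_ms


# ASSUMPTION: lookahead values 0..3 are illustrative only.
for lookahead in range(4):
    print(f"lookahead {lookahead}: {algorithmic_latency_ms(lookahead)} ms")
```

Under this model, each extra frame of lookahead adds 60 ms, so a decoder with a few frames of lookahead already sits in the hundreds-of-milliseconds range before any processing time is counted.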

7. Project Address

https://github.com/meituan-longcat/LongCat-Audio-Codec

8. Frequently Asked Questions

Q: Does LongCat-Audio-Codec support offline deployment?

A: Yes. It can run completely offline, provided the model weights and the required dependencies are prepared in advance.

Q: How can this codec be integrated on mobile devices?

A: It can be ported to mobile or embedded platforms by using quantized models or lightweight inference frameworks.

Q: Can it be used for non-speech audio?

A: The current version is optimized mainly for speech; other types of audio would require additional training.
