GLM-4.5 open-source slime: A comprehensive analysis of the efficient RL training framework


GLM-4.5 Ships with slime, a Fully Open-Source and Efficient RL Training Framework for Large-Model Optimization

Alongside the release of the GLM-4.5 model series, Tsinghua University's Knowledge Engineering Group (THUDM) has officially open-sourced slime, its in-house efficient reinforcement-learning (RL) training framework. The framework targets post-training optimization of large models, aiming to substantially improve the efficiency of inference and data generation while preserving training quality.


1. Native SGLang Integration for Inference Optimization

slime was designed from the start with native SGLang integration, bringing SGLang's inference optimizations directly into the training loop. This not only reduces the overhead of switching between training and inference, but also fully exploits the inference engine's parallelism and caching, accelerating the end-to-end pipeline of data generation and training.


2. Support for Synchronous and Asynchronous Training Architectures

In traditional RLHF (reinforcement learning from human feedback) training, synchronous execution means data-generation speed is often gated by the latency of a single slow node. slime separates the training engine from the environment-sampling engine at the architectural level, so it can run in an efficient synchronous mode or switch flexibly to asynchronous mode, avoiding performance bottlenecks in the rollout stage and improving GPU utilization.
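The decoupling described above can be sketched with a producer/consumer queue. This is an illustrative simulation, not slime's actual API: sampling workers with uneven latency feed a shared buffer, and the trainer consumes samples as they arrive instead of waiting for the slowest node at every step.

```python
import asyncio
import random

# Illustrative sketch (names are hypothetical, not slime's API) of
# decoupling rollout generation from training via an asyncio queue.

async def rollout_worker(worker_id, steps, queue):
    """Simulated sampling engine with variable per-step latency."""
    for step in range(steps):
        await asyncio.sleep(random.uniform(0.001, 0.01))  # uneven generation time
        await queue.put({"worker": worker_id, "step": step})

async def trainer(queue, total_samples):
    """Simulated training engine: one 'gradient step' per sample consumed."""
    consumed = []
    for _ in range(total_samples):
        consumed.append(await queue.get())  # never blocks on one slow worker
    return consumed

async def main(num_workers=4, steps_per_worker=3):
    queue = asyncio.Queue()
    workers = [
        asyncio.create_task(rollout_worker(i, steps_per_worker, queue))
        for i in range(num_workers)
    ]
    samples = await trainer(queue, num_workers * steps_per_worker)
    await asyncio.gather(*workers)
    return samples

samples = asyncio.run(main())
print(f"trained on {len(samples)} samples from {len({s['worker'] for s in samples})} workers")
```

In synchronous mode the trainer would instead wait for a full batch from every worker before stepping; the queue-based design lets the same two engines run in either mode.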


3. Mixed-Precision Computation for Performance and Stability

slime uses FP8 (8-bit floating point) computation in the rollout-generation stage to reduce memory usage and increase throughput, while keeping BF16 (16-bit brain floating point) precision in the training stage to ensure stability and convergence. This mixed-precision strategy balances performance against the quality of the model's final output.
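The core trade-off can be illustrated with a toy 8-bit quantization round trip. Note the hedge: real FP8 (e.g. the E4M3 format) is a hardware floating-point type executed on tensor cores, whereas this sketch uses scaled signed 8-bit integers as a stand-in; the memory-versus-precision trade-off it demonstrates is the same.

```python
import struct

def quantize_8bit(values):
    """Toy stand-in for FP8: scale floats into signed 8-bit integers.
    Storage is 1 byte per value, half of BF16's 2 bytes, at the cost
    of bounded rounding error."""
    scale = (max(abs(v) for v in values) / 127) or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the 8-bit representation."""
    return [x * scale for x in q]

logits = [0.513, -1.204, 3.998, -0.007]
q, scale = quantize_8bit(logits)
restored = dequantize(q, scale)

# 1 byte per value, versus 2 bytes each in BF16
packed = struct.pack(f"{len(q)}b", *q)
max_err = max(abs(a - b) for a, b in zip(logits, restored))
print(f"{len(packed)} bytes stored, max round-trip error {max_err:.4f}")
```

The rounding error here is at most half the scale step, which is acceptable for generating rollouts; gradient updates, where such errors would accumulate, stay in BF16.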


4. Distributed Design Deeply Integrated with Megatron

slime is built on the Megatron distributed-training architecture and integrates seamlessly with SGLang, ensuring the scalability of distributed training while letting the inference and training processes share optimization results. This deep integration makes slime not only compatible with GLM-4.5 but also readily portable to other large language models.


5. Open Source and Community Collaboration

slime is now fully open-sourced on GitHub, with training scripts, asynchronous sample code, and detailed documentation. Developers can reuse the framework directly or build on top of it to construct RL training pipelines adapted to their own tasks. This gives both academia and industry an efficient, flexible foundation for large-model optimization.


For details, see the project repository: https://github.com/THUDM/slime
