I. Summary
Qwen3Guard is an open-source security protection system launched by the Alibaba Cloud Qwen team, designed to improve the security of large language models during both inference and output. The system comprises the Qwen3-4B-SafeRL reinforcement learning alignment model and the Qwen3GuardTest evaluation benchmark. The Qwen3-4B-SafeRL model leverages security feedback from Qwen3Guard-Gen-4B for reinforcement learning training, improving the safety rating on the WildJailbreak benchmark from 64.7% to 98.1% without sacrificing general-purpose performance. The Qwen3GuardTest covers two scenarios: "Think Chain Reasoning Security Classification" and "Streaming Generation Review," providing researchers with a standardized testing framework.
2. Core Features
- Safe Reinforcement Learning (SafeRL): Combines safety feedback signals with a hybrid reward mechanism to balance safety, usefulness, and rejection rate.
- Intermediate reasoning protection: Qwen3GuardTest supports the security classification and screening of model chain-of-thought content.
- Streaming output monitoring: The Guard-Stream model can perform dynamic risk identification at the token generation stage.
- Multilingual security coverage: supports security classification and detection in 119 languages and dialects.
- Reproducible evaluation framework: Open datasets and indicator systems make it easier for researchers to conduct model security alignment experiments.
3. Installation
- Model loading
pip install transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-SafeRL")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-SafeRL")
- Evaluation Dataset
from datasets import load_dataset
ds = load_dataset("Qwen/Qwen3GuardTest")
- Reasoning compatibility: Supports SGLang (≥0.4.6.post1) and vLLM (≥0.8.5), and can access the OpenAI API interface.
Typical Use Cases
- Security alignment research: Analyze the effects and tradeoffs of reinforcement learning in security optimization.
- Real-time review system: Combined with the Guard-Stream model, it performs token-by-token inspection on streaming output.
- Enterprise deployment: Provide a security layer for chatbots and content generation platforms.
- Academic evaluation: Use Qwen3GuardTest to conduct a unified security comparison of different architecture models.
5. Ecosystem and Competitive Products
- Ecosystem: Compatible with the Qwen3 mainline model system, it can be directly used for security reinforcement of Qwen3-4B, 7B, 72B and other versions.
- Competitors: Compared with solutions such as OpenAI Moderation and Anthropic Constitutional AI, Qwen3Guard provides more fine-grained control in intermediate inference protection and streaming monitoring.
VI. Limitations and Precautions
- SafeRL training requires a lot of computing resources and has high hardware requirements.
- Qwen3GuardTest is currently mainly in English, and its multilingual performance needs further verification.
- Reinforcement learning alignment may lead to slight performance fluctuations in extreme tasks.
- Excessive security constraints may lead to the phenomenon of "too many rejections", and policy parameters need to be weighed.
7. Project Address
https://github.com/QwenLM/Qwen3Guard
8. Frequently Asked Questions
Q: What is the difference between Qwen3-4B-SafeRL and ordinary RLHF models?
A: SafeRL takes safety feedback as its core optimization objective and strikes a balance between safety and usefulness through hybrid rewards.
Q: Is the Qwen3GuardTest applicable to non-Qwen series models?
A: Yes, the benchmark data and metrics are designed to be universal and can be used to evaluate the security performance of other language models.
Q: Can the SafeRL model be used offline?
A: You can load Hugging Face or ModelScope weights locally and run it offline.
Q: Can Guard-Stream interrupt risk output in real time?
A: Each token can be classified in real time during the inference phase, and the output can be immediately blocked or replaced when risks are discovered.