On September 29, 2025, DeepSeek released the experimental model DeepSeek-V3.2-Exp , debuting DeepSeek Sparse Attention (DSA) on top of V3.1-Terminus . Officially, DSA uses fine-grained sparse attention to significantly improve the efficiency of long-context training and inference while minimizing output quality. Internal comparisons show that its overall performance is comparable to V3.1-Terminus. The model is now available simultaneously on the app, web, and API .
The accompanying documentation indicates that deepseek-chat and deepseek-reasoner have switched to V3.2-Exp . The pricing page has also been updated, announcing a 50%+ reduction in API prices . The current specification remains at 128KB context , and the default and maximum output limits for non-thinking and thinking modes remain unchanged. The price reduction applies to both cache hit and cache miss input prices, as well as output prices.
Frequently Asked Questions
Q: What are the core changes in V3.2-Exp?
A: We introduced DSA sparse attention, focusing on the computational efficiency of long contexts, with the goal of reducing costs without sacrificing quality.
Q: What is the relationship with V3.1-Terminus?
A: This is an experimental version that is iterated on. The benchmark performance is roughly the same, and the focus is on efficiency improvements.
Q: Where is it available now?
A: It is now available on DeepSeek's app, web version, and API.
Q: Will the context and output limits change?
A: The context is still 128KB; the default and maximum output lengths for non-thinking/thinking modes are the same as before.
Q: How much will the price drop?
A: The pricing page lists new tiers (e.g., input cache hit $0.028/million tokens, miss $0.28, output $0.42), which is an overall decrease of 50%+ compared to the previous level. Please refer to the official page for details.