DeepSeek open-sources DSpark acceleration components: the model hasn't changed, so why is generation faster?

AI information Admin

On June 28, 2026, DeepSeek's official DeepSpec repository was updated with DSpark checkpoints, providing speculative decoding support for DeepSeek-V4-Flash and V4-Pro. According to the official explanation, DSpark is not a new model. It is a draft module that "guesses ahead," placed alongside the existing model, and it aims to shorten generation wait times without changing the main model's output distribution.

How it makes the same model run faster

Conventional autoregressive generation requires the main model to predict tokens one at a time, and each step is an expensive computation. Speculative decoding first lets a lighter draft module propose a batch of candidate tokens, which the main model then verifies in parallel. Correct guesses are accepted all at once; at the first incorrect guess, the main model supplies the correct token instead. The acceleration therefore comes from reducing the number of serial steps the main model must run, not from lowering answer quality or quantizing the model to a smaller size.
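
The draft-then-verify loop described above can be sketched as a toy example. This is a minimal illustration of greedy speculative decoding only, not DSpark's implementation: `target_model` and `draft_model` are hypothetical stand-in functions, and the sampling case, which needs a probabilistic accept/reject rule to preserve the output distribution, is not covered.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def target_model(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the expensive main model."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[Token]) -> Token:
    """Hypothetical cheap draft: agrees with the target except at every 4th position."""
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def plain_decode(prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one main-model step per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_model(out))
    return out[len(prompt):]


def speculative_decode(prompt: List[Token], n_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns the new tokens and the number of verify passes."""
    out = list(prompt)
    verify_passes = 0
    while len(out) - len(prompt) < n_new:
        # 1) The draft proposes k candidate tokens.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(draft_model(out + draft))
        # 2) The main model checks every position. A real system would do this
        #    in one batched forward pass; here it is simulated position by position.
        verify_passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target_model(out + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])
            else:
                accepted.append(expected)  # main model's correction, then stop
                break
        else:
            # Every guess was right, so the same pass also yields one extra token.
            accepted.append(target_model(out + draft))
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n_new], verify_passes


if __name__ == "__main__":
    tokens, passes = speculative_decode([1, 2, 3], n_new=20)
    print(tokens == plain_decode([1, 2, 3], 20), passes)
```

Because every emitted token is either confirmed or replaced by the main model, the output matches plain decoding token for token; only the number of main-model passes changes.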

DSpark uses a semi-autoregressive generation method that combines a parallel backbone with lightweight sequential heads. Official production data shows that, compared with the MTP-1 baseline, per-user generation speed increases by 60% to 85% on DeepSeek-V4-Flash and by 57% to 78% on V4-Pro. These figures are official results for specific hardware, batch sizes, and serving configurations and cannot be directly translated into fixed speed-up ratios for all deployments.

More than just two checkpoints were released

DeepSpec is a complete training and evaluation codebase. It includes speculative decoding solutions such as DSpark, DFlash, and Eagle3, and provides data processing, training, and evaluation components under the MIT license. The accompanying V4-Flash-DSpark and V4-Pro-DSpark checkpoints are also available on DeepSeek's official Hugging Face page.

This means teams with large-scale inference needs can reproduce the training methods rather than just download a packaged acceleration file. However, the barrier has not disappeared: V4 itself is very large, and deployment still requires large amounts of GPU memory, multi-GPU communication, and inference framework adaptation. The draft module also consumes extra VRAM, and the final benefit depends on the candidate acceptance rate, request concurrency, and output length.
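
To see why the acceptance rate matters so much, here is a rough back-of-the-envelope sketch. It uses the simplified model common in the speculative decoding literature, which assumes each drafted token is accepted independently with probability `alpha`; the parameter `draft_cost`, the cost of one draft step relative to one main-model pass, is an illustrative assumption and does not describe DSpark's actual cost structure.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per main-model pass when k tokens are drafted.

    Assumes each drafted token is accepted independently with probability alpha;
    the pass always yields at least one token (a correction or a bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Rough speed-up over plain decoding.

    draft_cost is the assumed cost of one draft step as a fraction of one
    main-model pass, so one round costs (1 + k * draft_cost) main-model passes.
    """
    return expected_tokens_per_pass(alpha, k) / (1.0 + k * draft_cost)


if __name__ == "__main__":
    for alpha in (0.5, 0.7, 0.9):
        print(alpha, round(estimated_speedup(alpha, k=4, draft_cost=0.05), 2))
```

Under this model, a low acceptance rate can make speculation slower than plain decoding, because the draft cost is paid whether or not the guesses are accepted.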

What impact does it have on ordinary API users?

Ordinary users do not need to change their prompts, and the open-source checkpoints alone do not confirm that the official API has fully enabled DSpark. The truly perceptible value is a shorter wait for initial output and higher per-user generation speed, but whether this is reflected in pricing and rate limits still depends on the service provider. Teams that run their own deployments are advised to first benchmark their typical requests on tokens per second, P95 latency, memory usage, and answer consistency before deciding whether to switch.
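
For teams running such a comparison, the sketch below shows one way to summarize measured requests into tokens per second and P95 latency. The `RequestSample` type and its fields are hypothetical names chosen for illustration; collecting the measurements, tracking memory usage, and checking answer consistency are left to whatever tooling the deployment already has.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestSample:
    latency_s: float    # end-to-end wall-clock time for one request, in seconds
    output_tokens: int  # number of tokens generated for that request


def p95(values: List[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def summarize(samples: List[RequestSample]) -> Dict[str, float]:
    """Aggregate per-request measurements into two headline metrics."""
    total_tokens = sum(s.output_tokens for s in samples)
    total_time = sum(s.latency_s for s in samples)
    return {
        "tokens_per_second": total_tokens / total_time,
        "p95_latency_s": p95([s.latency_s for s in samples]),
    }


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate versus baseline (0.25 means 25% higher)."""
    return (candidate - baseline) / baseline


if __name__ == "__main__":
    # Made-up numbers purely to show the calling convention.
    baseline = summarize([RequestSample(4.0, 200), RequestSample(6.0, 300)])
    candidate = summarize([RequestSample(2.5, 200), RequestSample(3.75, 300)])
    print(relative_change(baseline["tokens_per_second"], candidate["tokens_per_second"]))
```

Running the same request set against both configurations, on the same hardware and at the same concurrency, keeps the two summaries comparable.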

Official source

DeepSeek official DeepSpec repository; DeepSeek-V4-Flash-DSpark official model page.
