Cursor's new Tab model is now available: online reinforcement learning makes suggestions fewer, more accurate, and more likely to be accepted

Cursor announced that the new Tab model is now the default. In real-world use it shows about 20% fewer suggestions, and the suggestions it does show are accepted significantly more often. The core approach is a closed loop of online reinforcement learning and evaluation on live usage, which brings "fewer but more accurate" code-editing suggestions into daily development.

1. Key conclusions and principles

1. Fewer suggestions, but more useful

The new Tab model learns from real coding sessions. It shows fewer suggestions overall, which means fewer distractions for developers. At the same time, the acceptance rate has risen significantly, and completions match the surrounding context and the developer's intent more closely.

2. Online reinforcement learning mechanism

Cursor uses online reinforcement learning: a policy gradient method trained on on-policy data, which optimizes Tab directly against real-time feedback from developers. Compared with offline fine-tuning, this aligns the model more quickly with what developers actually accept in real-world use.
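To make the policy-gradient idea concrete, here is a toy sketch, not Cursor's training code. A two-parameter logistic policy decides whether to show a suggestion, samples that decision from its own current probabilities (which is what makes the data on-policy), and is updated with the REINFORCE rule. The reward values, the single feature `x`, and the simulated user are all assumptions invented for this example.

```python
import math
import random

# Hypothetical rewards, chosen only for illustration.
REWARD_ACCEPT = 1.0   # suggestion shown and accepted
REWARD_REJECT = -1.0  # suggestion shown and rejected
REWARD_SILENT = 0.0   # nothing shown


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(steps=20000, lr=0.05, seed=0):
    """Train P(show | x) = sigmoid(w * x + b) with REINFORCE."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # x is a stand-in for "how likely this user is to accept here".
        x = rng.random()
        p_show = sigmoid(w * x + b)
        # Sample the action from the current policy (on-policy data).
        show = rng.random() < p_show
        if show:
            accepted = rng.random() < x  # simulated developer feedback
            reward = REWARD_ACCEPT if accepted else REWARD_REJECT
        else:
            reward = REWARD_SILENT
        # Gradient of log pi(action) with respect to the logit.
        grad_logit = (1.0 if show else 0.0) - p_show
        w += lr * reward * grad_logit * x
        b += lr * reward * grad_logit
    return w, b


if __name__ == "__main__":
    w, b = train()
    for x in (0.1, 0.5, 0.9):
        print(f"x={x:.1f}  P(show)={sigmoid(w * x + b):.2f}")
```

With these symmetric rewards, showing a suggestion only pays off when the acceptance chance is above one half, so the policy learns to stay quiet in low-confidence contexts. That is the "fewer but more accurate" behavior in miniature.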

2. Key points for team adoption

1. Switch from "more" to "better" metrics

Shift evaluation away from the number of suggestions and toward the acceptance rate, the revert rate, and the amount of revision after acceptance. Establish a team-level baseline and measure Tab's real contribution to code quality and editing flow.
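A baseline like this could be computed from a simple event log. The sketch below assumes a hypothetical `SuggestionEvent` record that a team collects itself. It is not an export format that Cursor is known to provide, and the field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SuggestionEvent:
    """Hypothetical record of one Tab suggestion (assumed schema)."""
    shown: bool
    accepted: bool
    reverted: bool            # accepted, then undone shortly afterwards
    chars_edited_after: int   # characters later changed inside the accepted text


def summarize(events):
    """Return quality-oriented metrics instead of raw suggestion counts."""
    shown = [e for e in events if e.shown]
    accepted = [e for e in shown if e.accepted]
    n_shown, n_acc = len(shown), len(accepted)
    return {
        "shown": n_shown,
        "acceptance_rate": n_acc / n_shown if n_shown else 0.0,
        "revert_rate": sum(e.reverted for e in accepted) / n_acc if n_acc else 0.0,
        "avg_post_edit_chars": (
            sum(e.chars_edited_after for e in accepted) / n_acc if n_acc else 0.0
        ),
    }
```

Dividing the revert rate and the post-edit volume by accepted suggestions, rather than by shown suggestions, keeps those two numbers independent of how often the model chooses to speak.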

2. Prompt and file granularity management

Create separate prompt templates for key directories and test files in large repositories. Enable cross-file jumps and long-range edits for multi-file changes to reduce the cost of switching back and forth.

3. Practical path

(1) Configuration and staged rollout

First enable the new Tab model for core languages and a small set of key projects as a staged rollout, then expand coverage. Keep the old version available for comparison.
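One common way to run a staged rollout is to assign developers to cohorts with a stable hash, so that raising the percentage only adds people and never reshuffles them. The helper below is a generic sketch. The function name, the salt, and the ID format are assumptions, and how a team actually pins a model version per developer is outside its scope.

```python
import hashlib


def in_rollout(user_id: str, percent: int, salt: str = "tab-rollout") -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


if __name__ == "__main__":
    team = [f"dev-{i}" for i in range(20)]
    print([u for u in team if in_rollout(u, 25)])
```

Because each bucket is derived only from the salt and the user ID, everyone enabled at 10% stays enabled at 50%, which keeps before-and-after comparisons clean.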

(2) Observation and review

Review the suggestion acceptance rate, the undo rate, and the post-commit defect rate every week to catch regressions. Establish exclusion rules for contexts that behave abnormally.
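A weekly review can be as simple as grouping daily counts by ISO week and flagging any week where the acceptance rate drops noticeably. The sketch below assumes the team already has `(day, shown, accepted)` tuples. The 5-point drop threshold is an arbitrary illustrative default, not a recommendation from Cursor.

```python
from collections import defaultdict
from datetime import date


def weekly_acceptance(records):
    """records: iterable of (day: date, shown: int, accepted: int)."""
    totals = defaultdict(lambda: [0, 0])  # (iso_year, iso_week) -> [shown, accepted]
    for day, shown, accepted in records:
        iso = day.isocalendar()
        key = (iso[0], iso[1])
        totals[key][0] += shown
        totals[key][1] += accepted
    # Sorted by (year, week) so consecutive entries are consecutive weeks of data.
    return {k: (a / s if s else 0.0) for k, (s, a) in sorted(totals.items())}


def flag_regressions(weekly, max_drop=0.05):
    """Return the weeks whose acceptance rate fell by more than max_drop."""
    weeks = list(weekly)
    return [cur for prev, cur in zip(weeks, weeks[1:])
            if weekly[prev] - weekly[cur] > max_drop]


if __name__ == "__main__":
    data = [
        (date(2025, 9, 1), 100, 40),
        (date(2025, 9, 3), 100, 40),
        (date(2025, 9, 8), 100, 30),
    ]
    weekly = weekly_acceptance(data)
    print(weekly, flag_regressions(weekly))
```

The same grouping works for the undo rate or the defect rate by swapping in the relevant counts.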

(3) Collaboration and conventions

Unify code style and test templates so that Tab learns from consistent editing signals and "style interference" is reduced.

4. Differences from competing products and older versions

(1) Faster convergence through online learning

Tab converges faster on real coding trajectories and keeps adapting to repository structure and team habits.

(2) "Next action" orientation

Tab not only completes text but also predicts the next edit and the next jump location, which is closer to how engineers actually work.

Frequently Asked Questions (Q&A)

Q: What are the direct benefits of the new Cursor Tab model compared with the old version?

A: For the same amount of coding, Tab shows fewer suggestions but more of them hit the mark. The average acceptance rate is significantly higher, which reduces interruptions and useless completions and makes continuous editing more efficient.

Q: Why does online reinforcement learning improve Tab's acceptance rate?

A: Online reinforcement learning optimizes the policy directly on on-policy data and immediate feedback. This pulls the model toward the actions developers actually accept in real workflows, rather than toward text similarity alone.

Q: How should a team evaluate the effectiveness of the new Tab model?

A: Run a two-week A/B comparison with the acceptance rate, the undo rate, the amount of post-commit revision, and time spent as the main metrics. Monitor the stability of multi-file changes at the same time.
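For the acceptance-rate part of such a comparison, a two-proportion z-test gives a rough check on whether the difference between groups is larger than chance. The sketch below uses only the standard library, and the counts in the usage lines are made-up numbers, not measured results.

```python
import math


def two_proportion_z(acc_a, shown_a, acc_b, shown_b):
    """Compare acceptance rates of group A (old) and group B (new).

    Returns (z, two_sided_p_value) using the pooled standard error.
    """
    p_a, p_b = acc_a / shown_a, acc_b / shown_b
    pooled = (acc_a + acc_b) / (shown_a + shown_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    z, p = two_proportion_z(acc_a=300, shown_a=1000, acc_b=360, shown_b=1000)
    print(f"z={z:.2f}, p={p:.4f}")
```

The test treats every suggestion as independent, which is optimistic when a few developers generate most of the events. Comparing per-developer rates is a safer complement.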

Q: Are there any special configuration suggestions for large repositories and multi-language projects?

A: Set up dedicated rules and test templates for the most-used languages and key directories. Enable cross-file editing and jumping, and combine them with a unified code style configuration so that Tab suggestions are more stable and accurate.
