Cursor updates its Codex agent framework to support GPT-5.1-Codex-Max and a new version of its test system

AI information Admin 139 views

Cursor has published a blog post on upgrading its agent framework for OpenAI's latest coding model, GPT-5.1-Codex-Max. Using its internal evaluation suite, Cursor Bench, the team built a more robust agent testing system and tuned Codex's performance in the Cursor environment along several dimensions, including success rate, tool-calling ability, and real usage data, to get the most out of a model variant trained for agentic coding.

As for specific changes, Cursor brought tool names and semantics closer to shell commands, encouraged the model to call built-in tools first instead of issuing raw shell commands, and relied on sandboxing to limit file and network access risks. For the Codex-specific "reasoning summaries", the team set guidelines on length and frequency and removed prompt language that asked the model to talk to the user mid-turn, which improved the quality of the final code. It also strengthened the handling of linter errors: explicit instructions direct the model to call the read_lints tool after substantive changes and to fix the issues it finds.
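The "built-in tools first, shell as fallback" idea can be pictured with a minimal sketch. This is an illustration only, not Cursor's implementation: the tool names in BUILTIN_TOOLS and the choose_tool helper are hypothetical, and the article describes Cursor steering the model through prompts rather than routing commands in code.

```python
import shlex

# Hypothetical built-in tools, named after the shell commands they resemble.
# These names are assumptions for illustration; the article does not list them.
BUILTIN_TOOLS = {
    "rg": "search file contents",
    "ls": "list a directory",
    "cat": "read a file",
}


def choose_tool(shell_command: str) -> str:
    """Return the built-in tool matching the command's first word, else 'shell'."""
    tokens = shlex.split(shell_command)
    if tokens and tokens[0] in BUILTIN_TOOLS:
        return tokens[0]
    # Fallback: a raw shell command, which would run inside the sandbox.
    return "shell"
```

Because the tool names mirror familiar commands, a model trained on shell workflows has less to translate when it picks a tool.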

Cursor also stresses the need to preserve Codex's internal reasoning trace across tool calls, which keeps the plan continuous in long-running tasks. Alerts fire when the trace is missing, because dropping it causes a significant performance drop. On interaction policy, the model is encouraged by default to act directly, writing code or calling tools, unless the user explicitly asks for a plan only. The team also reordered system and user messages so that prompts such as "save tokens" do not conflict with the actual task goal and weaken the agent's willingness to act.
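A check for a dropped reasoning trace might look like the sketch below. The item shape (dicts with "type" and "id" keys) and the function name dropped_reasoning are assumptions made for this example; the article does not describe Cursor's actual data structures or alerting code.

```python
def dropped_reasoning(previous_output, next_input):
    """Return ids of reasoning items the model produced that the next request omits."""
    forwarded = {
        item["id"] for item in next_input if item["type"] == "reasoning"
    }
    return [
        item["id"]
        for item in previous_output
        if item["type"] == "reasoning" and item["id"] not in forwarded
    ]


# Hypothetical turn: the model reasoned, then called a tool.
model_output = [
    {"type": "reasoning", "id": "r1"},
    {"type": "tool_call", "id": "c1"},
]

# The next request keeps the tool call and its result but loses the reasoning item.
next_request = [
    {"type": "tool_call", "id": "c1"},
    {"type": "tool_result", "id": "c1-result"},
]

missing = dropped_reasoning(model_output, next_request)
if missing:
    print(f"alert: reasoning items missing from next request: {missing}")
```

A non-empty result is the condition under which an alert would fire, since the model would otherwise have to rebuild its plan from scratch on the next call.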

FAQ

Q: What is the core of this Cursor update for Codex?

A: The main work is building a more robust agent testing and execution framework for GPT-5.1-Codex-Max, which includes tuning the tool configuration, prompts, reasoning traces, and message order.

Q: Why should tool names be closer to shell commands?

A: Because Codex's training relies heavily on shell workflows, shell-like names help the model use Cursor's tools more naturally instead of falling back to raw shell commands or inline scripts.

Q: What does preserving the "reasoning trace" mean for users?

A: It lets the model keep a clear medium- and long-term plan across multiple tool calls, which reduces forgotten sub-goals and repeated re-derivation and improves the success rate on complex fix tasks.

Q: How does Cursor guide Codex to fix lint errors automatically?

A: Explicit prompts tell the model to call the read_lints tool on recently modified files after completing substantive edits, and to fix the reported errors when the fix is easy to determine.
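The lint follow-up step can be sketched roughly as follows. Only the read_lints name comes from the article; its signature, the diagnostic fields, and the "suggested_fix" key are assumptions, and in Cursor the model itself judges whether a fix is obvious rather than a field in the data.

```python
def lints_to_fix(edited_files, read_lints):
    """Run read_lints on edited files and keep diagnostics that have a clear fix."""
    if not edited_files:
        return []  # nothing was edited, so there is nothing to check
    diagnostics = read_lints(sorted(edited_files))
    return [d for d in diagnostics if d.get("suggested_fix")]


def fake_read_lints(paths):
    """Stand-in for the real tool, returning canned diagnostics for the demo."""
    return [
        {"file": paths[0], "message": "unused import", "suggested_fix": "remove it"},
        {"file": paths[0], "message": "function too complex", "suggested_fix": None},
    ]


fixable = lints_to_fix({"app.py"}, fake_read_lints)
print([d["message"] for d in fixable])
```

Diagnostics without an obvious fix are left out here, mirroring the instruction to correct errors only when the fix is easy to determine.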

Q: What does this upgrade mean for regular Cursor users?

A: When using the Codex model, users can expect more proactive code changes, fewer unproductive exchanges, and more stable results in large refactors and multi-step fixes.

