How OpenAI Uses Codex Internally: From Understanding Large Repositories to an Engineering Paradigm for Batch Refactoring

OpenAI has released "How OpenAI Uses Codex," a systematic summary of how its security, front-end, API, infrastructure, and performance engineering teams use Codex day to day. The uses covered include code comprehension, cross-repository refactoring, performance optimization, unit test completion, and development acceleration. The document also describes best practices such as the Ask/Code dual mode, AGENTS.md, and Best-of-N, which help engineering teams turn AI-assisted programming into real productivity.

I. Seven Common Uses

1. Code Comprehension and Incident Response

Codex can quickly locate the core logic, service relationships, and data flows in unfamiliar modules. It assists with troubleshooting and on-call response, replaces inefficient manual searching, and shortens the time it takes to "get the big picture."

2. Refactoring and Migration

For interface upgrades, schema replacements, and dependency migrations that span multiple files, Codex generates the changes and pull requests in one place. This avoids the omissions typical of regular-expression find-and-replace, improves consistency, and makes rollback easier.

3. Performance Optimization

Identify frequently executed slow paths and repeated expensive calls, and recommend batching and caching strategies that reduce memory use and latency.
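As a rough illustration of the two strategies named above, the following sketch contrasts caching and batching against a simulated expensive lookup. The functions `fetch_price` and `fetch_prices`, the `CALLS` counter, and the SKU data are hypothetical stand-ins, not anything taken from the OpenAI document.

```python
from functools import lru_cache

# Hypothetical counters that record how often the "expensive" backend is hit.
CALLS = {"single": 0, "batch": 0}


def fetch_price(sku: str) -> int:
    """Simulate one expensive lookup for a single SKU."""
    CALLS["single"] += 1
    return len(sku) * 100


def fetch_prices(skus: list[str]) -> dict[str, int]:
    """Simulate one expensive round trip that serves many SKUs at once."""
    CALLS["batch"] += 1
    return {sku: len(sku) * 100 for sku in skus}


@lru_cache(maxsize=1024)
def cached_price(sku: str) -> int:
    """Caching: repeated requests for the same SKU reach the backend once."""
    return fetch_price(sku)


def total_cached(order: list[str]) -> int:
    """Sum prices, relying on the cache to absorb repeated SKUs."""
    return sum(cached_price(sku) for sku in order)


def total_batched(order: list[str]) -> int:
    """Batching: fetch all distinct SKUs in a single call, then sum."""
    prices = fetch_prices(sorted(set(order)))
    return sum(prices[sku] for sku in order)
```

Which strategy fits depends on the access pattern: caching helps when the same key recurs over time, while batching helps when many keys are needed at once.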

4. Test Coverage

Automatically fill in boundary cases and property-based tests, prioritize low-coverage modules, and build a more reliable regression safety net.
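To make the distinction between boundary cases and property tests concrete, here is a minimal sketch that uses only the standard library. The `clamp` function is a hypothetical function under test, and the randomized loop is a simple stand-in for a dedicated property-testing library.

```python
import random


def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical function under test: restrict value to [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def test_boundaries() -> None:
    """Boundary cases: on each bound, just outside it, and a one-point range."""
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10
    assert clamp(-1, 0, 10) == 0
    assert clamp(11, 0, 10) == 10
    assert clamp(5, 5, 5) == 5


def test_properties(trials: int = 1000, seed: int = 0) -> None:
    """Property checks: invariants that must hold for any valid input."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        low = rng.randint(-100, 100)
        high = rng.randint(low, 100)
        value = rng.randint(-1000, 1000)
        result = clamp(value, low, high)
        assert low <= result <= high               # result stays in range
        assert clamp(result, low, high) == result  # clamping is idempotent
        if low <= value <= high:
            assert result == value                 # in-range input is unchanged
```

Boundary tests pin down specific edge inputs, while the property loop checks invariants across many generated inputs that a hand-written list would likely miss.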

5. Development Speedup

Generate scaffolding, wrap up small tasks, and wire up telemetry and configuration, reducing the long-tail cost of both starting and finishing a piece of work.

6. Maintain Flow

Hand scattered ideas and unfinished tasks to the task queue, then review and merge the results when you have free time.

7. Exploration and Ideation

Compare multiple implementation paths, validate design trade-offs, and scan the codebase for similar defects and legacy patterns.

II. Best Practices: Turn Codex into a Reliable Colleague

1. Ask → Code Dual Mode

First use Ask Mode to generate an implementation plan, then switch to Code Mode to execute, reducing the risk of deviation during large-scale modifications.

2. Engineering Environment as Data

Configure startup scripts, environment variables, and network permissions for Codex, and fix build errors as they surface. Iterating on the environment in this way significantly reduces errors over the long term.

3. Write prompts like you would write an issue

Include file paths, component names, diffs, and documentation excerpts, and point to an existing implementation to follow, for example "implement this the way module X does."
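One way to keep prompts consistently issue-like is to assemble them from named fields. The sketch below is only an illustration: the `TaskPrompt` class, its field names, and the rendered layout are all hypothetical and are not a format that Codex requires.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPrompt:
    """Hypothetical container for the fields an issue-style prompt carries."""
    title: str
    file_paths: list[str]
    components: list[str]
    reference: str  # which existing implementation to follow
    diff_snippet: str = ""
    doc_snippets: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Lay the fields out as plain text, skipping empty optional parts."""
        parts = [
            f"Title: {self.title}",
            "Files: " + ", ".join(self.file_paths),
            "Components: " + ", ".join(self.components),
            "Reference: " + self.reference,
        ]
        if self.diff_snippet:
            parts.append("Diff:\n" + self.diff_snippet)
        for snippet in self.doc_snippets:
            parts.append("Docs: " + snippet)
        return "\n".join(parts)


# Illustrative values only; the paths and names are made up.
example = TaskPrompt(
    title="Add retry to invoice export",
    file_paths=["services/billing/export.py"],
    components=["InvoiceExporter"],
    reference="Implement this the way module X does",
)
```

Calling `example.render()` yields a short block of text that states the files, components, and reference implementation up front, which is the information the practice above asks for.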

4. Use the task queue as a lightweight to-do list

Hand off small tasks during spare moments, and merge the results once you are back in focused work.

5. AGENTS.md provides persistent context

It records naming conventions, business rules, and known pitfalls, supplying the implicit knowledge that the code itself does not convey.

6. Best-of-N

Generate several solutions in parallel, then pick the best one or combine their strongest parts. This significantly improves quality on complex tasks.
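The selection step of Best-of-N can be sketched as "generate N candidates in parallel, score each, keep the top one." In the sketch below, `best_of_n`, `ATTEMPTS`, `CASES`, and `score` are hypothetical; the hard-coded attempts stand in for independently generated solutions, and the score is simply the fraction of test cases passed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Candidate = Callable[[list[int]], list[int]]

# Stand-ins for N independently generated solutions to "remove duplicates,
# keeping first-seen order". A real run would request these from the tool.
ATTEMPTS: list[Candidate] = [
    lambda xs: xs,                       # wrong: leaves duplicates in place
    lambda xs: sorted(set(xs)),          # dedupes but loses original order
    lambda xs: list(dict.fromkeys(xs)),  # dedupes and keeps order
]

# Hypothetical acceptance cases as (input, expected output) pairs.
CASES = [([1, 1, 2], [1, 2]), ([3, 1, 3, 2], [3, 1, 2]), ([], [])]


def score(candidate: Candidate) -> float:
    """Fraction of acceptance cases the candidate passes."""
    passed = sum(1 for given, want in CASES if candidate(list(given)) == want)
    return passed / len(CASES)


def best_of_n(generate: Callable[[int], Candidate], n: int) -> tuple[int, float]:
    """Generate n candidates in parallel; return (index, score) of the best."""
    if n < 1:
        raise ValueError("n must be at least 1")
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(generate, range(n)))
    scores = [score(c) for c in candidates]
    # Ties go to the earliest candidate because max() keeps the first maximum.
    best_index = max(range(n), key=lambda i: scores[i])
    return best_index, scores[best_index]
```

Here `best_of_n(lambda i: ATTEMPTS[i], 3)` selects the third attempt. Combining parts of several candidates, which the text also mentions, is left out of this sketch.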

7. Task granularity

Scope each task to roughly one hour of work or a few hundred lines of code, and increase the scope gradually.

8. Quality gate

Automatically check and accept tasks using unit tests, linting, and regression scripts.
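A quality gate of this kind can be sketched as a small runner that executes each check as a command and accepts the task only if every check exits successfully. The names `run_gate`, `accept`, and `DEMO_CHECKS` are hypothetical, and the demo commands are trivial placeholders; a real project would substitute its own test, lint, and regression commands.

```python
import subprocess
import sys


def run_gate(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each check command; a check passes when it exits with code 0."""
    results: dict[str, bool] = {}
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        results[name] = completed.returncode == 0
    return results


def accept(results: dict[str, bool]) -> bool:
    """Accept only when at least one check ran and all checks passed."""
    return bool(results) and all(results.values())


# Placeholder commands so the sketch runs anywhere Python is installed.
# The "regression" entry fails on purpose to show a rejected task.
DEMO_CHECKS = {
    "unit-tests": [sys.executable, "-c", "assert 1 + 1 == 2"],
    "lint": [sys.executable, "-c", "import sys; sys.exit(0)"],
    "regression": [sys.executable, "-c", "import sys; sys.exit(1)"],
}
```

Requiring a non-empty result set in `accept` is a deliberate choice in this sketch: a gate with no checks configured should not wave changes through.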

Frequently Asked Questions (Q&A)

Q: How does the OpenAI team use Codex to accelerate code understanding and troubleshooting?
A: Codex generates summaries of system relationships and data flows and locates fault propagation paths and key files. This replaces manual searches across the whole repository and improves the speed and accuracy of on-call response.

Q: What are the advantages of Codex for cross-repository refactoring and migration?
A: It understands structure and dependencies, so it can replace old schemas in batches, summarize the affected locations, and open pull requests (PRs). This reduces omissions and style inconsistencies and makes review and rollback easier.

Q: How can Codex be used to improve test coverage and performance?
A: Codex automatically fills in edge cases for low-coverage modules. For high-cost paths it recommends caching or batching and explains the expected benefit, which can serve as a basis for performance review.

Q: Which best practices have the greatest impact on results?
A: The Ask → Code dual mode, persistent context in AGENTS.md, and the Best-of-N parallel approach are the most critical. Combined with "writing prompts like writing issues," they significantly improve stability and reproducibility.

Official source:

https://cdn.openai.com/pdf/6a2631dc-783e-479b-b1a4-af0cfbd38630/how-openai-uses-codex.pdf
