GLM-5.1 released: Z.ai bets on an open-source coding model built for 8-hour long-running agent tasks

GLM-5.1 has been officially released, and Z.ai positions it as its next-generation open-source flagship for agentic engineering. According to the company, the model focuses on coding, tool calling, and long-running autonomous execution. It reports leading results on SWE-Bench Pro, NL2Repo, and Terminal-Bench 2.0, and extends continuous autonomous work on a single task to 8 hours.

Open-source coding models shift toward engineering tasks

In terms of product positioning, Z.ai did not center this release on general chat; it explicitly aimed GLM-5.1 at code agents and engineering tasks. The official selling points are repository generation, terminal operation, and real-world software repair, which suggests that competition among open-source models is shifting from "can it write code" to "can it deliver real engineering work".

By highlighting its SWE-Bench Pro, NL2Repo, and Terminal-Bench 2.0 results, Z.ai is competing less for an edge on any single score than for a reputation in the coding-model market as a model that can actually execute. For developers, whether a model can handle complex repositories and complete multi-step operations matters more than whether a single-turn answer reads fluently.

8 hours of long-term execution is the core selling point

More notable than the leaderboard results is the long-running task capability. Z.ai emphasizes that GLM-5.1 can run autonomously for 8 hours on a single task, adjusting its strategy along the way and completing hundreds of iterations and thousands of tool calls. This capability is aimed not at simple Q&A but at continuous workflows that more closely resemble real software engineering.

The industry used to care mainly about how smart a model's single-turn output was; it now increasingly cares whether a model can keep advancing a task with complex goals. Whether planning, execution, testing, and repair can be connected into a closed loop determines whether a code agent can truly enter the development process, and this is the difference GLM-5.1 tries to amplify.
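To make the idea of a closed loop concrete, here is a minimal Python sketch of a plan, execute, test, and repair cycle that stops when checks pass or when a time or iteration budget runs out. It is an illustration only, not Z.ai's implementation: the function name run_agent_loop, the LoopResult type, and the plan/execute/test callbacks are all hypothetical stand-ins for a model and its tools.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    succeeded: bool
    iterations: int
    tool_calls: int
    history: List[str] = field(default_factory=list)


def run_agent_loop(
    plan: Callable[[List[str]], str],   # hypothetical: picks the next step from past feedback
    execute: Callable[[str], str],      # hypothetical: one tool call (shell, editor, ...)
    test: Callable[[str], bool],        # hypothetical: runs checks on the result
    time_budget_s: float,
    max_iterations: int,
    clock: Callable[[], float] = time.monotonic,
) -> LoopResult:
    """Repeat plan -> execute -> test until the checks pass or a budget runs out."""
    deadline = clock() + time_budget_s
    history: List[str] = []
    iterations = 0
    tool_calls = 0
    while iterations < max_iterations and clock() < deadline:
        iterations += 1
        step = plan(history)
        output = execute(step)
        tool_calls += 1
        passed = test(output)
        tool_calls += 1
        if passed:
            history.append(f"{step}: passed")
            return LoopResult(True, iterations, tool_calls, history)
        # A failed check is recorded so the next plan can act as the repair step.
        history.append(f"{step}: failed")
    return LoopResult(False, iterations, tool_calls, history)


if __name__ == "__main__":
    # Toy run: the third candidate patch is the one that passes the check.
    attempts = ["patch-a", "patch-b", "patch-c"]
    result = run_agent_loop(
        plan=lambda history: attempts[len(history)],
        execute=lambda step: step.upper(),
        test=lambda output: output == "PATCH-C",
        time_budget_s=8 * 60 * 60,  # mirrors the 8-hour figure in the article
        max_iterations=10,
    )
    print(result.succeeded, result.iterations, result.tool_calls)  # prints: True 3 6
```

The point of the sketch is that each failed test feeds back into the next planning step, which is what distinguishes a long-running agent from a single-turn answer; a real system would add checkpointing, cost limits, and richer feedback than a pass/fail string.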

Weights, API, and product launch move forward together

This release is more than a set of benchmark numbers. GLM-5.1's weights were opened at the same time, API access is available, and the model is planned to go live on chat.z.ai in the coming days. Rolling out open weights, a developer API, and a product entry point together is clearly meant to speed up developer adoption.

In terms of industry competition, what matters for open-source models is no longer just whether they are open, but how quickly they reach real usage scenarios. GLM-5.1's emphasis on coding ability, long-running autonomous execution, and delivery through multiple entry points indicates that Z.ai is targeting not general-model mindshare but the more specific AI programming market.

The most interesting thing about GLM-5.1 is not that Z.ai has shipped another open-source model, but that it is pushing the open-source coding model toward a long-running autonomous agent. What determines its position from here will not be one round of leaderboard results, but whether developers are willing to hand it more complete engineering tasks.
