1. Abstract
Open-AutoGLM is an open-source mobile phone agent framework from Zhipu AI, built around the AutoGLM-Phone-9B model. It interprets what is on the phone screen and simulates real user operations, so that it can read the interface, understand the instruction, and operate the phone accordingly. The framework primarily targets Android and is suited to building mobile assistants, automated operation tools, and testing applications.
2. Core features
- Natural-language driven: Accepts tasks described in natural language (Chinese) and automatically generates a multi-step operation plan.
- Multimodal screen understanding: Combines vision and text to recognize buttons, icons, on-screen text, and layouts, rather than relying on fixed-coordinate scripts.
- ADB-based execution: Taps, swipes, text input, and other actions are performed through ADB, on either a physical device or a cloud phone.
- Multi-app scenarios: Designed for high-frequency apps such as WeChat, Taobao, Douyin, and Meituan, and supports task chains that span multiple apps.
- Open-source model: AutoGLM-Phone-9B is released as an open-source general-purpose phone agent model, which makes further fine-tuning and adaptation easier.
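To make the ADB execution path concrete, here is a minimal Python sketch of how tap, swipe, and text actions could be issued through the standard `adb shell input` commands. The helper names (`tap_cmd`, `swipe_cmd`, `text_cmd`, `run`) and the `serial` parameter are illustrative assumptions, not Open-AutoGLM's actual API; the framework's own action layer may work differently.

```python
import subprocess
from typing import List, Optional


def _adb(serial: Optional[str] = None) -> List[str]:
    """Base adb argv; `-s SERIAL` selects one device when several are attached."""
    return ["adb", "-s", serial] if serial else ["adb"]


def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Build the argv for a single tap at pixel (x, y)."""
    return _adb(serial) + ["shell", "input", "tap", str(x), str(y)]


def swipe_cmd(x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300, serial: Optional[str] = None) -> List[str]:
    """Build the argv for a swipe from (x1, y1) to (x2, y2)."""
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return _adb(serial) + ["shell", "input", "swipe"] + coords


def text_cmd(text: str, serial: Optional[str] = None) -> List[str]:
    """Build the argv for typing text into the focused field."""
    # `input text` reads %s as a space. Other shell-special characters are
    # not escaped in this sketch, and non-ASCII text may need another
    # input method on many devices.
    return _adb(serial) + ["shell", "input", "text", text.replace(" ", "%s")]


def run(cmd: List[str]) -> None:
    """Execute a prepared adb command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

Building the command as a list first keeps each action easy to log and review before `run` sends it to the device.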
3. Installation
- Environment preparation: Install Python and the required dependencies. A virtual environment is recommended.
- Clone the repository: Clone the Open-AutoGLM repository with git and configure the project as described in its README.
- Model download: Get the AutoGLM-Phone-9B weights from the official ModelScope or Hugging Face page.
- Connect the device: Enable developer options and USB debugging on the Android device, then use ADB to confirm that the device is connected.
- Run examples: Run the sample scripts with a simple instruction to verify the whole pipeline end to end.
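The device connection step can be checked from a script. The sketch below parses the output of the standard `adb devices` command; the function names `parse_adb_devices` and `connected_devices` are hypothetical helpers written for this illustration and are not part of the Open-AutoGLM repository.

```python
import subprocess
from typing import Dict


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map device serial -> state from the text printed by `adb devices`."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the "List of devices attached" header, and daemon notices.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def connected_devices() -> Dict[str, str]:
    """Run `adb devices` and keep only entries in the ready `device` state."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return {
        serial: state
        for serial, state in parse_adb_devices(result.stdout).items()
        if state == "device"
    }
```

A device listed as `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet; only entries in the `device` state are ready to receive commands.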
4. Typical use cases
- Smartphone assistant: automatically open apps, search for content, send messages, and share links.
- E-commerce and local life automation: search for products, compare prices, place orders, and check order progress.
- Operations and customer service tools: batch replies or step-by-step guidance in social and IM apps.
- Automated testing: UI regression testing and scenario replay across multiple device models and app versions.
5. Ecosystem and competing products
- Synergy with the GLM series: Built on Zhipu's in-house multimodal large-model family, it offers an integrated stack from base model to agent.
- Compared with traditional scripting tools: Open-AutoGLM acts as an agent that understands the interface rather than replaying fixed steps, which lowers script maintenance cost and generalizes better.
- Compared with other mobile phone agent solutions: Being open source and deployable on private infrastructure makes it better suited to device manufacturers and enterprises that want to build the capability in-house.
6. Limitations and precautions
- Computing power cost: Local inference with a 9B-parameter model still requires substantial compute and may depend on a GPU or a cloud environment.
- Compatibility and maintenance: Different device models, system versions, and app updates affect recognition quality and require continuous tuning.
- Security and compliance: Where accounts, payments, or private data are involved, permissions must be strictly controlled, and applicable laws and each app's terms of use must be followed.
- Abuse risk: The framework should not be used for fake engagement, traffic inflation, malicious crawling, or similar scenarios, and organizations should set clear internal boundaries for its use.
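As a rough guide to the computing power point, the memory needed just to hold the weights is approximately the parameter count multiplied by the bytes per parameter. The sketch below works through that arithmetic. It assumes "9B" means exactly 9 billion parameters and ignores activations, KV cache, and runtime overhead, so real requirements will be higher. The lower-precision rows are hypothetical and only apply if such quantized weights are available, which the source does not state.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: "9B" is read as exactly 9 billion parameters.
PARAMS_9B = 9e9

# Bytes per parameter for a few common numeric formats.
for name, nbytes in [("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(PARAMS_9B, nbytes):.1f} GB")
```

At 16-bit precision this comes to roughly 18 GB for the weights alone, which is why a dedicated GPU or a cloud instance is the likely deployment target.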
7. Project address
https://github.com/zai-org/Open-AutoGLM
8. FAQ
Question: What is Open-AutoGLM's open-source license? Can it be used commercially?
Answer: The project uses a permissive open-source license (such as Apache-2.0) and can be used for commercial development and deployment, provided the license, relevant laws, and platform terms are complied with.
Question: Does the AutoGLM-Phone-9B model have to be used with Open-AutoGLM?
Answer: No. AutoGLM-Phone-9B can be used on its own as a multimodal model inside other agent frameworks, but pairing it with Open-AutoGLM provides more complete phone automation capability.
Question: Which platform does Open-AutoGLM primarily support now?
Answer: The current focus is Android devices, controlled over ADB. iOS requires additional capabilities or a separate solution.
Question: What are some best practices for deploying mobile agents in production?
Answer: Use dedicated devices or cloud phones, grant only the minimum permissions, keep test accounts separate from production accounts, and add manual confirmation or risk-control policies for key operations.
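The manual confirmation idea can be sketched as a small gate placed in front of action execution. Everything here is hypothetical: the keyword list, the function names, and the callback shape are assumptions made for illustration, and the substring matching is deliberately naive. A real deployment would define its own risk policy.

```python
from typing import Callable, Iterable

# Hypothetical keyword list; replace with the organization's own risk policy.
SENSITIVE_KEYWORDS = ("pay", "transfer", "order", "delete", "password")


def needs_confirmation(action_description: str,
                       keywords: Iterable[str] = SENSITIVE_KEYWORDS) -> bool:
    """Return True if the description contains any sensitive keyword."""
    text = action_description.lower()
    return any(keyword in text for keyword in keywords)


def guarded_execute(action_description: str,
                    execute: Callable[[], None],
                    confirm: Callable[[str], bool]) -> bool:
    """Run `execute` unless the action is sensitive and the reviewer declines.

    Returns True if the action ran, False if it was blocked.
    """
    if needs_confirmation(action_description) and not confirm(action_description):
        return False
    execute()
    return True
```

In an interactive setup, `confirm` could be as simple as `lambda d: input(f"Allow '{d}'? [y/N] ").strip().lower() == "y"`, while an unattended deployment might route the request to a review queue instead.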