Zhipu AI Open-Sources Open-AutoGLM and AutoGLM-Phone-9B: A New Starting Point for Mobile Phone Agents

AI is open source Admin 462 views

1. Abstract

Open-AutoGLM is an open-source mobile phone agent framework from Zhipu AI, built around the AutoGLM-Phone-9B model. It reads the content of the phone screen and simulates real user operations, so that the agent can "understand the interface, understand the instruction, and operate the phone". The framework mainly targets Android and is suitable for building applications such as mobile assistants, automated operations, and testing.
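The observe, decide, act cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions, not Open-AutoGLM's actual code: `Action`, `run_agent`, and the three callables (`capture_screen`, `propose_action`, `execute`) are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record; not a type from Open-AutoGLM."""
    kind: str          # e.g. "tap", "swipe", "type", or "done"
    args: tuple = ()


def run_agent(
    instruction: str,
    capture_screen: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Capture the screen, ask the model for one action, execute it, repeat.

    Stops when the model returns a "done" action or after max_steps,
    so a confused model cannot loop forever.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screen = capture_screen()                       # current screenshot
        action = propose_action(instruction, screen, history)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                                 # e.g. send via ADB
    return history
```

In a real setup, `propose_action` would wrap a call to the model and `execute` would translate the action into device commands; both are left as injected callables so the loop can be tested without a phone.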

2. Core Features

  1. Natural language driven: Tasks can be described in natural-language Chinese, and the agent automatically generates a multi-step operation plan.
  2. Multimodal screen understanding: Combines vision and text to recognize buttons, icons, text labels, and layouts rather than relying on hard-coded coordinate scripts.
  3. ADB-based execution: Taps, swipes, text input, and other actions are performed through ADB, on either a physical device or a cloud phone.
  4. Multi-app scenarios: Designed for high-frequency apps such as WeChat, Taobao, Douyin, and Meituan, with support for cross-app task chains.
  5. Open source model: AutoGLM-Phone-9B is released as a general-purpose mobile phone agent model, which makes fine-tuning and adaptation convenient.
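As an illustration of the ADB-based execution in item 3, the sketch below builds the standard `adb shell input` commands for tap, swipe, and text input. The helper names (`adb_prefix`, `tap_cmd`, `swipe_cmd`, `text_cmd`, `run`) are hypothetical and are not taken from Open-AutoGLM, which may issue its actions differently. The text helper covers simple ASCII input only and does not escape shell metacharacters.

```python
import subprocess
from typing import List, Optional


def adb_prefix(serial: Optional[str] = None) -> List[str]:
    """Base adb command; `-s SERIAL` selects one device when several are attached."""
    return ["adb"] + (["-s", serial] if serial else [])


def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Command for a single tap at screen pixel (x, y)."""
    return adb_prefix(serial) + ["shell", "input", "tap", str(x), str(y)]


def swipe_cmd(x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300, serial: Optional[str] = None) -> List[str]:
    """Command for a swipe from (x1, y1) to (x2, y2) lasting duration_ms."""
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return adb_prefix(serial) + ["shell", "input", "swipe"] + coords


def text_cmd(text: str, serial: Optional[str] = None) -> List[str]:
    """Command that types simple ASCII text; `input text` reads %s as a space."""
    return adb_prefix(serial) + ["shell", "input", "text", text.replace(" ", "%s")]


def run(cmd: List[str]) -> None:
    """Execute a built command; requires adb on PATH and a connected device."""
    subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution means the commands can be logged or reviewed before anything touches the device, for example `run(tap_cmd(540, 1200))`.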

3. Installation

  1. Prepare the environment: Install Python and the required dependencies. A virtual environment is recommended.
  2. Clone the repository: Clone Open-AutoGLM with git and configure the project according to the README.
  3. Download the model: Get the AutoGLM-Phone-9B weights from the official ModelScope or Hugging Face page.
  4. Connect the device: Enable developer mode and USB debugging on the Android device, then use ADB to confirm that the device is connected.
  5. Run the examples: Execute the sample scripts and try a few simple instructions to verify the whole pipeline end to end.
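For step 4, the connection check can be scripted by parsing the output of `adb devices`. The sketch below assumes the usual output layout (a "List of devices attached" header followed by one `serial<TAB>state` line per device); the function names are hypothetical and not part of the project.

```python
import subprocess
from typing import Dict, List


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each device serial to its state ("device", "unauthorized", "offline", ...)."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header, and "* daemon ..." status messages.
        if not line or line.startswith("*") or line.startswith("List of devices"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices(output: str) -> List[str]:
    """Serials that are fully connected, i.e. whose state is "device"."""
    return [s for s, state in parse_adb_devices(output).items() if state == "device"]


def list_ready_devices() -> List[str]:
    """Run `adb devices` (adb must be on PATH) and return the ready serials."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return ready_devices(result.stdout)
```

A state of `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.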

4. Typical use cases

  1. Smartphone assistant: Automatically open apps, search for content, send messages, and share links.
  2. E-commerce and local services automation: Search for products, compare prices, place orders, and check order status.
  3. Operations and customer service tools: Batch replies or guided workflows in social/IM apps.
  4. Automated testing: Run UI regression tests and scenario replays across multiple device models and app versions.

5. Ecosystem and competing products

  1. Synergy with the GLM series: Built on Zhipu's in-house multimodal large model family, it provides an integrated solution from base model to agent.
  2. Compared with traditional scripting tools: Open-AutoGLM is closer to "an agent that understands the interface", with lower script maintenance costs and better generalization.
  3. Compared with other mobile phone agent solutions: Being open source and deployable on private infrastructure makes it better suited to manufacturers and enterprises that want to build their own capabilities.

6. Limitations and precautions

  1. Compute cost: A 9B-scale model still needs substantial compute for local inference and may depend on GPUs or a cloud environment.
  2. Compatibility and maintenance: Different device models, system versions, and app updates affect recognition quality and require continuous tuning.
  3. Security and compliance: Where accounts, payments, or private data are involved, permissions must be strictly controlled, and applicable laws and each app's terms of use must be followed.
  4. Abuse risk: The framework should not be used for scenarios such as inflating traffic or engagement metrics or malicious crawling, and organizations need to set clear usage boundaries internally.
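To put the compute cost in item 1 in rough numbers, the sketch below estimates the memory needed for the model weights alone. It assumes roughly 9 billion parameters (taken from the "9B" in the model name; the exact count may differ) and ignores activations, the KV cache, and runtime overhead, so real requirements will be higher. The function name is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights only, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Assumption: about 9e9 parameters, inferred from the "9B" in the model name.
PARAMS = 9e9

for label, nbytes in [("16-bit (FP16/BF16)", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label}: about {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
# 16-bit (FP16/BF16): about 16.8 GiB
# 8-bit: about 8.4 GiB
# 4-bit: about 4.2 GiB
```

Under these assumptions, 16-bit weights alone take about 16.8 GiB, which is why a dedicated GPU or a cloud deployment is the likely setup.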

7. Project address

 https://github.com/zai-org/Open-AutoGLM

8. FAQ

Question: What is the Open-AutoGLM open source license? Can it be used in commercial scenarios?

Answer: The project uses a permissive open source license (such as Apache-2.0) and can be used for commercial development and deployment, provided you comply with the license, relevant laws, and platform terms.

Question: Does the AutoGLM-Phone-9B model have to be used with Open-AutoGLM?

Answer: No. AutoGLM-Phone-9B can be used on its own as a multimodal model in other agent frameworks, but pairing it with Open-AutoGLM provides more complete phone automation.

Question: Which platform does Open-AutoGLM primarily support now?

Answer: The current focus is on Android devices, controlled through ADB. iOS requires additional capabilities or solutions.

Question: What are some best practices for deploying mobile agents in production?

Answer: Use dedicated devices or cloud phones, grant the minimum permissions needed, keep test accounts separate from production accounts, and add manual confirmation or risk control policies for key operations.
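The manual confirmation mentioned in this answer could be implemented as a small gate in front of the action executor. The sketch below is one possible approach, not a feature of Open-AutoGLM: the keyword list and the names `needs_confirmation` and `guarded_execute` are hypothetical, and a production system would likely use a stricter policy than keyword matching.

```python
from typing import Callable, Iterable

# Hypothetical list of terms that mark an action as high risk.
SENSITIVE_KEYWORDS = ("pay", "transfer", "purchase", "delete")


def needs_confirmation(description: str,
                       keywords: Iterable[str] = SENSITIVE_KEYWORDS) -> bool:
    """True if the action description mentions any sensitive keyword."""
    text = description.lower()
    return any(keyword in text for keyword in keywords)


def guarded_execute(description: str,
                    execute: Callable[[], None],
                    confirm: Callable[[str], bool]) -> bool:
    """Run `execute` unless the action is sensitive and the human declines.

    Returns True if the action ran, False if it was blocked.
    """
    if needs_confirmation(description) and not confirm(description):
        return False
    execute()
    return True
```

For interactive use, `confirm` could be as simple as `lambda d: input(f"Allow '{d}'? [y/N] ").strip().lower() == "y"`, while a production deployment might route the request to an approval queue instead.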
