Back to AI information
Qwen3-VL Released: Flagship 235B Model Open Source, Instruction/Thinking Versions Available

Qwen3-VL Released: Flagship 235B Model Open Source, Instruction/Thinking Versions Available

AI information Admin 116 views

Tongyi Qianwen has launched the next-generation visual language model, the Qwen3-VL . The flagship Qwen3-VL-235B-A22B is available in two open-source versions: Instruct and Thinking . Official materials show that Instruct outperforms the Gemini 2.5 Pro on multiple visual benchmarks, while Thinking achieves leading results in multimodal reasoning tasks. The model supports "visual agents" that can interpret buttons, invoke tools, and complete real-world tasks on PC/mobile interfaces; it has performed exceptionally well in benchmarks such as OS World .

This upgrade emphasizes coverage of long context and complex scenarios: It supports over 256KB of context, expandable to 1MB , and can process approximately two hours of video and multi-page PDFs. It also offers OCR in 32 languages (with enhanced robustness against blurry, skewed, and rare characters), and provides more robust performance in 2D/3D spatial understanding, occlusion, and viewpoint reasoning. Regarding the open ecosystem, online conversation (Qwen Chat), API (Alibaba Cloud Model Studio), and Hugging Face/ModelScope weights and demos have all been released simultaneously.

Frequently Asked Questions

Q: Which variants are open sourced this time?

A: Qwen3-VL-235B-A22B Instruction and Thinking , also provides Caption/demonstration resources and reasoning examples.

Q: What can a visual agent do?

A: Read screen elements and hierarchies, understand buttons and forms, and use tool calls to complete tasks on real devices/applications.

Q: How large is the long context supported?

A: It is marked as 256K+ and can be expanded to 1M level, which is suitable for long video and long document scenarios.

Q: What is the coverage of multi-language capabilities?

A: It supports OCR in 32 languages, and its text capabilities are aligned with top general models for cross-language screen reading and comprehension.

Q: How to experience or access?

A: For Qwen Chat, choose qwen3-vl-plus . Alibaba Cloud Model Studio provides the API. Weights and demos are available in Hugging Face/ModelScope.

Qwen3-VL open source release Qwen3-VL-235B-A22B Qwen3-VLInstruct version Qwen3-VLThinking Edition Qwen3-VL Visual Agent Qwen3-VLVisualAgent Qwen3-VL long context 256K Qwen3-VL context extension 1M Qwen3-VL two-hour video comprehension Qwen3-VL Multi-page PDF Parsing Qwen3-VL multimodal reasoning Qwen3-VL surpasses Gemini2\_5Pro Qwen3-VLOSWorld evaluation leads Qwen3-VL 32 languages OCR Qwen3-VL fuzzy text recognition Qwen3-VL tilted text robustness Qwen3-VL rare character OCR Qwen3-VL2D_3D spatial understanding Qwen3-VL Occlusion Reasoning Qwen3-VL perspective reasoning Qwen3-VL screen reads buttons Qwen3-VL Form Automation Qwen3-VL tool call Qwen3-VL real device operation Qwen3-VLPC mobile phone support Qwen3-VL and QwenChat access Qwen3-VLModelStudioAPI Qwen3-VLHuggingFace Weights Qwen3-VLModelScope Mirror Qwen3-VLCaption Resources Qwen3-VL Demo Qwen3-VL multi-language screen reader Qwen3-VL complex scene coverage Qwen3-VL long document processing Qwen3-VL Video Q&A Qwen3-VL leads in multi-modal evaluation Qwen3-VL cross-language understanding Qwen3-VL open source weight download Qwen3-VL Inference Example Qwen3-VLAPI Access Guide Qwen3-VL Ecological Compatibility Qwen3-VL and tool chain collaboration Qwen3-VL Developer Friendly Qwen3-VL Enterprise Application Scenarios Qwen3-VL benchmark universal model Qwen3-VL screen element hierarchy Qwen3-VL button form understanding Qwen3-VL long video key point extraction Qwen3-VL Multi-page PDF Summary Qwen3-VL Review Highlights

Recommended Tools

More