Alibaba Cloud announced the launch of Qwen3-VL-Flash in Model Studio, offering both "thinking mode" and "non-thinking mode" reasoning paths for image and video understanding. Official documentation indicates that the Qwen3-VL-Flash series has a context limit of approximately 260,096 tokens in non-thinking mode and 258,048 tokens in thinking mode (billed by interval), respectively, and supports visual inputs of up to 16,384 tokens per image. This series emphasizes faster response and lower call costs, making it suitable for high-load scenarios such as long videos and long documents.
In terms of capabilities, the Model Studio documentation lists video understanding, event location and timestamp extraction, as well as 2D/3D object detection, spatial relationship and occlusion detection. It also covers document parsing, formula/table recognition, and multilingual OCR, and provides an interface parameter for enabling or disabling "thinking mode" (enable_thinking). Official sources also claim that the new model offers advantages in speed, overall capabilities, and cost compared to the open-source Qwen3-VL-30B-A3B and Qwen2.5-72B. Specific comparison details and third-party retesting are yet to be disclosed.
Frequently Asked Questions
Q: What is the context limit of Qwen3-VL-Flash?
A: The document lists approximately 260,096 tokens in non-thinking mode and approximately 258,048 tokens in thinking mode, and is priced in segments of 0–32K, 32K–128K, and 128K–256K.
Q: How to switch between “thinking mode/non-thinking mode”?
A: This is controlled by the enable_thinking parameter in the API call; the thinking model will perform implicit reasoning before giving the answer, while the non-thinking model will generate it directly.
Q: What typical scenarios are supported?
A: Question answering/summary of long videos and long documents, 2D/3D object detection and spatial localization, document parsing (including tables and formulas), multilingual OCR, and vision-based agent task control.
Q: What is the relationship with the open source Qwen3-VL-30B-A3B and Qwen2.5-72B?
A: The official claim is that it is superior in speed, capability, and cost, but this is the manufacturer's statement. It is recommended to pay attention to subsequent public benchmarks and third-party evaluations.
Q: Where can I access and view prices?
A: You can view the context, segmented pricing, and sample code for qwen3-vl-flash in the Visual Understanding documentation and Model/Billing pages of Alibaba Cloud Model Studio, and obtain API instructions through the console documentation page.