PaddleOCR-VL (0.9B) released: NaViT×ERNIE lightweight multimodal model, document parsing tops multiple benchmarks


On October 16, 2025, PaddleOCR announced its multimodal document parsing model, PaddleOCR-VL, released as a core capability of version 3.3.0. The model has approximately 0.9B parameters and pairs a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to provide unified recognition and structured output for text, tables, formulas, charts, and handwriting. According to official evaluations on public datasets such as OmniDocBench and on in-house datasets, PaddleOCR-VL reaches state-of-the-art performance in both page-level parsing and element-level recognition.

PaddleOCR-VL claims to cover 109 languages across multiple scripts, including Chinese, English, and Japanese as well as languages written in Latin, Arabic, Cyrillic, and Devanagari scripts. It is optimized for inference efficiency in production settings and can be used alongside other PaddleOCR components such as PP-StructureV3 and PP-OCRv5. The model and its documentation are available on GitHub, HuggingFace, and the official documentation site, which also hosts detailed benchmarks, visualization examples, and deployment methods. Details such as dataset versions and evaluation scope may change, so check the repository for updates.

Frequently Asked Questions

Q: What is PaddleOCR-VL?

A: A vision-language model with approximately 0.9B parameters for end-to-end document parsing. It handles text, tables, formulas, charts, and handwriting in a single model and outputs structured results.

Q: Why is it called "ultra-compact"?

A: Among multimodal VLMs, 0.9B parameters is relatively small, which makes inference efficient. Combining a NaViT-style dynamic-resolution encoder with ERNIE-4.5-0.3B reduces compute requirements while maintaining accuracy.
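
To illustrate why dynamic resolution can save compute, here is a minimal sketch that compares the number of image patches (visual tokens) produced when an image keeps its own aspect ratio against a fixed square resize. The patch size of 14, the budget of 4096 patches, the 896-pixel square, and the function names are illustrative assumptions, not PaddleOCR-VL's actual preprocessing settings.

```python
import math


def patch_grid(width: int, height: int, patch: int = 14,
               max_patches: int = 4096) -> tuple[int, int]:
    """Return (cols, rows) of patches for an image kept at its own aspect ratio.

    The grid is only shrunk when the patch count would exceed max_patches.
    All default values are assumptions chosen for illustration.
    """
    cols = math.ceil(width / patch)
    rows = math.ceil(height / patch)
    if cols * rows > max_patches:
        # Scale both sides by the same factor so the aspect ratio is preserved.
        scale = math.sqrt(max_patches / (cols * rows))
        cols = max(1, math.floor(cols * scale))
        rows = max(1, math.floor(rows * scale))
    return cols, rows


def fixed_grid(side: int = 896, patch: int = 14) -> tuple[int, int]:
    """Return the patch grid when every image is resized to a side x side square."""
    n = side // patch
    return n, n


# A wide, short crop such as a single text line.
cols, rows = patch_grid(700, 140)
print(cols * rows)           # 500 patches at native aspect ratio

f_cols, f_rows = fixed_grid()
print(f_cols * f_rows)       # 4096 patches after a fixed square resize
```

In this sketch the text-line crop needs 500 patches instead of 4096, and it is not stretched into a square. Small or elongated document elements are where a variable-resolution encoder tends to save the most work.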

Q: Has it really reached SOTA?

A: The official materials report leading results on OmniDocBench v1.5 and v1.0 and on in-house benchmarks, across metrics including overall score, reading order, tables, and formulas. These conclusions rest on the charts and explanations in the public reports and model cards.

Q: What languages and application scenarios are supported?

A: It covers 109 languages and is suited to documents that mix scripts, historical documents, and complex layouts. It can be combined with the layout and table structuring capabilities of PP-StructureV3 to parse real-world business documents.

Q: Where can I get it and how can I try it?

A: GitHub provides release notes and the command-line and Python APIs; HuggingFace provides the model card and links to an online demo; the documentation site provides guides for deployment and acceleration (for example, serving with vLLM or SGLang).
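
As a rough idea of what the Python API looks like, here is a minimal sketch. It assumes the `PaddleOCRVL` class and the result methods (`print`, `save_to_json`, `save_to_markdown`) shown in the project's published quick-start; the wrapper function `parse_document` and the file name `document.png` are hypothetical. Check the official documentation for the version you install, since names and options may differ.

```python
def parse_document(path: str, out_dir: str = "output") -> None:
    """Parse one image or PDF and save the results as JSON and Markdown.

    Assumes paddleocr >= 3.3.0 is installed and exposes PaddleOCRVL.
    """
    # Imported lazily so this file can be loaded without PaddleOCR installed.
    from paddleocr import PaddleOCRVL

    pipeline = PaddleOCRVL()
    for res in pipeline.predict(path):
        res.print()                              # print the structured result
        res.save_to_json(save_path=out_dir)      # machine-readable output
        res.save_to_markdown(save_path=out_dir)  # human-readable output


if __name__ == "__main__":
    # "document.png" is a placeholder; replace it with a real file path.
    parse_document("document.png")
```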

