What is new in PaddleOCR 3.5?

PaddleOCR 3.5 includes a browser inference SDK called PaddleOCR.js, one-click conversion of Word, Excel, and PPT documents into Markdown, a Transformers backend for access to 20 primary models via Hugging Face, and the ability to export parsing results in DOCX format.

What is PaddleOCR.js?

PaddleOCR.js is the browser inference SDK included in PaddleOCR 3.5. It runs PP-OCRv5 directly in the browser, with support for WebGPU and Wasm acceleration.

What does the one-click conversion feature do?

The one-click conversion feature converts Word, Excel, and PPT documents into Markdown, a structured, machine-readable format, in a single step.

What is the Transformers backend in PaddleOCR 3.5?

The Transformers backend in PaddleOCR 3.5 allows access to 20 primary models via Hugging Face, providing developers with flexibility in choosing the inference engine that best fits their needs.

Baidu's PaddleOCR 3.5: New Features Bring OCR Tool Closer to Hugging Face Ecosystem

What applications might benefit from PaddleOCR 3.5?

PaddleOCR 3.5 is particularly relevant for applications that require document parsing and analysis, such as retrieval-augmented generation (RAG), Document AI, search, analytics, or agent applications.

PaddleOCR 3.5 introduces a browser inference SDK, one-click document conversion, a Transformers backend, and DOCX export. The Transformers backend makes 20 primary models accessible through Hugging Face.

Baidu's PaddlePaddle team has released PaddleOCR 3.5, the latest version of its open-source OCR toolkit. The release introduces four major new features and brings the toolkit closer to the Hugging Face ecosystem.

What Happened

The latest version of PaddleOCR includes a browser inference SDK called PaddleOCR.js, which runs PP-OCRv5 directly in the browser with support for WebGPU and Wasm acceleration. Because inference runs on the client, document data does not have to leave the browser, which helps with privacy and compliance requirements.

Another significant feature is one-click conversion of Word, Excel, and PPT documents into Markdown. This turns complex office formats into a structured, machine-readable format in a single step.
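
To give a feel for what a Word-to-Markdown conversion involves, here is a minimal sketch using only the Python standard library. It is not PaddleOCR's implementation and does not use PaddleOCR's API, which this article does not describe. The helper name docx_to_markdown is hypothetical, and the sketch assumes the document uses the default English style IDs (Heading1 to Heading6); it handles headings and plain paragraphs only.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def docx_to_markdown(docx_bytes: bytes) -> str:
    """Convert headings and paragraphs of a .docx file to Markdown.

    Hypothetical helper for illustration only. Lists, images, and inline
    formatting are ignored, and table cells come out as plain paragraphs.
    """
    # A .docx file is a ZIP archive; the main text lives in word/document.xml.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    blocks = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join the text of every run in the paragraph.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")).strip()
        if not text:
            continue
        # Assumption: heading paragraphs carry a style ID such as "Heading2".
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val", "") if style is not None else ""
        match = re.fullmatch(r"Heading([1-6])", style_id)
        if match:
            blocks.append("#" * int(match.group(1)) + " " + text)
        else:
            blocks.append(text)
    return "\n\n".join(blocks)
```

Calling docx_to_markdown(open("report.docx", "rb").read()) on a hypothetical file would return the document's headings and paragraphs as a Markdown string. A production converter also has to deal with tables, lists, images, and the Excel and PPT formats, which is the work the one-click feature is meant to take off developers' hands.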

PaddleOCR 3.5 also introduces a Transformers backend, allowing access to 20 primary models via Hugging Face. This provides developers with flexibility in choosing the inference engine that best fits their needs. Additionally, parsing results from the PaddleOCR-VL series, PP-StructureV3, and PP-DocTranslation can now be exported in DOCX format.

Background and Context

PaddleOCR is a powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and Large Language Models (LLMs). It supports over 100 languages and has gained significant traction in the industry, with top-tier projects like Dify, RAGFlow, and Cherry Studio relying on it.

The PaddlePaddle team has been actively contributing to the development of PaddleOCR, with a focus on improving its accuracy, efficiency, and flexibility. The latest release is a testament to their efforts in making PaddleOCR an even more robust and versatile tool for developers.

Why it Matters

The Transformers backend ties PaddleOCR 3.5 more closely to the Hugging Face ecosystem. With this release, developers can access PaddleOCR models through Hugging Face and use its OCR capabilities alongside Transformers-based models in a single workflow.

This is particularly relevant for applications that require document parsing and analysis, such as retrieval-augmented generation (RAG), Document AI, search, analytics, or agent applications. By providing a more natural path from documents to downstream workflows, PaddleOCR 3.5 makes it easier to feed parsed documents into these applications.
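
As an illustration of that path, the sketch below takes Markdown of the kind a document parser might produce and splits it into one chunk per heading section, a common first step before indexing text for RAG or search. It assumes the Markdown is already available as a string; Chunk and chunk_markdown are hypothetical names and are not part of PaddleOCR.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest heading above the text, "" if there is none
    text: str     # body text found under that heading


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading section.

    Hypothetical helper for illustration. It treats any line starting with
    1 to 6 '#' characters as a heading, so '#' lines inside fenced code
    blocks would be misread as headings.
    """
    chunks = []
    heading = ""
    body = []

    def flush():
        # Store the section collected so far, skipping empty sections.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(heading, text))

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()
            heading = match.group(1).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks
```

Each resulting chunk keeps its heading, so a retrieval system can store the heading as metadata next to the text. Real pipelines often add size limits or overlapping windows on top of this kind of split.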

What Comes Next

The release of PaddleOCR 3.5 marks an important milestone in the development of PaddlePaddle's OCR toolkit. As the industry continues to evolve, it will be interesting to see how PaddleOCR adapts to emerging trends and technologies.

With its focus on flexibility, efficiency, and accuracy, PaddleOCR is well positioned to remain a leading player in the OCR market as developers explore new applications for version 3.5.

Key Facts

PaddleOCR 3.5 introduces a browser inference SDK called PaddleOCR.js, enabling direct execution of PP-OCRv5 in browsers with support for WebGPU and Wasm acceleration.
The latest version includes one-click conversion of Word, Excel, and PPT documents into Markdown.
PaddleOCR 3.5 integrates a Transformers backend, allowing access to 20 primary models via Hugging Face.
Developers can now export parsing results from the PaddleOCR-VL series, PP-StructureV3, and PP-DocTranslation in DOCX format.
PaddleOCR supports over 100 languages and has gained significant traction in the industry, with top-tier projects like Dify, RAGFlow, and Cherry Studio relying on it.