@yongdono

27 followers · 72 following

Stars

OCR

Text recognition

21 repositories

SixQuant / captcha

Recognize captcha using deep learning ResNet model and TFLearn (with 1,000 training samples and just a few minutes of training, accuracy can reach 99%)

Jupyter Notebook 12 3 Updated Jan 17, 2019

RapidAI / RapidOCR

📄 Awesome OCR toolkits for multiple programming languages, based on ONNX Runtime, OpenVINO, MNN, PaddlePaddle, TensorRT and PyTorch.

Python 6,820 650 Updated Jun 12, 2026

DayBreak-u / chineseocr_lite

Ultra-lightweight Chinese OCR with support for vertical text recognition and ncnn, mnn, and tnn inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M

C++ 12,315 2,279 Updated May 18, 2026

hiroi-sora / Umi-OCR

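Free, open-source, offline OCR software. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in libraries for multiple languages.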

Python 45,175 4,447 Updated Nov 20, 2025

rximg / rximg

RxImg is an image processing tool based on reactive data streams, allowing image processing pipelines to be built in a low-code graphical user interface.

Vue 14 2 Updated Jan 22, 2023

sml2h3 / ddddocr

带带弟弟 (ddddocr): general-purpose captcha recognition OCR, PyPI version

Python 14,253 2,285 Updated Mar 10, 2026

SvenVincent / cnocr

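CnOCR is an Optical Character Recognition (OCR) toolkit for Python 3. It supports recognition of common characters in Simplified Chinese, Traditional Chinese (some models), English, and digits, as well as vertical text. It ships with 20+ pre-trained recognition models suited to different application scenarios and can be used right after installation. CnOCR also provides simple training commands for users who want to train their own models.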

Python 56 5 Updated Apr 20, 2024

breezedeus / CnSTD

CnSTD: a Python 3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis

Python 792 115 Updated May 1, 2026

opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Python 9,718 732 Updated Jan 3, 2025

opendatalab / MinerU

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

Python 67,428 5,679 Updated Jun 11, 2026

getomni-ai / zerox

OCR & Document Extraction using vision models

TypeScript 12,239 848 Updated May 20, 2025

PaddlePaddle / PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 82,091 10,751 Updated Jun 12, 2026

ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Python 33,872 2,340 Updated Jun 12, 2026

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Pyt...

Rust 8,485 496 Updated Jun 13, 2026

oomol-lab / pdf-craft

PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.

Python 5,760 397 Updated Jun 6, 2026

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 23,288 2,152 Updated Jan 27, 2026

WZBSocialScienceCenter / pdftabextract

A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

Python 2,255 369 Updated Jun 24, 2022

chatdoc-com / OCRFlux

OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex layout handling, complicated table parsing and cross-page conte...

Python 2,511 151 Updated Apr 14, 2026

jingsongliujing / OnnxOCR

A lightweight OCR system refactored from PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with ultra-fast inference speed.

Python 1,811 199 Updated Jun 11, 2026

Shu-Ji / ebook-chinese-ocr

OCR for Duokan ebooks

Python 16 2 Updated Aug 26, 2015
