@yongdono
Stars

OCR

Text recognition
21 repositories

Recognize captchas using a deep learning ResNet model and TFLearn (with 1,000 training samples and only a few minutes of training, accuracy can reach 99%)

Jupyter Notebook 12 3 Updated Jan 17, 2019

📄 Awesome OCR toolkits for multiple programming languages, based on ONNX Runtime, OpenVINO, MNN, PaddlePaddle, TensorRT and PyTorch.

Python 6,820 650 Updated Jun 12, 2026

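Ultra-lightweight Chinese OCR with support for vertical text recognition and for ncnn, mnn, and tnn inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M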

C++ 12,315 2,279 Updated May 18, 2026

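OCR software, free, open source, and offline. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Ships with built-in libraries for multiple languages.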

Python 45,175 4,447 Updated Nov 20, 2025

RxImg is an image processing tool based on reactive data streams, allowing image processing pipelines to be built in a low-code graphical user interface. Config files for my GitHub profile.

Vue 14 2 Updated Jan 22, 2023

带带弟弟: general-purpose captcha recognition OCR, PyPI version

Python 14,253 2,285 Updated Mar 10, 2026

C++ 14 5 Updated Sep 9, 2024

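CnOCR is an Optical Character Recognition (OCR) toolkit for Python 3. It supports recognition of common characters in Simplified Chinese, Traditional Chinese (some models), English, and digits, as well as vertical text. It ships with 20+ pretrained recognition models suited to different application scenarios, ready to use right after installation. CnOCR also provides simple training commands so that users can train their own models.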

Python 56 5 Updated Apr 20, 2024

CnSTD: a Python 3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis

Python 792 115 Updated May 1, 2026

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Python 9,718 732 Updated Jan 3, 2025

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

Python 67,428 5,679 Updated Jun 11, 2026

OCR & Document Extraction using vision models

TypeScript 12,239 848 Updated May 20, 2025

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 82,091 10,751 Updated Jun 12, 2026

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Python 33,872 2,340 Updated Jun 12, 2026

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Pyt...

Rust 8,485 496 Updated Jun 13, 2026

PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.

Python 5,760 397 Updated Jun 6, 2026

Contexts Optical Compression

Python 23,288 2,152 Updated Jan 27, 2026

A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

Python 2,255 369 Updated Jun 24, 2022

OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex layout handling, complicated table parsing and cross-page conte...

Python 2,511 151 Updated Apr 14, 2026

A lightweight OCR system rebuilt on PaddleOCR, decoupled from the PaddlePaddle deep learning training framework, with ultra-fast inference speed.

Python 1,811 199 Updated Jun 11, 2026

ebook of duokan ocr

Python 16 2 Updated Aug 26, 2015
