Variable-length text detection and recognition based on CTPN (TensorFlow) + CRNN (PyTorch) + CTC


ooooverflow/chinese-ocr


chinese-ocr

Variable-length text detection and recognition based on CTPN (TensorFlow) + CRNN (PyTorch) + CTC
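
CTC is the part that lets the recognizer emit text of varying length: the network predicts one class per image column, and decoding collapses repeats and drops a special blank class. The sketch below shows greedy CTC decoding on a toy alphabet. It is an illustration only, not the repository's decoder; the function name ctc_greedy_decode and the choice of index 0 for the blank are assumptions.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank


def ctc_greedy_decode(frame_ids, alphabet):
    """Collapse repeated per-frame predictions, then drop blanks."""
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    # Class i (i >= 1) maps to alphabet[i - 1] because 0 is the blank.
    return "".join(alphabet[idx - 1] for idx in collapsed if idx != BLANK)


# Toy alphabet; the real character set is far larger.
alphabet = "abc"
# Frames: a a - b b - b c  ->  "abbc"
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2, 3], alphabet))
```

The blank between the two runs of "b" is what allows a doubled character to survive the collapse step.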

Environment setup

sh setup.sh 
 
Environment: Python 3.6 + TensorFlow 1.10 + PyTorch 0.4.1
  • Note: on a CPU-only machine, before running the script, comment out the "for gpu" section and uncomment the "for cpu" section.
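
After running the setup script, it can help to confirm that the installed versions match the ones listed above. The following is a minimal sketch, not part of the repository; the helper names version_matches and check_environment are hypothetical, and the import names tensorflow and torch are the usual ones for these packages.

```python
import importlib
import sys

# Versions taken from the environment line in this README.
REQUIRED = {"tensorflow": "1.10", "torch": "0.4.1"}


def version_matches(installed, required):
    """True if `installed` equals `required` or extends it (1.10.1 vs 1.10)."""
    return installed == required or installed.startswith(required + ".")


def check_environment():
    """Return a list of problems; an empty list means everything matches."""
    problems = []
    if sys.version_info[:2] != (3, 6):
        problems.append("python %d.%d found, 3.6 expected" % sys.version_info[:2])
    for name, required in REQUIRED.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("%s is not installed" % name)
            continue
        installed = getattr(module, "__version__", "unknown")
        if not version_matches(installed, required):
            problems.append("%s %s found, %s expected" % (name, installed, required))
    return problems
```

Calling check_environment() and printing each returned entry gives a quick report; nothing is printed when the environment matches.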

Demo

python demo.py 

Download the pretrained models

CRNN

Place pytorch-crnn.pth in /train/models.

CTPN

Extract checkpoints.zip and place its contents in /ctpn/checkpoints.
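
A quick way to catch a misplaced download before running the demo is to check both locations. This is a sketch under two assumptions: the leading "/" in the paths above means the repository root, and any non-empty /ctpn/checkpoints directory counts as populated, since the file names inside checkpoints.zip are not listed here. The helper missing_model_files is hypothetical.

```python
from pathlib import Path


def missing_model_files(repo_root):
    """Return the pretrained-model locations that are absent or empty."""
    root = Path(repo_root)
    missing = []

    crnn_weights = root / "train" / "models" / "pytorch-crnn.pth"
    if not crnn_weights.is_file():
        missing.append(str(crnn_weights))

    ctpn_dir = root / "ctpn" / "checkpoints"
    # Assumption: only checks that the directory exists and is non-empty.
    if not ctpn_dir.is_dir() or not any(ctpn_dir.iterdir()):
        missing.append(str(ctpn_dir) + " (empty or absent)")

    return missing
```

Running missing_model_files(".") from the repository root returns an empty list once both models are in place.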

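Model training

Installing warp-ctc (PyTorch version)

See warp-ctc.pytorch for details.

CTPN training

See tensorflow-ctpn for details.

CRNN training

1. Data preparation

Download the training set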

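  • About 3.64 million images in total, split 99:1 into a training set and a validation set
  • The data is generated randomly from a Chinese corpus (news + classical Chinese) with variations in font, size, grayscale, blur, perspective, and stretching
  • Covers 5,990 characters in total: Chinese characters, English letters, digits, and punctuation
  • Each sample contains exactly 10 characters, cropped at random from sentences in the corpus
  • All images have a uniform resolution of 280x32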
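
For a sense of scale, a 99:1 split of roughly 3.64 million images leaves about 36,400 for validation. The sketch below shows one way such a split could be produced; how the published dataset was actually divided is not stated here, and the function split_99_to_1 and its seed parameter are hypothetical.

```python
import random


def split_99_to_1(samples, seed=0):
    """Shuffle and split so that 1 sample in 100 goes to validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_val = len(shuffled) // 100
    return shuffled[n_val:], shuffled[:n_val]


train, val = split_99_to_1(range(1000))
print(len(train), len(val))  # 990 10
```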

Edit train_data_root, validation_data_root, and image_path in /train/config.py.
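
As a rough illustration, the three settings might look like the following once edited. Only the variable names come from this README; every value is a placeholder, and treating all three as filesystem paths is an assumption, so check config.py itself for what each one must point to. The helper nonexistent_paths is hypothetical.

```python
import os

# Names come from the README; every value below is a hypothetical placeholder.
train_data_root = "/data/ocr/train"
validation_data_root = "/data/ocr/validation"
image_path = "/data/ocr/images"


def nonexistent_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]
```

Calling nonexistent_paths([train_data_root, validation_data_root, image_path]) before training surfaces a mistyped path early.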

2. Training

cd train 
python train.py

3. Training results

Results

CTPN

OCR

References

warp-ctc-pytorch
chinese_ocr-(tensorflow+keras)
CTPN-tensorflow
crnn-pytorch
