Commit 930c345

Update Readme FIle with detail.
1 parent ce16f6f commit 930c345

File tree: 1 file changed (+15 −6 lines)


README.md

Lines changed: 15 additions & 6 deletions
@@ -2,8 +2,7 @@
 ### Description
 Use a Convolutional Recurrent Neural Network to recognize handwritten line-text images without pre-segmentation into words or characters. Use the CTC loss function to train.

-Special Thanks for the [Lamhoangtung](https://github.com/lamhoangtung/LineHTR) for the great contribution.
-
+Special thanks to [Line HTR](https://github.com/lamhoangtung/LineHTR) and [@Harald Scheidl](https://github.com/githubharald) for their work.
 ### Why Deep Learning?
 ![Why Deep Learning](images/WhyDeepLearning.png?raw=true "Why Deep Learning")
 > Deep learning extracts features itself with deep neural networks and performs the classification on its own; compared to traditional algorithms, its performance increases with the amount of data.
@@ -12,7 +11,7 @@
 ![Step_wise_detail](images/Step_wise_detail_of_workflow.png?raw=true "Step_Wise Detail")
 * First, a Convolutional Recurrent Neural Network extracts the important features from the handwritten line-text image.
 * The output before the CNN FC layer (512x100x8) is passed to the BLSTM, which handles sequence dependency and time-sequence operations.
-* Then CTC LOSS [Alex Graves](https://www.cs.toronto.edu/~graves/icml_2006.pdf) is used to train the RNN which eliminate the Alignment problem in Handwritten, since handwritten have different alignment of every writers. We just gave the what is written in the image (Ground Truth Text) and BLSTM output, then it calculates loss simply as -log("gtText"); aim to minimize negative maximum likelihood path.See [this](https://distill.pub/2017/ctc/) for detail.
+* Then the CTC loss [Alex Graves](https://www.cs.toronto.edu/~graves/icml_2006.pdf) is used to train the RNN; it eliminates the alignment problem in handwriting, since every writer aligns text differently. We supply only what is written in the image (the ground-truth text) and the BLSTM output, and it computes the loss as `-log(p(gtText))`, aiming to minimize the negative log-likelihood over alignment paths. See [this](https://distill.pub/2017/ctc/) for details.
 * CTC then sums over the possible alignment paths for the given labels. The loss for an (X,Y) pair is: ![Ctc_Loss](images/CtcLossFormula.png?raw=true "CTC loss for the (X,Y) pair")
 * Finally, CTC decoding is used to decode the output during prediction.
 </i>
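The CTC loss described above can be illustrated in plain Python. The sketch below is a toy implementation of the CTC forward algorithm, not the project's actual TensorFlow code: given per-timestep softmax outputs and a target label sequence, it sums the probabilities of all valid alignments and returns the negative log-likelihood.

```python
import math

def ctc_loss(probs, target, blank):
    """Toy CTC forward algorithm.

    probs:  list of per-timestep probability lists (already softmaxed)
    target: list of label indices (the ground-truth text)
    blank:  index of the CTC blank symbol
    Returns -log p(target | probs), summed over all valid alignments.
    """
    # Interleave blanks around the labels: -, c1, -, c2, -, ...
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S, T = len(ext), len(probs)

    # alpha[t][s] = probability of all alignment prefixes ending at ext[s] at time t
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]                      # stay on the same symbol
            if s > 0:
                a += alpha[t - 1][s - 1]             # advance by one
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]             # skip a blank between distinct labels
            alpha[t][s] = a * probs[t][ext[s]]

    # Valid alignments end on the last label or the trailing blank.
    p = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return -math.log(p)
```

For example, with two timesteps, labels {0: 'a'} and blank index 1, the alignments "aa", "a-", and "-a" all decode to "a", so their probabilities are summed before taking the negative log.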
@@ -21,6 +20,12 @@
 #### Detail Project Workflow
 ![Architecture of Model](images/ArchitectureDetails.png?raw=true "Model Architecture")

+* The project consists of three steps:
+  1. Multi-scale feature extraction --> Convolutional Neural Network (7 layers)
+  2. Sequence labeling (BLSTM-CTC) --> Recurrent Neural Network (2 layers of LSTM) with CTC
+  3. Transcription --> decoding the output of the RNN (CTC decode)
+![DetailModelArchitecture](images/DetailModelArchitecture.png?raw=true "DetailModelArchitecture")
+
 # Requirements
 1. Tensorflow 1.8.0
 2. Flask
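The transcription step in the workflow above (CTC decode) can be sketched with best-path decoding: take the argmax character at each timestep, collapse repeats, and drop blanks. This is a minimal illustration, independent of the project's TensorFlow decoder:

```python
def ctc_greedy_decode(probs, charset, blank):
    """Best-path CTC decoding.

    probs:   list of per-timestep probability lists
    charset: string mapping label index -> character
    blank:   index of the CTC blank symbol
    """
    # Argmax label at every timestep.
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]
    out, prev = [], None
    for idx in best:
        # Emit only when the label changes and is not blank:
        # this collapses repeats and removes blanks in one pass.
        if idx != prev and idx != blank:
            out.append(charset[idx])
        prev = idx
    return "".join(out)
```

Best-path decoding is the simplest option; beam-search variants usually recover more errors at extra cost.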
@@ -33,7 +38,8 @@
 * Only the line images and lines.txt (ASCII) are needed.
 * Place the downloaded files inside the data directory.

-###### You can find trained model to download from [here.](https://drive.google.com/open?id=10HHNZPqPQZCQCLrKGQOq5E7zFW5wGcA4) Download and extract all files inside the `model/` directory.
+##### The validation character error rate obtained: 8.654728%, i.e. around 92% accuracy
+


 To train the model from scratch
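The character error rate quoted above is the edit distance between the predicted text and the ground truth, divided by the ground-truth length. A minimal sketch (the helper name `cer` is illustrative, not from the project's code):

```python
def cer(reference, hypothesis):
    """Character error rate: Levenshtein distance / len(reference)."""
    m, n = len(reference), len(hypothesis)
    if m == 0:
        return float(n > 0)
    # One-row dynamic-programming Levenshtein distance.
    d = list(range(n + 1))
    for i in range(1, m + 1):
        prev, d[0] = d[0], i
        for j in range(1, n + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                                   # deletion
                       d[j - 1] + 1,                               # insertion
                       prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
            prev = cur
    return d[n] / m
```

Note that a CER of 8.65% only loosely translates to "92% accuracy"; word-level accuracy is typically lower than 1 − CER.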
@@ -66,9 +72,12 @@ With Correction clothed leaf by leaf with the dioappoistmest
 **Prediction output on Self Test Data**
 ![PredictionOutput](images/PredictionOutput.png?raw=true "Prediction Output on Self Data")

-**The Validation character error rate of saved model: 8.654728%**
+
+###### You can find the trained model to download from [here.](https://drive.google.com/open?id=10HHNZPqPQZCQCLrKGQOq5E7zFW5wGcA4) Download and extract all files inside the `model/` directory.
+
 # Further Improvement
 * Line segmentation can be added for full-paragraph text recognition.
 * Better image preprocessing to handle real-time images.
-* Better Decoding approach to improve accuracy.
+* A better decoding approach to improve accuracy. Some CTC decoders can be found [here](https://github.com/githubharald/CTCDecoder).
 * More variety of data for real-time recognition.
+* Data augmentation is essential to improve accuracy.
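As a sketch of the data-augmentation idea in the last bullet, one simple augmentation for line images is a random horizontal shift with white padding. This helper is hypothetical (not part of the project), assuming images as lists of rows of grayscale pixel values:

```python
import random

def random_shift(img, max_shift=5, seed=None):
    """Shift a line image horizontally by a random offset, padding with white (255).

    img: list of rows, each a list of grayscale pixel values.
    max_shift should stay well below the image width.
    """
    rnd = random.Random(seed)
    s = rnd.randint(-max_shift, max_shift)
    w = len(img[0])
    shifted = []
    for row in img:
        if s >= 0:
            # Shift right: pad white on the left, drop pixels off the right edge.
            shifted.append([255] * s + row[:w - s])
        else:
            # Shift left: drop pixels off the left edge, pad white on the right.
            shifted.append(row[-s:] + [255] * (-s))
    return shifted
```

Similar small random scalings, rotations, and elastic distortions are common for handwriting data; applying them on the fly during training effectively enlarges the dataset.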
