4,432 questions
2 votes, 1 answer, 133 views
How to properly solve "invalid use of undefined type ‘struct Pixa’"?
I am including OCR with Leptonica and Tesseract capi.h in my C project. For regular images loaded as Pix, all is good, but for multipage TIFFs loaded as Pixa, I get the following compiler error:
ocr.c:...
0 votes, 0 answers, 23 views
Why does Tesseract detect the same text on the original and inverted image?
I want to detect all the words in my image.
In the original, it detects all 4 lines but not the word "Juliet", which was expected.
Then, I inverted the image in order to detect only &...
0 votes, 1 answer, 153 views
Using Tesseract OCR with Docling: PyTessBaseAPI call fails, won't init
I am using Docling to parse images of scanned text with Tesseract OCR (any OCR would do, but Tesseract is preferred if possible).
My code is:
pipeline_options = PdfPipelineOptions()...
-1 votes, 2 answers, 187 views
How to find a printed document in an image
The image contains a single document printed on white paper. The background of the image can vary.
I tried to extract the document using code from https://scanbot.io/techblog/document-edge-detection-with-opencv/ with ...
0 votes, 2 answers, 77 views
Pytesseract cannot always understand very simple and clear text (font Consolas)
Pytesseract cannot understand very simple and clear text. I've tried nearest neighbor, bilinear, Gaussian blur, and everything else, and cannot get Tesseract to read the text consistently; the best I ...
1 vote, 1 answer, 90 views
Can't find how to create ALTO XML with tesseract in R
I am struggling with the tesseract package (version 5.3.2) for R, trying to get ALTO XML as the output of the ocr() function. I read the documentation, which states that this has something to do with the ...
-1 votes, 1 answer, 149 views
How to remove a non-uniform background from an image [closed]
The receipt clip contains a structured background:
I tried to remove it using the textcleaner ImageMagick wrapper script from the answer to "Remove receipt image border using ImageMagick".
I used code from the answer to How to use ...
2 votes, 1 answer, 220 views
Tesseract OCR cannot read dotted LED digits on MAUI/Xamarin
I am trying to extract numbers from dotted LED-style digits (0–9) using Tesseract OCR in a MAUI/Xamarin app on Android and iOS, fully offline. My boss wants a local solution that works on mobile ...
1 vote, 3 answers, 176 views
How to rewrite Python code using PyScript
My code works as a Python file, but I am struggling to make it work using PyScript. I am sharing the code that I tried.
main.py
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"...
1 vote, 0 answers, 201 views
How to set Tesseract PSM in Docling (Python)
I’m using Docling to OCR scanned PDFs. I want to control Tesseract’s page-segmentation mode (PSM), e.g. --psm 6.
Docling exposes both TesseractOcrOptions and TesseractCliOcrOptions, but neither ...
2 votes, 1 answer, 76 views
Tesseract unable to recognise the letter O in plain image
I'm attempting to perform OCR on a set of single letters inside an image using Python. I'm new to this so apologies if I get the terminology wrong, but I've filtered and have obtained (I think) quite ...
0 votes, 1 answer, 177 views
How to install Leptonica 1.85 on Debian 12
I tried to use the https://github.com/Sicos1977/TesseractOCR NuGet package on Debian 12. It looks like it requires a newer version of Leptonica, libleptonica-1.85.0.dll.so, which is not available in Debian:
#apt ...
0 votes, 1 answer, 83 views
Unable to OCR Type3 Font after image preprocessing, training Tesseract
I am trying to OCR a specific area of a PDF page in a multi-page document (total page count varies between 600 and 10,000 pages). I initially receive the data as .pcl files in batches of 500 records, ...
1 vote, 1 answer, 89 views
When I Try To Train a Tesseract Model I get a Compute CTC targets failed error
I am currently using Tesseract 5.0 and am training a model. I have generated the PNG, box, and ground-truth files for a thousand images. However, when I run the command:
make training MODEL_NAME=...
0 votes, 0 answers, 94 views
Lstmtraining Tesseract-OCR
I followed the steps for fine-tuning Tesseract for handwriting recognition. I have the character images and the corresponding box files. Then I generated the .lstmf files, followed by the lstm_train....