Add hOCR output format #1275

Open
sliedes wants to merge 1 commit into JaidedAI:master from sliedes:hocr



sliedes commented Jul 2, 2024

This change adds rudimentary hOCR output support. Notes:

  • Currently it only adds bounding boxes to the hOCR output, not baselines (which hOCR also supports)

  • It doesn't add any semantic layout information; each word is simply represented as an ocrx_word

  • Some of the metadata could be improved, for example by adding the real image name and perhaps the EasyOCR version number

  • I didn't check whether EasyOCR supports multipage inputs; if it does, this will certainly break with them

  • I left this comment in the source code; I'm not sure what to do with it (probably shouldn't be enabled by default):

# In order to get a browser-renderable HTML file, you can add this before the closing </body> tag:
#
# <script src="https://unpkg.com/hocrjs"></script>
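
For anyone who wants the browser-renderable variant without editing the file by hand, here is a minimal sketch of the step described in that comment. The helper name `add_hocrjs_viewer` is hypothetical and not part of this PR; it just inserts the script tag before the last closing `</body>` tag of an hOCR string.

```python
# Hypothetical helper (not part of the PR): make an hOCR document
# browser-renderable by loading hocrjs right before </body>.
HOCRJS_TAG = '<script src="https://unpkg.com/hocrjs"></script>'


def add_hocrjs_viewer(hocr: str) -> str:
    """Return a copy of `hocr` with the hocrjs script tag before the last </body>."""
    head, sep, tail = hocr.rpartition("</body>")
    if not sep:
        # No closing body tag: refuse to guess where the script should go.
        raise ValueError("no closing </body> tag found")
    return f"{head}{HOCRJS_TAG}\n{sep}{tail}"
```

Keeping this as a separate post-processing step means the default output stays free of the external script reference.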

Other than that, I validated the output with hocr-check from https://github.com/ocropus/hocr-tools and also checked that it validates as XHTML.
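
To illustrate the shape of the output described above, here is a rough sketch, not the code in this PR. It assumes results in the form that `Reader.readtext` returns by default, a list of `(quad, text, confidence)` tuples where `quad` is four `[x, y]` corner points. The function names, the `page_width`/`page_height` parameters, and the `"unknown"` image name placeholder are all made up for the example; writing the confidence as `x_wconf` is also my own choice here.

```python
from html import escape


def bbox_from_quad(quad):
    """Collapse four corner points into an axis-aligned (x0, y0, x1, y1) box."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))


def results_to_hocr(results, page_width, page_height, image_name="unknown"):
    """Render (quad, text, confidence) tuples as a single-page hOCR document.

    Every detection becomes one ocrx_word span with a bbox; no lines,
    paragraphs, or baselines are emitted.
    """
    words = []
    for index, (quad, text, confidence) in enumerate(results, start=1):
        x0, y0, x1, y1 = bbox_from_quad(quad)
        # x_wconf is the word confidence expressed as a percentage.
        title = f"bbox {x0} {y0} {x1} {y1}; x_wconf {round(confidence * 100)}"
        words.append(
            f'   <span class="ocrx_word" id="word_{index}" '
            f'title="{title}">{escape(text)}</span>'
        )

    # The page title is single-quoted because the image name is double-quoted.
    page_title = (
        f'image "{escape(image_name, quote=True)}"; '
        f"bbox 0 0 {page_width} {page_height}"
    )
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"',
        '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
        '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">',
        " <head>",
        "  <title></title>",
        '  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
        '  <meta name="ocr-system" content="easyocr" />',
        '  <meta name="ocr-capabilities" content="ocr_page ocrx_word" />',
        " </head>",
        " <body>",
        f"  <div class=\"ocr_page\" id=\"page_1\" title='{page_title}'>",
        *words,
        "  </div>",
        " </body>",
        "</html>",
    ]
    return "\n".join(lines) + "\n"
```

With an installed EasyOCR this could be fed from something like `reader.readtext(path)`, with the page size taken from the image itself. The sketch shares the limitations listed above: a single page and no layout structure beyond individual words.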
