Highlighted text #2284

Mano2731995 started this conversation in Ideas
Mar 13, 2023 · 1 comments · 2 replies
Hi Jorj,

How can I find and extract the highlighted text?

Assuming that the highlighting has happened via an annotation:

  • Loop over the annotations of the page, filtering for the appropriate annotation type.
  • Take the annotation rectangle annot.rect and enlarge it a little. This is recommended because many software packages create a highlight rectangle that is too small to cover the text completely. So add an allowance like rect = annot.rect + (-5, -5, 5, 5), which enlarges the rectangle by 5 points in every direction.
  • Extract the text like so: text = page.get_text(clip=rect).
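The steps above can be sketched roughly as follows. This is only a minimal sketch, assuming highlights are stored as highlight annotations; the names pad_rect and highlighted_text are made up for illustration, and the 5-point margin is just the allowance suggested above.

```python
def pad_rect(rect, margin=5):
    """Grow an (x0, y0, x1, y1) rectangle by `margin` points on every side."""
    x0, y0, x1, y1 = rect
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def highlighted_text(pdf_path, margin=5):
    """Return (page number, text) pairs, one per highlight annotation."""
    import fitz  # PyMuPDF; imported here so pad_rect works without it

    results = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Only look at highlight annotations, skipping notes, links, etc.
            for annot in page.annots(types=[fitz.PDF_ANNOT_HIGHLIGHT]):
                clip = pad_rect(annot.rect, margin)
                results.append((page.number, page.get_text(clip=clip).strip()))
    return results
```

Calling highlighted_text("some.pdf") would then give one entry per highlight. Note that the enlarged rectangle may also pick up neighbouring text, so the margin is worth tuning for your files.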

If highlighting has not happened via annotations, things are much more complicated, so let's hope you will get away with this recipe.

Hi Jorj,

I tried the code below to extract the highlighted text:

points = annot.rect
a = page.get_text(clip=points)

Expected result: only the highlighted text.

Actual result: the highlighted text is extracted along with non-highlighted text (please refer to the images).

Input.jpg

Output.jpg

Please suggest how to get only the highlighted text.

The annotation rectangle is always exactly that: a single rectangle.
Your highlighted text, however, spans two rectangles, one on each line. You must use the annot.vertices property to determine those sub-areas.
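A rough sketch of that idea, assuming the usual layout where annot.vertices is a flat list of (x, y) points and every four consecutive points describe one highlighted line. The helper names quad_rects and text_per_highlight_line are hypothetical, not part of PyMuPDF.

```python
def quad_rects(vertices):
    """Turn a flat list of (x, y) points into one bounding box per 4-point group."""
    rects = []
    for i in range(0, len(vertices) - 3, 4):
        quad = vertices[i:i + 4]
        xs = [p[0] for p in quad]
        ys = [p[1] for p in quad]
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def text_per_highlight_line(page, annot):
    """Extract text separately for each sub-area of one highlight annotation."""
    # annot.vertices may be None for annotation types without vertices.
    parts = []
    for rect in quad_rects(annot.vertices or []):
        parts.append(page.get_text(clip=rect).strip())
    return " ".join(part for part in parts if part)
```

Because each sub-area is clipped on its own, text that sits inside the overall annot.rect but outside the highlighted lines should no longer be picked up. If characters at the edges go missing, a small allowance per sub-area (1 or 2 points, say) may help.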
