Highlighted text #2284
-
Hi Jorj,
How can I get and extract the highlighted text?
-
Assuming that the highlighting was done via an annotation:
- Loop over the annotations of the page (selecting only the appropriate annotation type).
- Take the annotation rectangle `annot.rect` and enlarge it a little. This is recommended because many software packages create an annotation rectangle that is too small to cover the text completely. So add an allowance like `rect = annot.rect + (-5, -5, 5, 5)`, which is 5 points larger in every direction.
- Extract the text like so: `text = page.get_text(clip=rect)`.
If highlighting has not happened via annotations, things are much more complicated, so let's hope you will get away with this recipe.
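The recipe above could be sketched roughly as follows. This is only a minimal illustration, not an official PyMuPDF recipe: the helper name `highlight_texts` and its `margin` parameter are made up for this example, and `page` is assumed to be a PyMuPDF page object.

```python
def highlight_texts(page, margin=5):
    """Return the text found under each highlight annotation of a page.

    `page` is assumed to be a PyMuPDF page; `margin` (hypothetical name)
    is the allowance, in points, added on every side of the rectangle.
    """
    texts = []
    for annot in page.annots():
        # annot.type is a pair like (8, 'Highlight'); skip other types
        if annot.type[1] != "Highlight":
            continue
        # enlarge the annotation rectangle a little in every direction
        rect = annot.rect + (-margin, -margin, margin, margin)
        # extract only the text that lies inside the enlarged rectangle
        texts.append(page.get_text("text", clip=rect).strip())
    return texts
```

With PyMuPDF installed, this might be called as `highlight_texts(doc[0])` after `import fitz` and `doc = fitz.open("input.pdf")`, where `input.pdf` is a placeholder file name.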
-
Hi Jorj,
I have tried the code below to extract the highlighted text:
points = annot.rect
a = page.get_text(clip=points)
Expected result: only the highlighted text.
Actual result: the highlighted text is extracted along with non-highlighted text (please see the image).
Please suggest how to get only the highlighted text.
-
The annotation rectangle always is exactly this: a rectangle.
Your highlighted text, however, consists of two rectangles, one on each line. You must use the annot.vertices property to determine those sub-areas.
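One possible way to do this is sketched below. It assumes that `annot.vertices` of a highlight is a flat list of (x, y) points in which every four consecutive points describe one highlighted sub-area. The helper names `quad_rects` and `highlighted_text` are invented for this example. Depending on how tightly the sub-areas were drawn, some neighboring characters may still be picked up.

```python
def quad_rects(vertices):
    """Turn a flat list of (x, y) points into one bounding box per 4 points."""
    rects = []
    for i in range(0, len(vertices) - 3, 4):
        corners = vertices[i:i + 4]
        xs = [p[0] for p in corners]
        ys = [p[1] for p in corners]
        # bounding box of the four corners as (x0, y0, x1, y1)
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def highlighted_text(page, annot):
    """Join the text found inside each sub-area of one highlight annotation."""
    parts = []
    # annot.vertices can be None for annotation types without vertices
    for rect in quad_rects(annot.vertices or []):
        part = page.get_text("text", clip=rect).strip()
        if part:
            parts.append(part)
    return " ".join(parts)
```

Extracting per sub-area rather than from `annot.rect` avoids the text that lies inside the overall rectangle but outside the highlighted lines.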