Some images are not extracted from DOCX while using unstructured_client #3414
enx-github started this conversation in General
-
Hi all,
I have used unstructured_client to extract text, images, and tables from my DOCX file. All of the text and tables are extracted correctly, but some of the images are not.
Here is my code:
import unstructured_client
from unstructured_client.models import operations, shared

client = unstructured_client.UnstructuredClient(
    api_key_auth="xXx",
    server_url="https://api.unstructured.io/general/v0/general",
)

filename = "zZz"
# Read the DOCX bytes inside a context manager so the file handle is closed.
with open(filename, "rb") as f:
    response = client.general.partition(
        request=operations.PartitionRequest(
            partition_parameters=shared.PartitionParameters(
                # Note that this currently only supports a single file
                files=shared.Files(
                    content=f.read(),
                    file_name=filename,
                ),
            ),
        )
    )
The response contains the following element types:
- Header
- Image
- UncategorizedText
- NarrativeText
- Title
- PageBreak
- ListItem
- Table
- Footer
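To see how many images did come back (and so how many are missing), it can help to tally the returned element types. A minimal sketch, assuming `response.elements` is a list of dicts with a `"type"` key, which is the JSON element shape the API returns; `summarize_element_types` and `image_elements` are hypothetical helper names, not part of unstructured_client:

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical helpers (not part of unstructured_client). They assume each
# element is a dict with a "type" key, e.g. the items of response.elements.


def summarize_element_types(elements: List[Dict[str, Any]]) -> Counter:
    """Count how many elements of each type were returned."""
    return Counter(el.get("type", "Unknown") for el in elements)


def image_elements(elements: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the elements whose type is "Image"."""
    return [el for el in elements if el.get("type") == "Image"]
```

For example, `summarize_element_types(response.elements).most_common()` lists each type with its count, and `len(image_elements(response.elements))` gives the number of Image elements to compare against the pictures visible in the document.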
Replies: 2 comments
-
Unstructured supports the image types listed here:
https://docs.unstructured.io/api-reference/api-services/overview
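To compare the document against that list, one option is to check which image formats the DOCX actually contains. A .docx file is a ZIP package, and embedded pictures are usually stored as parts under `word/media/`. A minimal sketch, assuming a standard Word-produced file; `docx_media_files` and `media_extensions` are hypothetical helper names, not part of unstructured:

```python
import os
import zipfile
from collections import Counter
from pathlib import PurePosixPath
from typing import BinaryIO, List, Union

# Hypothetical diagnostic helpers (not part of unstructured). They rely only on
# the fact that a .docx file is a ZIP package whose embedded pictures are
# usually stored as parts under "word/media/".


def docx_media_files(source: Union[str, os.PathLike, BinaryIO]) -> List[str]:
    """Return the names of media parts stored under word/media/ in a .docx."""
    with zipfile.ZipFile(source) as zf:
        return [n for n in zf.namelist() if n.startswith("word/media/")]


def media_extensions(names: List[str]) -> Counter:
    """Count media parts by lowercase file extension, e.g. '.png' or '.emf'."""
    return Counter(PurePosixPath(n).suffix.lower() for n in names)
```

For example, `media_extensions(docx_media_files("zZz"))` shows how many embedded files of each format the document holds. This only reports what is embedded: linked (not embedded) pictures will not appear, one media part can be shown in several places, and it says nothing about how the API treats a given format.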
-
Does the open-source unstructured library support extracting images from DOCX files?