no pages in doc after clean up #4695
A PDF has 60+ pages, but after cleaning it up with the code below, page_count is 0:
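from io import BytesIO

import fitz  # PyMuPDF


def _clean_up(file_path):
    doc = fitz.open(file_path)
    try:
        tmp = BytesIO()
        # Save to memory with garbage collection and compression
        tmp.write(doc.write(garbage=3, deflate=True))
        # Reopen the saved bytes as a new PDF document
        doc = fitz.Document('pdf', tmp.getvalue())
        tmp.close()
    except Exception:
        ...  # any error is silently ignored here
    return doc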
I guess there is something wrong with the PDF. How can I identify the problem?
Replies: 1 comment
By saving the PDF (to memory), you effectively already ran a test.
You could have inspected the string returned by pymupdf.TOOLS.mupdf_warnings() right after your first open: MuPDF attempts automatic repairs if it encounters issues.
In your case, the PDF's "page tree" (the internal structure that holds the pointers to the individual page objects) is probably damaged.
I am slightly confused by the way you save to memory. I would simply have done pdfstream = doc.tobytes(...), and then tmp = pymupdf.open(stream=pdfstream).
You could also loop through the individual pages right after the original open to confirm that there is a page tree problem:
for page in doc:
    print(f"Successfully accessed {page.number=}")
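A slightly more defensive variant of that loop, as a sketch: it records which page numbers fail to load instead of stopping at the first error. The function name find_unreadable_pages is hypothetical. It only relies on the document's page_count and load_page(), so it reports nothing if page_count is already 0.

```python
def find_unreadable_pages(doc):
    """Return (page_number, error_text) pairs for pages that fail to load.

    Hypothetical helper; `doc` is expected to be an opened PyMuPDF document.
    """
    failures = []
    for pno in range(doc.page_count):
        try:
            doc.load_page(pno)
        except Exception as exc:
            # Keep going so that every problematic page is reported.
            failures.append((pno, repr(exc)))
    return failures


# Possible usage right after the original open:
#   doc = pymupdf.open(file_path)
#   print(doc.page_count, find_unreadable_pages(doc))
```

An empty result together with the expected page count suggests the pages are reachable in the original file and the damage only shows up when saving.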