no pages in doc after clean up · pymupdf/PyMuPDF · Discussion #4695

hgdhot
Sep 11, 2025

A PDF has 60+ pages, but after cleaning it up with the code below, page_count is 0.

from io import BytesIO

import fitz  # PyMuPDF


def _clean_up(file_path):
    doc = fitz.open(file_path)

    try:
        tmp = BytesIO()
        tmp.write(doc.write(garbage=3, deflate=True))

        doc = fitz.Document('pdf', tmp.getvalue())
        tmp.close()
    except Exception:
        ...
    return doc

I guess there is something wrong with the PDF. How can I identify the problem?

Replies: 1 comment

JorjMcKie
Sep 11, 2025
Maintainer

By saving the PDF (to memory), you have already run a kind of test.
You could also have looked at the string returned by pymupdf.TOOLS.mupdf_warnings() right after your first open: MuPDF attempts automatic repairs if it encounters issues.
In your case, the damaged part of the PDF's internal structure is probably the "page tree", the container of pointers to the individual page objects.
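
The warnings check mentioned above could look roughly like the sketch below. The helper name open_with_warnings is made up for illustration and is not part of PyMuPDF; only pymupdf.open() and pymupdf.TOOLS.mupdf_warnings() are real calls. An empty string is not proof that the file is sound, since some problems may only surface once pages are actually loaded.

```python
def open_with_warnings(path):
    """Open a PDF and return (doc, warnings) for inspection.

    `open_with_warnings` is a hypothetical helper name, not part of PyMuPDF.
    """
    # Imported inside the function so the sketch can be loaded
    # even where PyMuPDF is not installed.
    import pymupdf

    # Reading the warnings store also empties it (reset=True is the default),
    # so this call discards messages left over from earlier operations.
    pymupdf.TOOLS.mupdf_warnings()

    doc = pymupdf.open(path)

    # Whatever MuPDF complained about or repaired while opening is now here.
    warnings = pymupdf.TOOLS.mupdf_warnings()
    return doc, warnings
```

Typical use would be doc, warnings = open_with_warnings("input.pdf") followed by print(warnings or "no warnings"), where "input.pdf" stands in for your own file.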

I am slightly confused by the way you do that save to memory - I would have simply done pdfstream = doc.tobytes(...), and then tmp = pymupdf.open(stream=pdfstream).
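
A minimal sketch of that shorter round trip, which also compares the page count before and after saving so a loss shows up immediately. The function name roundtrip_page_counts is an assumption for illustration; tobytes(), open(stream=...) and page_count are the PyMuPDF pieces it relies on, and filetype="pdf" is added only to be explicit.

```python
def roundtrip_page_counts(doc, **save_options):
    """Save `doc` to memory, reopen it, and return (pages_before, pages_after).

    `roundtrip_page_counts` is a hypothetical helper name, not part of PyMuPDF.
    `save_options` are passed straight to Document.tobytes(),
    for example garbage=3, deflate=True.
    """
    # Imported inside the function so the sketch can be loaded
    # even where PyMuPDF is not installed.
    import pymupdf

    before = doc.page_count

    # Serialize the document directly to a bytes object (no BytesIO needed).
    pdf_bytes = doc.tobytes(**save_options)

    # Reopen the in-memory PDF and count its pages again.
    reopened = pymupdf.open(stream=pdf_bytes, filetype="pdf")
    after = reopened.page_count
    reopened.close()

    return before, after
```

If the two numbers differ, the save step (here with whatever garbage and deflate options were passed) is where the pages were lost.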

You could also loop through the individual pages right after the original open to confirm that there is a page tree problem:

for page in doc:
 print(f"Successfully accessed {page.number=}")
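
A slightly more defensive variant of the loop above, as a sketch: it loads each page by number and records the ones that raise instead of stopping at the first failure. The name find_unreadable_pages is invented here; doc.page_count and doc.load_page() are the PyMuPDF calls it assumes.

```python
def find_unreadable_pages(doc):
    """Return a list of (page_number, error_message) for pages that fail to load.

    `find_unreadable_pages` is a hypothetical helper name, not part of PyMuPDF.
    `doc` is expected to be an already opened document.
    """
    failures = []
    for pno in range(doc.page_count):
        try:
            # Loading a page forces a walk of the page tree to that page.
            doc.load_page(pno)
        except Exception as exc:
            # Keep going so every damaged page gets reported.
            failures.append((pno, str(exc)))
    return failures
```

If doc.page_count is already 0 right after the original open, the loop body never runs and the list stays empty, which by itself points at the page tree rather than at individual pages.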

