no pages in doc after clean up #4695

Unanswered
hgdhot asked this question in Looking for help

A PDF has 60+ pages, but after cleaning it up with the code below, page_count is 0.

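from io import BytesIO

import fitz  # PyMuPDF (legacy import name)

def _clean_up(file_path):
    doc = fitz.open(file_path)

    try:
        # Save a garbage-collected, compressed copy to memory ...
        tmp = BytesIO()
        tmp.write(doc.write(garbage=3, deflate=True))

        # ... and reopen that copy as a new document.
        doc = fitz.Document('pdf', tmp.getvalue())
        tmp.close()
    except Exception:
        ...
    return doc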

I guess there is something wrong with the PDF. How can I identify the problem?


Replies: 1 comment


By saving the PDF (to memory), you have in effect already run a kind of test.
You could have inspected the string returned by pymupdf.TOOLS.mupdf_warnings() right after your first open: MuPDF attempts automatic repairs when it encounters issues.
In your case, the damaged part of the PDF's internal structure is probably the "page tree", the container of pointers to the individual page objects.
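A minimal sketch of that check, assuming a recent PyMuPDF that can be imported as pymupdf. The function names open_and_report and warning_lines are made up for illustration, and the exact warning text depends on the file and the MuPDF version.

```python
def warning_lines(warnings):
    """Split MuPDF's warning text into non-empty, de-duplicated lines."""
    seen = []
    for line in warnings.splitlines():
        line = line.strip()
        if line and line not in seen:
            seen.append(line)
    return seen


def open_and_report(path):
    """Open a PDF and return (doc, list of MuPDF warnings raised so far)."""
    # Imported here so the pure helper above can be used without PyMuPDF.
    import pymupdf

    # Discard messages left over from earlier operations.
    pymupdf.TOOLS.reset_mupdf_warnings()
    doc = pymupdf.open(path)
    # Whatever MuPDF complained about (or repaired) while opening ends up here.
    return doc, warning_lines(pymupdf.TOOLS.mupdf_warnings())
```

Calling open_and_report on the problem file and printing the returned list, together with doc.page_count and doc.is_repaired, should show whether MuPDF had to repair anything during the open.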

I am slightly confused by the way you save to memory. I would simply have done pdfstream = doc.tobytes(...), and then tmp = pymupdf.open(stream=pdfstream).
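The simpler round trip could look roughly like this. It is only a sketch: the garbage=3 and deflate=True options are taken from the question, while the helper names roundtrip_page_counts and explain are hypothetical. Comparing the page count before and after the save makes it visible at which step the pages disappear.

```python
def explain(before, after):
    """Describe how the page count changed across the save/reopen round trip."""
    if before == 0:
        return "no pages found on the original open"
    if after == before:
        return "page count preserved"
    return f"{before - after} of {before} pages lost during the round trip"


def roundtrip_page_counts(path):
    """Return (pages before, pages after) a save-to-memory round trip."""
    # Imported here so explain() can be used without PyMuPDF installed.
    import pymupdf

    doc = pymupdf.open(path)
    before = doc.page_count
    # Serialize straight to bytes; no BytesIO buffer is needed.
    pdfstream = doc.tobytes(garbage=3, deflate=True)
    cleaned = pymupdf.open(stream=pdfstream, filetype="pdf")
    return before, cleaned.page_count
```

For example, print(explain(*roundtrip_page_counts(file_path))) reports whether the pages were already missing after the first open or only after the cleaned copy was reopened.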

You could also loop through the individual pages right after the original open to confirm that there is a page tree problem:

for page in doc:
    print(f"Successfully accessed {page.number=}")