I tried to use update_object() to fix the cross-reference, but an error occurred. · pymupdf/PyMuPDF · Discussion #4702

xiaolibuzai-ovo
Sep 18, 2025

code:

for xref in range(1, self.doc.xref_length()):
    try:
        _ = self.doc.xref_object(xref)
    except:
        try:
            self.doc.update_object(xref, "<<>>")
        except Exception as e:
            logger.error(f"save update_object error: {str(e)}, {traceback.format_exc()}")

error:

PageProcess error: RAISEPY() takes 2 positional arguments but 3 were given, Traceback (most recent call last):
 File "/usr/local/trpc/bin/process/pymupdf_process.py", line 152, in save
 _ = self.doc.xref_object(xref)
 File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 6032, in xref_object
 ret = extra.xref_object( self.this, xref, compressed, ascii)
 File "/usr/local/trpc/bin/lib/pymupdf/extra.py", line 120, in xref_object
 return _extra.xref_object(*args)
RuntimeError: bad xref
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
 File "/usr/local/trpc/bin/pdf_atom_process_py.py", line 126, in PageProcess
 url = await pymupdf_cli.save(file_id, request.scene, request.namespace_id)
 File "/usr/local/trpc/bin/process/pymupdf_process.py", line 154, in save
 self.doc.update_object(xref, "<<>>")
 File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 5829, in update_object
 RAISEPY("bad xref", MSG_BAD_XREF, PyExc_ValueError)
TypeError: RAISEPY() takes 2 positional arguments but 3 were given

version: pymupdf==1.25.5
platform: linux


julian-smith-artifex-com
Sep 18, 2025
Maintainer

pymupdf-1.25.5 is very old, please upgrade to the latest version.

In particular, the incorrect call of RAISEPY() with "bad xref" was fixed a while ago.

xiaolibuzai-ovo Sep 18, 2025
Author

I have already upgraded to the latest version, but the issue still persists.

v1.26.4

xiaolibuzai-ovo Sep 18, 2025
Author

PageProcess error: bad xref, Traceback (most recent call last):
  File "/usr/local/trpc/bin/process/pymupdf_process.py", line 152, in save
    _ = self.doc.xref_object(xref)
  File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 6095, in xref_object
    ret = extra.xref_object( self.this, xref, compressed, ascii)
  File "/usr/local/trpc/bin/lib/pymupdf/extra.py", line 120, in xref_object
    return _extra.xref_object(*args)
RuntimeError: bad xref

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/trpc/bin/pdf_atom_process_py.py", line 125, in PageProcess
    url = await pymupdf_cli.save(file_id, request.scene, request.namespace_id)
  File "/usr/local/trpc/bin/process/pymupdf_process.py", line 154, in save
    self.doc.update_object(xref, "<<>>")
  File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 5892, in update_object
    RAISEPY("bad xref", MSG_BAD_XREF)
  File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 17449, in RAISEPY
    raise Exception( msg)
Exception: bad xref

xiaolibuzai-ovo
Sep 18, 2025
Author

file:
01995b6ca7837b52abaa24e38e8c076d.pdf

julian-smith-artifex-com
Sep 18, 2025
Maintainer

I have created a test from your code. With the current pymupdf-1.26.4 it runs without raising an exception, although MuPDF emits the warning "repairing PDF document".

def test_4702():
    path = util.download(
            'https://github.com/user-attachments/files/22403483/01995b6ca7837b52abaa24e38e8c076d.pdf',
            'test_4702.pdf',
            )
    with pymupdf.open(path) as document:
        for xref in range(1, document.xref_length()):
            print(f'{xref=}')
            _ = document.xref_object(xref)
    wt = pymupdf.TOOLS.mupdf_warnings()
    assert wt == 'repairing PDF document'

Is your code modifying the document before it does the xref loop?
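
One way to check this is to run the same read-only scan before and after the modifications and compare which xrefs fail. The sketch below is only an illustration: `find_unreadable_xrefs` is a hypothetical helper name, and it assumes nothing about the document beyond the `xref_length()` and `xref_object()` methods already used in this thread. It does not call `update_object()`, so it leaves the document untouched.

```python
def find_unreadable_xrefs(doc):
    """Return (xref, error description) pairs for objects that cannot be read.

    `doc` is any object exposing xref_length() and xref_object(xref),
    for example an open pymupdf.Document. The helper name is hypothetical.
    """
    failures = []
    for xref in range(1, doc.xref_length()):
        try:
            doc.xref_object(xref)
        except Exception as exc:
            # In the tracebacks above this was RuntimeError("bad xref").
            failures.append((xref, f"{type(exc).__name__}: {exc}"))
    return failures


# Possible usage with PyMuPDF (the file name is an assumption):
#   import pymupdf
#   with pymupdf.open("test_4702.pdf") as doc:
#       before = find_unreadable_xrefs(doc)
#       ...  # apply the page modifications here
#       after = find_unreadable_xrefs(doc)
#       print(before, after)
```

If the list is empty before the modifications and non-empty afterwards, the bad xrefs are being introduced by the editing step rather than by the original file.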

xiaolibuzai-ovo Sep 18, 2025
Author

Yes, I am working on PDF layout restoration. After translating the English text, I cover the original text with a white rectangle and then insert the translated text using insert_htmlbox. However, an error occurs when saving the file.

xiaolibuzai-ovo Sep 18, 2025
Author

Something like this.

page = self.doc[page_num]
if del_links:
    for link in page.get_links():
        page.delete_link(link)

# Cover every original text block with a white rectangle.
white = pymupdf.pdfcolor["white"]
for block in blocks:
    page.draw_rect(tuple(block.bbox), color=None, fill=white)

# Insert the translated text into the same rectangles.
for block in blocks:
    try:
        page.insert_htmlbox(tuple(block.bbox), block.text, css=block.css, rotate=block.rotate, archive=ARCHIVE)
    except OverflowError as e:
        logger.error_context(self.ctx, f"restore err: {str(e)}")
        continue

julian-smith-artifex-com
Sep 24, 2025
Maintainer

Please post a full reproducer. For example it needs to specify page_num.
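
A full reproducer could look roughly like the sketch below. It is an assumption-laden outline, not the poster's actual code: the `Block` class, the `reproduce` function, the page number, the rectangle, and the file names are all hypothetical placeholders, and the `archive` argument from the snippet above is omitted because its contents are unknown. Only PyMuPDF calls that already appear in this thread, plus `Document.save()`, are used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Block:
    # Hypothetical stand-in for the poster's block objects.
    bbox: Tuple[float, float, float, float]
    text: str
    css: Optional[str] = None
    rotate: int = 0


def reproduce(doc, page_num, blocks, out_path):
    """Apply the edits from the thread, run the xref loop, then save."""
    page = doc[page_num]
    white = (1, 1, 1)  # RGB white; assumed equivalent to pymupdf.pdfcolor["white"]
    for block in blocks:
        page.draw_rect(block.bbox, color=None, fill=white)
    for block in blocks:
        page.insert_htmlbox(block.bbox, block.text, css=block.css, rotate=block.rotate)
    # No try/except here, so any "bad xref" error surfaces with its full traceback.
    for xref in range(1, doc.xref_length()):
        doc.xref_object(xref)
    doc.save(out_path)


# Possible usage (page number, rectangle, and file names are assumptions):
#   import pymupdf
#   with pymupdf.open("01995b6ca7837b52abaa24e38e8c076d.pdf") as doc:
#       reproduce(doc, 0, [Block((72, 72, 300, 120), "<p>translated text</p>")], "out.pdf")
```

Posting the script together with the concrete values that trigger the error lets others run exactly the same steps.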
