I tried to use update_object() to fix the cross-reference, but an error occurred. #4702
-
code:
for xref in range(1, self.doc.xref_length()):
    try:
        _ = self.doc.xref_object(xref)
    except:
        try:
            self.doc.update_object(xref, "<<>>")
        except Exception as e:
            logger.error(f"save update_object error: {str(e)}, {traceback.format_exc()}")
error:
PageProcess error: RAISEPY() takes 2 positional arguments but 3 were given, Traceback (most recent call last):
  File "/usr/local/trpc/bin/process/pymupdf_process.py", line 152, in save
    _ = self.doc.xref_object(xref)
  File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 6032, in xref_object
    ret = extra.xref_object( self.this, xref, compressed, ascii)
  File "/usr/local/trpc/bin/lib/pymupdf/extra.py", line 120, in xref_object
    return _extra.xref_object(*args)
RuntimeError: bad xref

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/trpc/bin/pdf_atom_process_py.py", line 126, in PageProcess
    url = await pymupdf_cli.save(file_id, request.scene, request.namespace_id)
  File "/usr/local/trpc/bin/process/pymupdf_process.py", line 154, in save
    self.doc.update_object(xref, "<<>>")
  File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 5829, in update_object
    RAISEPY("bad xref", MSG_BAD_XREF, PyExc_ValueError)
TypeError: RAISEPY() takes 2 positional arguments but 3 were given
version: pymupdf==1.25.5
platform: linux
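Since the traceback shows that update_object() rejects the same xref numbers that xref_object() cannot read, it may be more useful to record the failing xrefs than to try overwriting them. The following is only a minimal sketch: the function name find_unreadable_xrefs is hypothetical, and doc is assumed to be any object exposing xref_length() and xref_object() the way a pymupdf.Document does.

```python
def find_unreadable_xrefs(doc):
    """Return a list of (xref, message) pairs for objects that cannot be read.

    `doc` is assumed to behave like a pymupdf.Document, i.e. to provide
    xref_length() and xref_object(xref). Nothing in the document is modified.
    """
    bad = []
    # xref 0 is reserved, so the scan starts at 1, as in the original loop.
    for xref in range(1, doc.xref_length()):
        try:
            doc.xref_object(xref)
        except Exception as exc:  # the thread shows RuntimeError("bad xref")
            bad.append((xref, str(exc)))
    return bad
```

Calling it as find_unreadable_xrefs(self.doc) right before saving and logging the result would show which xref numbers are affected, which is the kind of detail a bug report needs.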
-
pymupdf-1.25.5 is very old; please upgrade to the latest version.
In particular, the incorrect call of RAISEPY() with "bad xref" was fixed a while ago.
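When a service bundles its own copy of a library, it can help to confirm which version is actually installed in the running environment. The following is a sketch using only the standard library; the helper names installed_version, version_tuple and is_at_least are hypothetical, and the version parsing assumes plain dotted numeric versions such as 1.26.4.

```python
from importlib import metadata


def installed_version(package="pymupdf"):
    """Return the installed distribution's version string, or None.

    None is returned when no package metadata is found, which can also
    happen if the library was copied into place without being installed.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.26.4' into (1, 26, 4). Assumes purely numeric parts."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(version, minimum):
    """True if `version` is the same as or newer than `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)
```

For example, logging installed_version() at service start-up makes it easy to see whether an upgrade actually reached the deployed process.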
-
I have already upgraded to the latest version (v1.26.4), but the issue persists.
-
PageProcess error: bad xref, Traceback (most recent call last):
  File "/usr/local/trpc/bin/process/pymupdf_process.py", line 152, in save
    _ = self.doc.xref_object(xref)
  File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 6095, in xref_object
    ret = extra.xref_object( self.this, xref, compressed, ascii)
  File "/usr/local/trpc/bin/lib/pymupdf/extra.py", line 120, in xref_object
    return _extra.xref_object(*args)
RuntimeError: bad xref

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/trpc/bin/pdf_atom_process_py.py", line 125, in PageProcess
    url = await pymupdf_cli.save(file_id, request.scene, request.namespace_id)
  File "/usr/local/trpc/bin/process/pymupdf_process.py", line 154, in save
    self.doc.update_object(xref, "<<>>")
  File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 5892, in update_object
    RAISEPY("bad xref", MSG_BAD_XREF)
  File "/usr/local/trpc/bin/lib/pymupdf/__init__.py", line 17449, in RAISEPY
    raise Exception( msg)
Exception: bad xref
-
I have created a test with your code. With the current pymupdf-1.26.4 it runs without raising an exception, though MuPDF emits the warning "repairing PDF document".
def test_4702():
    path = util.download(
            'https://github.com/user-attachments/files/22403483/01995b6ca7837b52abaa24e38e8c076d.pdf',
            'test_4702.pdf',
            )
    with pymupdf.open(path) as document:
        for xref in range(1, document.xref_length()):
            print(f'{xref=}')
            _ = document.xref_object(xref)
    wt = pymupdf.TOOLS.mupdf_warnings()
    assert wt == 'repairing PDF document'
Is your code modifying the document before it does the xref loop?
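One way to answer that question is to repeat the xref scan after each modification step and note the first step after which new xrefs become unreadable. This is only a sketch under stated assumptions: first_damaging_step is a hypothetical helper, steps is a list of (name, function) pairs supplied by the caller, and doc is assumed to provide xref_length() and xref_object() like a pymupdf.Document.

```python
def first_damaging_step(doc, steps):
    """Apply each (name, func) step to `doc` in order.

    Returns (name, sorted_new_bad_xrefs) for the first step after which
    previously readable xrefs can no longer be read, or None if no step
    introduces unreadable xrefs.
    """

    def unreadable(d):
        bad = set()
        # Re-read the length on every scan, since steps may add objects.
        for xref in range(1, d.xref_length()):
            try:
                d.xref_object(xref)
            except Exception:
                bad.add(xref)
        return bad

    baseline = unreadable(doc)  # xrefs that were already unreadable
    for name, func in steps:
        func(doc)
        new_bad = unreadable(doc) - baseline
        if new_bad:
            return name, sorted(new_bad)
    return None
```

Here the steps could be, for instance, one function that deletes links, one that draws the covering rectangles, and one that inserts the translated text, so that the result points at the operation worth including in a reproducer.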
-
Yes, I am working on PDF layout restoration. After translating the English text, I cover the original with a white rectangle and then insert the translated text using insert_htmlbox. However, an error occurs when saving the file.
-
Something like this.
page = self.doc[page_num]
if del_links:
    for link in page.get_links():
        page.delete_link(link)
for i in range(len(blocks)):
    block = blocks[i]
    white = pymupdf.pdfcolor["white"]
    page.draw_rect(tuple(block.bbox), color=None, fill=white)
for i in range(len(blocks)):
    block = blocks[i]
    try:
        page.insert_htmlbox(tuple(block.bbox), block.text, css=block.css, rotate=block.rotate, archive=ARCHIVE)
    except OverflowError as e:
        logger.error_context(self.ctx, f"restore err: {str(e)}")
        continue
-
Please post a full reproducer. For example, it needs to specify page_num.