Add option to skip corrupt PDFs in PDFMergerUtility with improved exception handling #208

Open

SwethaMuthuvel wants to merge 2 commits into apache:trunk from SwethaMuthuvel:enhance-exception-handling

SwethaMuthuvel commented Jul 4, 2025 (edited)

What This PR Does

This pull request improves the robustness and debuggability of PDFMergerUtility by:

Adding a skipCorruptFiles flag
- Allows users to skip unreadable or corrupt PDF files during merge.
- Default behavior remains unchanged (i.e., throws on error).
Wrapping IOException with source context
- Converts vague errors like:
```
IOException: Could not parse object stream
```
  into more useful messages like:
```
IOException: Failed to load PDF from source: /path/to/file.pdf
```
- Helps identify exactly which file failed.
Applied consistently in both merge modes
- optimizedMergeDocuments(...)
- legacyMergeDocuments(...)
- Added warning logs when skipping files.
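
The behavior listed above can be sketched without PDFBox itself. What follows is a minimal, hypothetical illustration and not the PR's actual diff: `SkippingMergeSketch`, `Loader`, and `mergeAll` are invented names, and the loader stands in for whatever parses a source PDF. Only the `skipCorruptFiles` flag, the `setSkipCorruptFiles` setter, and the wrapped message format are taken from the description.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch; none of this is PDFBox code.
class SkippingMergeSketch {
    private static final Logger LOG =
            Logger.getLogger(SkippingMergeSketch.class.getName());

    /** Stand-in for whatever loads one source document (assumption). */
    interface Loader {
        String load(String source) throws IOException;
    }

    // Default keeps the existing behavior: the first failure aborts the merge.
    private boolean skipCorruptFiles = false;

    void setSkipCorruptFiles(boolean skipCorruptFiles) {
        this.skipCorruptFiles = skipCorruptFiles;
    }

    /** Loads every source in order; returns the documents that loaded. */
    List<String> mergeAll(List<String> sources, Loader loader) throws IOException {
        List<String> merged = new ArrayList<>();
        for (String source : sources) {
            try {
                merged.add(loader.load(source));
            } catch (IOException e) {
                // Wrap the low-level error so the message names the failing source;
                // the original exception is preserved as the cause.
                IOException wrapped =
                        new IOException("Failed to load PDF from source: " + source, e);
                if (!skipCorruptFiles) {
                    throw wrapped;
                }
                LOG.warning("Skipping unreadable source " + source
                        + ": " + e.getMessage());
            }
        }
        return merged;
    }
}
```

With the flag left at its default, a failing source surfaces as an `IOException` whose message names the file and whose cause is the original parse error. With the flag set, the same source is logged and skipped, and the remaining inputs are still processed.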

Why This Helps

Improves debuggability: pinpoints which file caused the failure.
Makes batch operations resilient: one bad input no longer fails the whole merge.
Suits bulk merging: a large batch can finish even when a few inputs are unreadable.
Does not break existing behavior: opt-in via setSkipCorruptFiles(true).

Swetha Muthuvel added 2 commits July 4, 2025 13:05

37a40f8: Add option to skip corrupt PDFs in PDFMergerUtility with improved exception handling.

ae17c4d: PDFBOX-XXXX: Centralize merge summary logging in PDFMergerUtility

- Removed duplicate LOG.info calls from optimized and legacy merge methods.
- Introduced shared field 'lastMergeSkippedCount' to track skipped corrupt PDFs.
- Log merge summary once from mergeDocuments(), improving clarity and avoiding redundant output.
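
The logging layout this commit message describes might look roughly like the sketch below. It is an illustration under assumptions, not the committed code: `MergeSummarySketch`, the `Predicate`-based readability check, and the shared `mergeReadable` helper are hypothetical, and only the field name `lastMergeSkippedCount` and the idea of logging once from `mergeDocuments()` come from the message.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Logger;

// Hypothetical sketch of the described logging layout; not PDFBox code.
class MergeSummarySketch {
    private static final Logger LOG =
            Logger.getLogger(MergeSummarySketch.class.getName());

    // Field name from the commit message; reset at the start of every merge.
    private int lastMergeSkippedCount;

    int getLastMergeSkippedCount() {
        return lastMergeSkippedCount;
    }

    /** Single entry point: runs one mode, then logs the summary exactly once. */
    int mergeDocuments(List<String> sources, Predicate<String> isReadable,
                       boolean optimized) {
        lastMergeSkippedCount = 0;
        int merged = optimized
                ? optimizedMerge(sources, isReadable)
                : legacyMerge(sources, isReadable);
        LOG.info("Merged " + merged + " document(s), skipped "
                + lastMergeSkippedCount + " corrupt file(s)");
        return merged;
    }

    // The two modes would differ in a real merger; here both delegate to the
    // same placeholder so that neither one logs a summary of its own.
    private int optimizedMerge(List<String> sources, Predicate<String> isReadable) {
        return mergeReadable(sources, isReadable);
    }

    private int legacyMerge(List<String> sources, Predicate<String> isReadable) {
        return mergeReadable(sources, isReadable);
    }

    private int mergeReadable(List<String> sources, Predicate<String> isReadable) {
        int merged = 0;
        for (String source : sources) {
            if (isReadable.test(source)) {
                merged++;
            } else {
                lastMergeSkippedCount++; // counted here, reported by the caller
            }
        }
        return merged;
    }
}
```

Keeping the counter in a field and the log call in the entry point means each mode only records what it skipped, so adding a further mode would not add another summary line.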

lehmi (Contributor) commented Jul 4, 2025

Please reformat the code first using our formatter rules to make it easier to evaluate your proposed changes.

THausherr (Contributor) commented Jul 5, 2025

I'm wondering what the use case of this change would be. Wouldn't the target file be worthless if parts of the source are missing?

Is this for a school / university project, or is this part of an AI training / evaluation?
