Add option to skip corrupt PDFs in PDFMergerUtility with improved exception handling #208

Open
SwethaMuthuvel wants to merge 2 commits into apache:trunk from SwethaMuthuvel:enhance-exception-handling

SwethaMuthuvel commented Jul 4, 2025

What This PR Does

This pull request improves the robustness and debuggability of PDFMergerUtility by:

  1. Adding a skipCorruptFiles flag

    • Allows users to skip unreadable or corrupt PDF files during merge.
    • Default behavior remains unchanged (i.e., throws on error).
  2. Wrapping IOException with source context

    • Converts vague errors like:
      IOException: Could not parse object stream
      
      into more useful messages like:
      IOException: Failed to load PDF from source: /path/to/file.pdf
      
    • Helps identify exactly which file failed.
  3. Applying both changes consistently in both merge modes

    • optimizedMergeDocuments(...)
    • legacyMergeDocuments(...)
    • Both modes log a warning whenever a file is skipped.
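
For readers without the diff at hand, the behavior described in points 1 and 2 can be sketched roughly as follows. This is a standalone illustration, not the code in this PR and not PDFBox API: the class SkipCorruptSketch, the Loader interface, and loadAll are hypothetical stand-ins, and System.err replaces the real logger. Only the flag name and the message text are taken from the description above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the merger; not the PDFBox class.
class SkipCorruptSketch {

    /** Hypothetical stand-in for whatever parses a single source PDF. */
    interface Loader {
        String load(String source) throws IOException;
    }

    // Default is false, so existing behavior (throw on the first error) is unchanged.
    private boolean skipCorruptFiles = false;

    void setSkipCorruptFiles(boolean skipCorruptFiles) {
        this.skipCorruptFiles = skipCorruptFiles;
    }

    List<String> loadAll(List<String> sources, Loader loader) throws IOException {
        List<String> loaded = new ArrayList<>();
        for (String source : sources) {
            try {
                loaded.add(loader.load(source));
            } catch (IOException e) {
                if (!skipCorruptFiles) {
                    // Name the failing source and keep the parser error as the cause.
                    throw new IOException("Failed to load PDF from source: " + source, e);
                }
                // Opt-in mode: warn and continue with the remaining sources.
                System.err.println("WARN: skipping unreadable source " + source
                        + ": " + e.getMessage());
            }
        }
        return loaded;
    }
}
```

Passing the original exception as the cause keeps the low-level parser message available in the stack trace, so the added context does not hide the underlying error.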

Why This Helps

  • Improves debuggability — pinpoints which file caused the failure.
  • Makes batch operations resilient — avoids total failure from one bad input.
  • Suits bulk merging: a large batch can complete even when a few inputs are unreadable.
  • Does not break existing behavior — opt-in via setSkipCorruptFiles(true).

Swetha Muthuvel added 2 commits July 4, 2025 13:05
- Removed duplicate LOG.info calls from optimized and legacy merge methods.
- Introduced shared field 'lastMergeSkippedCount' to track skipped corrupt PDFs.
- Log merge summary once from mergeDocuments(), improving clarity and avoiding redundant output.
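
The bookkeeping these commit notes describe might look roughly like the sketch below. It is an illustration under assumptions, not the committed code: MergeSummarySketch, mergeAll, the isReadable predicate, and the summary wording are hypothetical, and only the field name lastMergeSkippedCount comes from the commit message.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the merger; not the PDFBox class.
class MergeSummarySketch {

    // Shared by both merge paths; field name taken from the commit message.
    private int lastMergeSkippedCount;

    String mergeDocuments(List<String> sources, Predicate<String> isReadable) {
        lastMergeSkippedCount = 0; // reset so counts do not leak between merges
        int merged = mergeAll(sources, isReadable);
        // The summary is produced once here rather than inside each merge mode.
        String summary = "Merged " + merged + " of " + sources.size()
                + " sources, skipped " + lastMergeSkippedCount;
        System.out.println(summary);
        return summary;
    }

    // Stand-in for the optimized/legacy merge methods: counts instead of merging.
    private int mergeAll(List<String> sources, Predicate<String> isReadable) {
        int merged = 0;
        for (String source : sources) {
            if (isReadable.test(source)) {
                merged++;
            } else {
                lastMergeSkippedCount++;
            }
        }
        return merged;
    }

    int getLastMergeSkippedCount() {
        return lastMergeSkippedCount;
    }
}
```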

lehmi (Contributor) commented Jul 4, 2025

Please reformat the code first using our formatter rules to make it easier to evaluate your proposed changes

Contributor

I'm wondering what the use case for this change would be. Wouldn't the target file be worthless if parts of the source are missing?

Is this for a school / university project, or is this part of an AI training / evaluation?
