PDFdeconstruct

PDFdeconstruct™ decomposes PDF files into XML files. The XML output includes:

  • text – Unicode text with font, color, and position data for each word (or each character)
  • images – in PNG, TIFF, or JPEG format
  • vector graphics – complete path information for fills and strokes
  • form fields – with field names and values

PDFdeconstruct can be used for:

  • document format conversion: convert PDF to other formats
  • document analysis: examine the content on a PDF page
  • complex content extraction: e.g., feeding text with position information into further processing
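
As a rough illustration of position-based processing, the sketch below groups words into lines of text by their vertical position. The element and attribute names (`page`, `word`, `x`, `y`) are hypothetical and chosen only for this example; they are not the actual PDFdeconstruct schema, which is described in the manual. The function name and the tolerance value are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL sample data: these element and attribute names are invented
# for illustration and do not reflect the real PDFdeconstruct output format.
SAMPLE_XML = """
<page>
  <word x="72.0" y="100.0">Invoice</word>
  <word x="130.0" y="100.4">Total</word>
  <word x="72.0" y="120.0">Amount</word>
  <word x="130.0" y="119.8">42.00</word>
</page>
"""


def group_words_into_lines(xml_text, y_tolerance=2.0):
    """Group words whose y coordinates lie close together into text lines.

    Returns a list of strings, one per line, ordered top to bottom,
    with the words in each line ordered left to right.
    """
    root = ET.fromstring(xml_text)

    # Collect (y, x, text) tuples so that sorting orders words by y, then x.
    words = [
        (float(w.get("y")), float(w.get("x")), w.text or "")
        for w in root.iter("word")
    ]
    words.sort()

    lines = []
    current = []
    current_y = None
    for y, x, text in words:
        # Start a new line when the vertical gap exceeds the tolerance.
        if current_y is None or abs(y - current_y) > y_tolerance:
            if current:
                lines.append(current)
            current = []
            current_y = y
        current.append((x, text))
    if current:
        lines.append(current)

    # Within each line, order words left to right and join them with spaces.
    return [" ".join(text for _, text in sorted(line)) for line in lines]


if __name__ == "__main__":
    for line in group_words_into_lines(SAMPLE_XML):
        print(line)
```

A fixed tolerance is the simplest choice here; real documents with mixed font sizes would likely need a tolerance derived from the font size of each word.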

The PDFdeconstruct output format is described in the manual.

PDFdeconstruct is a cross-platform command-line tool, suitable for use on servers or for batch-mode processing.

Supported platforms:

  • Windows
  • Mac OS X
  • Linux
  • 32-bit and 64-bit versions available for all platforms
  • other platforms: portable C++ source code for the library is available

See also: For conversion to plain text (instead of XML), try our XpdfText library.

Contact Glyph & Cog for more information, including evaluation copies.

Copyright 2026 Glyph & Cog, LLC
