Name	Name	Last commit message	Last commit date
Latest commit History 11 Commits
src	src
.gitignore	.gitignore
.npmignore	.npmignore
README.md	README.md
demo.ts	demo.ts
output.json	output.json
package-lock.json	package-lock.json
package.json	package.json
resume1.json	resume1.json
test.pdf	test.pdf
tsconfig.json	tsconfig.json

PDF to JSON Converter

A TypeScript utility that converts PDF documents into structured JSON data while preserving text content, formatting, and hyperlinks. Perfect for resume parsing, document analysis, and content extraction workflows.

✨ Features

Text Extraction: Extract text content with precise positioning and styling
Hyperlink Detection: Capture clickable links with their coordinates and target URLs
Font Preservation: Maintains font information for each text element
Multi-page Support: Processes documents of any length
Type Safety: Built with TypeScript for better development experience
Lightweight: Minimal dependencies

📦 Installation

Prerequisites

Make sure you have the following installed on your system:

Node.js (v16 or higher)
npm (v7 or higher) or yarn

Install the package

Using npm:

npm install @shilendra-dev/pdf-to-json

Or using yarn:

yarn add @shilendra-dev/pdf-to-json

Peer Dependencies

This package requires the following peer dependencies which will be installed automatically:

pdfjs-dist: ^3.4.120 (PDF.js library for PDF parsing)
@types/node: ^18.0.0 (TypeScript types for Node.js)

🚀 Usage

import { pdfToJson } from '@shilendra-dev/pdf-to-json';
import fs from 'fs/promises';
async function convertPdfToJson() {
 try {
 // Read PDF file
 const pdfBuffer = await fs.readFile('path/to/your/document.pdf');
 // Convert to JSON
 const result = await pdfToJson(pdfBuffer, {
 outputPath: 'output.json' // Optional: Path to save the JSON output
 });
 console.log('Conversion complete!');
 console.log(`Processed ${result.numPages} pages`);
 } catch (error) {
 console.error('Error converting PDF:', error);
 }
}
convertPdfToJson();

📝 API

`pdfToJson(pdfSource: Buffer | string, options?: PdfToJsonOptions): Promise<PdfJsonResult>`

Converts a PDF document to JSON.

Parameters:

pdfSource: PDF file as Buffer or file path
options: (Optional) Configuration options
- outputPath: (string) Path to save the JSON output file
- includeTextContent: (boolean) Whether to include raw text content (default: true)
- includeStyles: (boolean) Whether to include font and style information (default: true)
- includeLinks: (boolean) Whether to include hyperlinks (default: true)

Returns: Promise that resolves to the parsed PDF data

📂 Output Format

The converter generates a JSON object with the following structure:

{
 numPages: number;
 pages: Array<{
 pageNumber: number;
 width: number;
 height: number;
 items: Array<{
 type: 'text' | 'link';
 content: string;
 x: number;
 y: number;
 width: number;
 height: number;
 fontFamily?: string;
 fontSize?: number;
 color?: string;
 url?: string; // For links
 }>;
 }>;
}

🔍 Example

import { pdfToJson } from '@shilendra-dev/pdf-to-json';
// Convert PDF from URL
const response = await fetch('https://example.com/document.pdf');
const pdfBuffer = await response.arrayBuffer();
const result = await pdfToJson(Buffer.from(pdfBuffer));
// Process the extracted data
result.pages.forEach(page => {
 console.log(`Page ${page.pageNumber} (${page.width}x${page.height}):`);
 console.log(`- Contains ${page.items.length} text items`);
});

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ by Shilendra Singh

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

shilendra-dev/pdfToJSON

Folders and files

Latest commit

History

Repository files navigation

PDF to JSON Converter

✨ Features

📦 Installation

Prerequisites

Install the package

Peer Dependencies

🚀 Usage

📝 API

`pdfToJson(pdfSource: Buffer | string, options?: PdfToJsonOptions): Promise<PdfJsonResult>`

📂 Output Format

🔍 Example

🤝 Contributing

📄 License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PDF to JSON Converter

✨ Features

📦 Installation

Prerequisites

Install the package

Peer Dependencies

🚀 Usage

📝 API

pdfToJson(pdfSource: Buffer | string, options?: PdfToJsonOptions): Promise<PdfJsonResult>

📂 Output Format

🔍 Example

🤝 Contributing

📄 License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`pdfToJson(pdfSource: Buffer | string, options?: PdfToJsonOptions): Promise<PdfJsonResult>`

Packages