Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

tuffstuff9/nextjs-pdf-parser

Next.js PDF Parser Template 📄🔍

Introduction

I was having some trouble parsing PDFs in Next.js, so I thought I would make this template for anyone else who was facing the same issues as me. I hope this template saves you some time and trouble. It's a basic create-next-app with PDF parsing implemented using the pdf2json library and file uploading facilitated by FilePond.

Installation & Setup 🚀

  1. Clone the repository:

    git clone [repository-url]

  2. Navigate to the project directory:

    cd nextjs-pdf-parser

  3. Install dependencies:

    npm install
    # or
    yarn install

  4. Windows only: In app\api\upload\route.ts on line 22, change tempFilePath to a valid path. Make sure it starts from the root drive, for example: C:/coding/nextjs-pdf-parser/public/${fileName}.pdf

  5. Run the development server:

    npm run dev
    # or
    yarn dev

    Visit http://localhost:3000 to view the application.
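
For the Windows-only step, one way to avoid hard-coding a drive path is to build the temporary file path with Node's path and os modules. This is a minimal sketch, not the template's own code: buildTempPdfPath is a hypothetical helper, and defaulting to the operating system's temp directory is an assumption that differs from the public folder used in the example path.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not part of the template): returns an absolute,
// platform-appropriate path for the temporary PDF file.
function buildTempPdfPath(fileName: string, baseDir: string = os.tmpdir()): string {
  // path.basename drops any directory components from the supplied name,
  // so the file always lands directly inside baseDir.
  const safeName = path.basename(fileName);
  // path.resolve yields an absolute path on Windows, macOS, and Linux.
  return path.resolve(baseDir, `${safeName}.pdf`);
}
```

Passing path.join(process.cwd(), "public") as baseDir would mirror the location in the README's example while still producing an absolute path on every platform.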

Usage 🖱️

Navigate to http://localhost:3000 and use the FilePond uploader to select and upload a PDF. Once the upload completes, the content of the PDF is parsed and printed to the server console, that is, the terminal running the development server. It is not printed to the browser console.
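
The parse step behind this behaviour can be sketched as a small promise wrapper around an event-based parser. The event names pdfParser_dataError and pdfParser_dataReady and the methods loadPDF and getRawTextContent follow pdf2json's documented usage; the EventedPdfParser interface and the parsePdfToText helper are hypothetical names introduced for illustration, and the template's actual route handler may be organised differently.

```typescript
// Structural type covering only the parser members used below.
// This interface is an assumption for the sketch, not pdf2json's published typings.
interface EventedPdfParser {
  on(event: string, listener: (payload?: any) => void): unknown;
  loadPDF(filePath: string): unknown;
  getRawTextContent(): string;
}

// Hypothetical helper: resolves with the raw text once parsing finishes,
// or rejects if the parser reports an error.
function parsePdfToText(parser: EventedPdfParser, filePath: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    // pdf2json passes an object carrying a parserError field to error listeners.
    parser.on("pdfParser_dataError", (errData) => reject(errData?.parserError ?? errData));
    // When parsing is done, read the text out of the parser.
    parser.on("pdfParser_dataReady", () => resolve(parser.getRawTextContent()));
    // Start parsing; if loadPDF returns a promise, forward its rejection too.
    Promise.resolve(parser.loadPDF(filePath)).catch(reject);
  });
}
```

Inside the upload route, a call such as `console.log(await parsePdfToText(pdfParser, tempFilePath))` would then write the text to the server console, where pdfParser is an instance created as described under Technical Details.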

Technical Details 🛠️

  • nodeUtil is not defined Error:

    To bypass the nodeUtil is not defined error, the following configuration was added to next.config.js:

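      // Exclude pdf2json from Server Components bundling so Next.js loads it with Node's native require.
      const nextConfig = {
        experimental: {
          serverComponentsExternalPackages: ['pdf2json'],
        },
      };

      module.exports = nextConfig;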

See more details here
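
On Next.js 15 and later, this option was stabilised and renamed to the top-level serverExternalPackages key. The fragment below is a sketch of the equivalent setting for a next.config.ts file, assuming the project has been upgraded; the template itself uses the experimental key in next.config.js.

```typescript
// next.config.ts (assumes Next.js 15 or later, which is not what the template ships with)
const nextConfig = {
  // Stable replacement for experimental.serverComponentsExternalPackages.
  serverExternalPackages: ["pdf2json"],
};

export default nextConfig;
```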

  • Blank output from pdfParser.getRawTextContent():

    This issue might be due to incorrect type definitions. There are two potential solutions:

    1. Fix TypeScript definitions: Update the type definition for PDFParser.

    2. Bypass type checking: Instantiate PDFParser as shown:

      const pdfParser = new (PDFParser as any)(null, 1);

    For more details, refer to my comment on this GitHub issue.
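
A middle ground between the two options is to cast to a narrow constructor type instead of any, so the rest of the code stays type-checked. This is a minimal sketch: RawTextParser, RawTextParserCtor, and createRawTextParser are hypothetical names, and the constructor signature is an assumption modelled on the `new PDFParser(null, 1)` call above rather than pdf2json's published typings.

```typescript
// The only member this sketch needs from the parser instance.
interface RawTextParser {
  getRawTextContent(): string;
}

// Constructor shape matching the call `new PDFParser(null, 1)`:
// a context argument followed by a flag that enables raw-text output.
type RawTextParserCtor = new (context: null, needRawText: number) => RawTextParser;

// Hypothetical helper: casts a parser class to the narrow constructor type
// and instantiates it with raw-text output enabled.
function createRawTextParser(ParserClass: unknown): RawTextParser {
  const Ctor = ParserClass as RawTextParserCtor;
  return new Ctor(null, 1);
}
```

With pdf2json imported as PDFParser, the call would be `createRawTextParser(PDFParser)`; the cast is confined to one helper instead of spreading any through the route handler.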

Acknowledgements 🙏

A special thanks to the following libraries and their contributors:

  • FilePond: for providing a seamless and user-friendly file uploading experience.
  • pdf2json: for its efficient and robust PDF parsing capabilities.

License 📜

MIT License
