I was having some trouble parsing PDFs in Next.js, so I thought I would make this template for anyone else who was facing the same issues as me. I hope this template saves you some time and trouble. It's a basic create-next-app with PDF parsing implemented using the pdf2json library and file uploading facilitated by FilePond.
- Clone the repository:
    git clone [repository-url]
- Navigate to the project directory:
    cd nextjs-pdf-parser
- Install dependencies:
    npm install # or yarn install
- Windows only: In app\api\upload\route.ts on line 22, change tempFilePath to a valid path. Make sure it starts from the root drive, for example: C:/coding/nextjs-pdf-parser/public/${fileName}.pdf
- Run the development server:
    npm run dev # or yarn dev
  Visit http://localhost:3000 to view the application.
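The Windows-only step exists because the temp-file path is hardcoded. As a minimal sketch, and not what the template currently does, the path could instead be built with Node's os.tmpdir() and path.join(), which give a valid absolute path on Windows, macOS, and Linux. The helper name buildTempFilePath is hypothetical, and writing to the OS temp directory instead of public/ is an assumption.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: builds an absolute temp-file path for an uploaded PDF
// without hardcoding a drive letter or a POSIX-only directory.
function buildTempFilePath(fileName: string): string {
  // path.basename keeps only the last path segment, so a name such as
  // "../report" cannot point outside the temp directory.
  const safeName = path.basename(fileName);
  return path.join(os.tmpdir(), `${safeName}.pdf`);
}
```

With a helper like this, line 22 of route.ts could assign tempFilePath from buildTempFilePath(fileName) and the Windows-specific edit would no longer be needed.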
Navigate to http://localhost:3000 and use the FilePond uploader to select and upload a PDF. Once the upload completes, the PDF's text content is parsed and printed to the server console (note: it is not printed to the browser console).
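The parsing step can be sketched roughly as follows. pdf2json is event-based, so this sketch wraps it in a Promise that resolves with the raw text. It assumes the library's pdfParser_dataReady and pdfParser_dataError events and its loadPDF() and getRawTextContent() methods. The ParserLike interface and the parsePdfToText name are hypothetical and only describe the subset of the parser used here; the template's actual route handler may be organized differently.

```typescript
// Structural subset of the parser methods used below (assumed to match pdf2json).
interface ParserLike {
  on(event: string, handler: (payload: unknown) => void): unknown;
  loadPDF(filePath: string): unknown;
  getRawTextContent(): string;
}

// Hypothetical helper: resolves with the raw text once parsing finishes,
// or rejects with whatever payload the parser reports on failure.
function parsePdfToText(parser: ParserLike, filePath: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    parser.on("pdfParser_dataError", (err) => reject(err));
    parser.on("pdfParser_dataReady", () => resolve(parser.getRawTextContent()));
    // Register both handlers before starting the load so no event is missed.
    parser.loadPDF(filePath);
  });
}
```

In the upload route, the resolved string is what would be passed to console.log, which is why the output shows up in the terminal running npm run dev and not in the browser.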
- nodeUtil is not defined error:
  To bypass the nodeUtil is not defined error, the following configuration was added to next.config.js:
    const nextConfig = {
      experimental: {
        serverComponentsExternalPackages: ['pdf2json'],
      },
    };

    module.exports = nextConfig;
See more details here
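The serverComponentsExternalPackages option sits under experimental in the Next.js versions this template targets. In Next.js 15 it was renamed to the top-level serverExternalPackages, so after an upgrade the equivalent setting would look roughly like the sketch below. It uses export default, as accepted by next.config.mjs or next.config.ts; that this template would switch to one of those file types is an assumption.

```typescript
// Sketch for Next.js 15 or newer: the option is no longer under `experimental`.
const nextConfig = {
  // Keep pdf2json out of the server bundle so it is loaded with Node's own require.
  serverExternalPackages: ["pdf2json"],
};

export default nextConfig;
```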
- Blank output from pdfParser.getRawTextContent():
  This issue might be due to incorrect type definitions. There are two potential solutions:
  - Fix TypeScript definitions: Update the type definition for PDFParser.
  - Bypass type checking: Instantiate PDFParser as shown:
      const pdfParser = new (PDFParser as any)(null, 1);
  For more details, refer to my comment on this GitHub issue.
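A narrower alternative to the as any cast is sketched below, assuming that only the constructor arguments are mistyped. It casts the class to a small constructor type once, so the returned parser keeps typed methods instead of becoming any. The names RawTextParser, RawTextParserCtor, and createRawTextParser are hypothetical, and the parameter labels context and needRawText are assumed from the (null, 1) call above.

```typescript
// Only the methods this template needs from the parser.
interface RawTextParser {
  loadPDF(filePath: string): unknown;
  getRawTextContent(): string;
}

// Constructor shape assumed from the call `new PDFParser(null, 1)`.
type RawTextParserCtor = new (context: null, needRawText: number) => RawTextParser;

// Hypothetical helper: performs the unsafe cast in one place and returns a
// parser created with the second argument set to 1, as in the bypass above.
function createRawTextParser(ParserClass: unknown): RawTextParser {
  const Ctor = ParserClass as RawTextParserCtor;
  return new Ctor(null, 1);
}
```

Calling createRawTextParser(PDFParser) would then stand in for the as any line, with the trade-off that the cast is still unchecked if the library's constructor changes.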
A special thanks to the following libraries and their contributors:
- FilePond: For providing a seamless and user-friendly file uploading experience.
- pdf2json: For its efficient and robust PDF parsing capabilities.
MIT License