A Python interface based on the Foxit Quick PDF Library.
Python · Updated Apr 4, 2020
A simple demonstration of how you can implement retrieval-augmented generation (RAG) for a book.
A solution that automatically generates high-quality question-answering (QA) data from PDF documents using GPU-accelerated processing and efficiently fine-tunes LLMs. It uses the Unstructured library and AWS Bedrock Claude to generate domain-specific QA pairs, and trains lightweight models with the LoRA technique.
Converts scanned and ordinary documents into MP3 speech using Amazon Polly.
A Telegram bot that extracts text from PDFs and also extracts images of PDF pages. Made with Python.
CLI for merging PDF contexts.
NLP PDF mining: extracting text from PDFs.
A resume parser that extracts key details from PDF files using Groq's LLM
Opinionated and Sophisticated Document Region Analyzer.
Highlights the key matches between a given PDF and the description text.
UnchainedText: Break free from PDFs! Easily extract raw text to .txt for preprocessing.
PDF Text Finder: a console app that finds text in a PDF along with its page number.
A small Python script to extract and format text content from PDF files, with filter rules and other niceties.
A Python-based tool for extracting structured data from PDFs using OCR and regex, and exporting it to CSV. Ideal for processing invoices, logs, or scanned documents into organized, usable datasets.
A multi-format (PDF/DOC/DOCX/XLSX/XLS/CSV) text extraction utility project written in Java.
Tests of OCR and RAG with LLMs
Extracts data from a provided PDF, using keywords to identify relevant data points. Uses UglyToad PDFPIG (great lib btw).
This is for Technology Application Project at Swinburne University of Technology
An AI-powered tool that extracts text from PDF resumes, predicts the most suitable job role using Hugging Face BART MNLI, and rewrites the resume in a professional LaTeX format using Google FLAN-T5. Built with Flask for the backend and Streamlit for the frontend, it offers a fast, user-friendly way to analyze and improve resumes in real time.
A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.
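Several entries above, such as the PDF Text Finder console app, report which page a text match appears on. The following is a minimal Python sketch of that idea, not code from any listed project. It assumes the third-party `pypdf` package for page-by-page text extraction; the function names `find_in_pages` and `load_pages` are hypothetical.

```python
import re
from typing import Iterable


def find_in_pages(pages: Iterable[str], query: str, ignore_case: bool = True) -> list[tuple[int, str]]:
    """Return (1-based page number, matching line) pairs for every line containing query."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes the query a literal string, so "a.b" does not match "axb"
    pattern = re.compile(re.escape(query), flags)
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if pattern.search(line):
                hits.append((page_no, line.strip()))
    return hits


def load_pages(path: str) -> list[str]:
    """Extract text page by page (assumption: the pypdf package is installed)."""
    from pypdf import PdfReader  # imported lazily so find_in_pages works without pypdf

    reader = PdfReader(path)
    # extract_text() may yield an empty string for image-only (scanned) pages
    return [page.extract_text() or "" for page in reader.pages]
```

For example, `find_in_pages(["Hello world\nFoo", "bar\nWORLD peace"], "world")` returns `[(1, "Hello world"), (2, "WORLD peace")]`; with a real file you would call `find_in_pages(load_pages("report.pdf"), "invoice")`. Keeping the search separate from extraction means scanned pages could be fed in from an OCR step instead, which is the approach some of the tools above take.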