
AI-Powered Document Insight Tool

Professional-grade resume analysis platform powered by advanced AI models with intelligent document type detection and comprehensive insight generation.

Live Demo API Status

🎯 Project Overview

An enterprise-grade document analysis platform specializing in resume processing with AI-powered insights. The system automatically detects document types and provides structured analysis optimized for professional recruitment and career development workflows.

🔑 Key Specializations

  • Resume Analysis: Specialized parsing for professional documents with structured output
  • Document Type Detection: Intelligent classification of uploaded documents
  • Multi-AI Integration: Dual AI provider setup with intelligent fallback mechanisms
  • Real-time Processing: Asynchronous document processing with live progress tracking

πŸ—οΈ System Architecture

graph TB
 subgraph "Frontend Layer (Vercel)"
 A[React 18 + TypeScript]
 A1[TailwindCSS Styling]
 A2[Clerk Authentication]
 A3[Mobile Detection]
 A4[404 Error Handling]
 end
 
 subgraph "API Gateway (Vercel Serverless)"
 B[FastAPI Backend]
 B1[JWT Validation]
 B2[File Upload Handler]
 B3[Error Management]
 B4[Health Monitoring]
 end
 
 subgraph "Document Processing Engine"
 C[PDFPlumber Extractor]
 C1[Document Type Detector]
 C2[Content Sanitization]
 C3[Text Preprocessing]
 end
 
 subgraph "AI Analysis Layer"
 D[Sarvam AI Primary]
 E[Gemini AI Secondary]
 F[Intelligent Fallback]
 G[Response Structuring]
 end
 
 subgraph "Data Layer"
 H[MongoDB Atlas]
 H1[User Document Store]
 H2[Binary PDF Storage]
 H3[Analysis History]
 end
 
 subgraph "Authentication & Security"
 I[Clerk Identity Provider]
 I1[JWT Token Management]
 I2[User Session Handling]
 end
 
 A -->|HTTPS API Calls| B
 B -->|Document Upload| C
 C -->|Extracted Text| D
 D -->|Fallback on Error| E
 E -->|Final Fallback| F
 D -->|Success Response| G
 E -->|Success Response| G
 F -->|Keyword Analysis| G
 G -->|Structured Data| H
 B -->|Auth Validation| I
 H -->|User Data| B
 B -->|JSON Response| A

πŸ› οΈ Technology Stack

Backend Infrastructure

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| Framework | FastAPI | 0.104.1 | High-performance async API |
| Language | Python | 3.12+ | Core backend logic |
| Database | MongoDB Atlas | 7.0 | Document storage & history |
| PDF Processing | PDFPlumber | 0.9.0 | Text extraction from PDFs |
| HTTP Client | httpx | 0.24.0 | Async AI API communications |
| Authentication | PyJWT | 2.8.0 | JWT token validation |
| Deployment | Vercel Serverless | - | Auto-scaling serverless functions |

Frontend Architecture

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| Framework | React | 18.2.0 | Component-based UI framework |
| Language | TypeScript | 5.0.2 | Type-safe development |
| Build Tool | Vite | 5.0+ | Fast development & bundling |
| Styling | TailwindCSS | 3.3.0 | Utility-first CSS framework |
| Authentication | Clerk React | 4.29.0 | User authentication SDK |
| HTTP Client | Axios | 1.6.0 | API communication |
| Icons | Lucide React | 0.263.0 | Consistent iconography |
| Routing | React Router | 6.8.0 | Client-side navigation |


🚀 Local Development Setup

Prerequisites

  • Node.js 18+ and npm
  • Python 3.9+
  • Git for version control
  • MongoDB Atlas account (free tier available)
  • Clerk account for authentication
  • AI API Keys (optional - fallback available)

1. Clone Repository

git clone https://github.com/TechyCSR/AI-Powered-Document-Insight-Tool.git
cd AI-Powered-Document-Insight-Tool

2. Backend Setup

cd backend
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Setup environment variables
cp env.example .env

Edit backend/.env with your configuration:

# MongoDB Configuration
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/document_insights?retryWrites=true&w=majority
# Clerk Configuration
CLERK_SECRET_KEY=sk_test_your_clerk_secret_key_here
# AI API Keys (Optional - fallback available)
SARVAM_API_KEY=your_sarvam_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
# Application Settings
ENVIRONMENT=development
DEBUG=True
ALLOWED_ORIGINS=http://localhost:5173,http://localhost:3000
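
The backend presumably reads these variables at application startup. Below is a minimal, standard-library sketch of such a settings object; the `Settings` class and its layout are illustrative, and the actual project may use a dedicated configuration module:

```python
# Minimal sketch of reading backend/.env values at startup (not the project's
# actual config module). Names mirror the variables listed above.
import os
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Settings:
    # Required settings: raise KeyError early if they are missing.
    mongodb_uri: str = field(default_factory=lambda: os.environ["MONGODB_URI"])
    clerk_secret_key: str = field(default_factory=lambda: os.environ["CLERK_SECRET_KEY"])
    # Optional AI keys: the keyword fallback covers the case where both are absent.
    sarvam_api_key: Optional[str] = field(default_factory=lambda: os.getenv("SARVAM_API_KEY"))
    gemini_api_key: Optional[str] = field(default_factory=lambda: os.getenv("GEMINI_API_KEY"))
    environment: str = field(default_factory=lambda: os.getenv("ENVIRONMENT", "development"))
    debug: bool = field(default_factory=lambda: os.getenv("DEBUG", "False").lower() == "true")
    allowed_origins: List[str] = field(
        default_factory=lambda: os.getenv("ALLOWED_ORIGINS", "").split(",")
    )

settings = Settings()
```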

3. Frontend Setup

cd frontend
# Install dependencies
npm install
# Setup environment variables
cp env.example .env.local

Edit frontend/.env.local with your configuration:

VITE_CLERK_PUBLISHABLE_KEY=pk_test_your_clerk_publishable_key_here
VITE_API_URL=http://localhost:8000

4. Start Development Servers

Terminal 1 - Backend API:

cd backend
# Ensure virtual environment is activated
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 - Frontend Application:

cd frontend
npm run dev

5. Access Application

  • Frontend: http://localhost:5173
  • Backend API: http://localhost:8000
  • Interactive API docs (FastAPI default): http://localhost:8000/docs

6. Development Workflow

  1. Backend Changes: Auto-reload enabled with --reload flag
  2. Frontend Changes: Hot Module Replacement (HMR) active
  3. Database: MongoDB Atlas automatically syncs
  4. Authentication: Clerk handles dev/prod environments automatically

πŸ“ Quick Setup Checklist

  • Python 3.9+ installed
  • Node.js 18+ installed
  • MongoDB Atlas account created
  • Clerk account setup with project created
  • Environment variables configured
  • Dependencies installed
  • Both servers running
  • Application accessible at localhost:5173

🌐 Production Deployment

Production URLs

  • Live Application: https://summary.techycsr.me

API Endpoints

Core Endpoints

GET  /api/v1/health # System health check
POST  /api/v1/upload-resume # Document upload & analysis
GET  /api/v1/insights # User document history
GET  /api/v1/document/{id}/preview # PDF preview (authenticated)

Health Check Response

{
  "status": "healthy",
  "timestamp": "2025-08-30T13:27:21.331972",
  "environment": "production",
  "database": {
    "connected": true,
    "status": "connected",
    "error": null,
    "insights_count": 42
  }
}
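
For a quick smoke test, the endpoint can be polled with httpx (already part of the backend stack); the base URL below is a placeholder for your local or production deployment:

```python
# Probe the health endpoint and print the key fields from the response above.
import httpx

BASE_URL = "http://localhost:8000"  # or your production API URL

def check_health() -> None:
    resp = httpx.get(f"{BASE_URL}/api/v1/health", timeout=10)
    resp.raise_for_status()
    data = resp.json()
    db = data.get("database", {})
    print(f"status={data['status']} env={data['environment']} "
          f"db_connected={db.get('connected')} insights={db.get('insights_count')}")

if __name__ == "__main__":
    check_health()
```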

Upload Resume Request

POST /api/v1/upload-resume
Content-Type: multipart/form-data
Authorization: Bearer {jwt_token}
file: {pdf_file}
provider: "sarvam" | "gemini"
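
A client-side example of this request using httpx; the base URL is a placeholder, and the JWT must be a valid Clerk session token:

```python
# Upload a PDF for analysis, matching the request shape documented above.
import httpx

def upload_resume(pdf_path: str, jwt_token: str, provider: str = "sarvam") -> dict:
    with open(pdf_path, "rb") as f:
        resp = httpx.post(
            "http://localhost:8000/api/v1/upload-resume",
            headers={"Authorization": f"Bearer {jwt_token}"},
            files={"file": (pdf_path, f, "application/pdf")},
            data={"provider": provider},  # "sarvam" or "gemini"
            timeout=60,  # AI analysis can take a while
        )
    resp.raise_for_status()
    return resp.json()  # see the response example below
```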

Upload Response (Resume)

{
  "summary": "**👀 Name:** John Doe\n**📧 Contact:** john.doe@email.com, +1-555-0123\n**💼 Professional Summary:** Experienced software engineer with 5+ years...",
  "provider": "sarvam",
  "is_fallback": false,
  "filename": "resume.pdf",
  "upload_date": "2025-08-30T13:27:21.331972",
  "document_id": "66d1b2c3d4e5f6789abcdef0"
}
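
Insights History Response

GET /api/v1/insights returns the authenticated user's stored analyses. A hedged sketch of the payload: the "insights" wrapper key is an assumption about the exact shape, while the per-item fields mirror the upload response above plus the file size in bytes:

{
  "insights": [
    {
      "filename": "resume.pdf",
      "upload_date": "2025-08-28T14:00:00Z",
      "provider": "sarvam",
      "summary": "Professional summary...",
      "is_fallback": false,
      "file_size": 1234567
    }
  ],
  "total_count": 1
}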

🧠 AI Analysis Features

Resume Analysis Format

👀 Name: [Extracted full name]
📧 Contact: [Phone, Email, Location]
💼 Professional Summary: [Key qualifications highlights]
🎯 Core Skills: [Technical and soft skills]
💪 Experience Highlights:
  • [Most relevant role with quantified achievements]
  • [Second important position with metrics]
  • [Third significant role with impact data]
🎓 Education: [Degrees, institutions, GPA if available]
🏆 Notable Achievements:
  • [Top accomplishment with quantified results]
  • [Second significant achievement]
  • [Third notable accomplishment]
📊 Career Insights:
  • Years of Experience: [Calculated total]
  • Industry Focus: [Primary domain]
  • Career Level: [Entry/Mid/Senior/Executive]

General Document Analysis Format

📄 Document Type: [Auto-detected type]
📝 Document Summary: [Comprehensive overview]
🔍 Key Insights: [Major findings and observations]
📊 Main Topics: [Primary and secondary topics]
💡 Critical Information: [Important facts and recommendations]
🎯 Target Audience: [Intended readers]
📈 Key Takeaways: [Actionable insights]

Document Type Detection

The system automatically detects document types (a heuristic sketch follows this list):

  • Resume/CV: Professional experience documents
  • Research Paper: Academic and scientific documents
  • Proposal: Project and business proposals
  • Legal Document: Contracts and agreements
  • Report: Analysis and findings documents
  • General Document: Other document types
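
The exact classifier is internal to the backend; below is a minimal keyword-heuristic sketch of how such detection can work. The categories match the list above, but the keyword sets and function name are illustrative, not the project's actual rules:

```python
# Illustrative document-type heuristic: score each category by how often its
# keywords appear in the extracted text and pick the best match.
import re
from collections import Counter

TYPE_KEYWORDS = {  # hypothetical keyword sets, not the project's actual rules
    "Resume/CV": {"experience", "education", "skills", "employment", "objective"},
    "Research Paper": {"abstract", "methodology", "references", "hypothesis"},
    "Proposal": {"proposal", "budget", "timeline", "deliverables", "scope"},
    "Legal Document": {"agreement", "party", "clause", "liability", "hereby"},
    "Report": {"findings", "analysis", "conclusion", "summary", "results"},
}

def detect_document_type(text: str) -> str:
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        doc_type: sum(words[kw] for kw in keywords)
        for doc_type, keywords in TYPE_KEYWORDS.items()
    }
    best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_type if best_score > 0 else "General Document"
```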

🚀 Core Functionality

1. Document Upload & Processing

  • File Validation: PDF-only, 10MB size limit
  • Text Extraction: Advanced PDFPlumber integration (see the sketch after this list)
  • Content Sanitization: Clean text preprocessing
  • Progress Tracking: Real-time upload status
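
A minimal sketch of the validation and extraction step using pdfplumber; the function name and module layout are illustrative rather than the project's actual code:

```python
# Enforce PDF-only uploads and the 10MB limit, then pull text out with pdfplumber.
import io
import pdfplumber

MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB limit mentioned above

def extract_pdf_text(filename: str, data: bytes) -> str:
    if not filename.lower().endswith(".pdf"):
        raise ValueError("Only PDF uploads are accepted")
    if len(data) > MAX_FILE_SIZE:
        raise ValueError("File exceeds the 10MB size limit")
    with pdfplumber.open(io.BytesIO(data)) as pdf:
        pages = [page.extract_text() or "" for page in pdf.pages]
    # Basic sanitization: collapse whitespace before handing text to the AI layer.
    return " ".join(" ".join(pages).split())
```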

2. AI-Powered Analysis

  • Document Type Detection: Intelligent classification
  • Dual AI Integration: Primary (Sarvam) + Secondary (Gemini)
  • Intelligent Fallback: Keyword frequency analysis (see the sketch after this list)
  • Structured Output: Formatted, actionable insights
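
A sketch of the provider chain under these assumptions: the Sarvam and Gemini calls are abstracted as async callables (the real API calls live in the backend), and the final fallback is a simple keyword-frequency summary, mirroring the is_fallback flag in the upload response:

```python
# Try the primary provider, then the secondary, then fall back to a local
# keyword-frequency summary. Provider callables are placeholders.
import re
from collections import Counter
from typing import Awaitable, Callable

Provider = Callable[[str], Awaitable[str]]

def keyword_fallback(text: str, top_n: int = 10) -> str:
    words = re.findall(r"[a-zA-Z]{4,}", text.lower())
    common = ", ".join(word for word, _ in Counter(words).most_common(top_n))
    return f"Fallback keyword summary: {common}"

async def analyze(text: str, primary: Provider, secondary: Provider) -> tuple:
    """Return (summary, is_fallback), mirroring the upload response fields."""
    for provider in (primary, secondary):
        try:
            return await provider(text), False
        except Exception:
            continue  # provider unavailable or errored; try the next option
    return keyword_fallback(text), True
```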

3. User Management

  • Secure Authentication: Clerk-based JWT validation
  • Personal History: Complete analysis tracking
  • PDF Preview: Authenticated document viewing
  • Session Management: Persistent user sessions

4. Enterprise Features

  • Responsive Design: Mobile detection with desktop optimization
  • Error Handling: Professional 404 pages and error management
  • Health Monitoring: System status tracking
  • Performance Optimization: Async processing and caching

📱 User Experience Design

Responsive Behavior

  • Desktop Optimized: Full-featured dashboard experience
  • Mobile Detection: Automatic redirection to mobile-optimized messaging
  • Tablet Support: Warning banners for limited mobile functionality
  • Progressive Enhancement: Graceful degradation across devices

Interface Highlights

  • Modern Dashboard: Clean, professional layout
  • Drag-and-Drop Upload: Intuitive file handling
  • Real-time Feedback: Progress indicators and status updates
  • Theme Support: Dark/light mode toggle
  • Error Recovery: User-friendly error messages and recovery options

⚡ Performance & Scalability

Backend Optimization

  • Serverless Architecture: Auto-scaling Vercel functions
  • Async Processing: Non-blocking I/O operations
  • Connection Pooling: Optimized MongoDB connections
  • Error Recovery: Graceful degradation and reconnection logic

Frontend Optimization

  • Code Splitting: Dynamic imports for reduced bundle size
  • Lazy Loading: On-demand component loading
  • Caching Strategy: Optimized API response caching
  • Bundle Analysis: Size optimization and tree shaking

Database Performance

  • Indexed Queries: Optimized user-based data retrieval (see the index sketch below)
  • Document Storage: Efficient binary PDF storage
  • Connection Management: Serverless-optimized pooling
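
A hedged example of the kind of compound index that supports user-scoped history queries with pymongo; the collection and field names are assumptions, not necessarily the project's schema:

```python
# One compound index serves "latest documents for this user" queries.
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb+srv://username:password@cluster.mongodb.net/")
insights = client["document_insights"]["insights"]  # collection name assumed

insights.create_index([("user_id", ASCENDING), ("upload_date", DESCENDING)])
```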

🔒 Security Implementation

Authentication & Authorization

  • JWT Validation: Secure token-based authentication (see the sketch after this list)
  • User Isolation: Strict data access controls
  • Session Management: Secure session handling
  • API Protection: Authenticated endpoint access
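
One way to validate Clerk-issued JWTs with PyJWT (which is in the backend stack). The JWKS URL is a placeholder for your Clerk instance's endpoint, and the actual backend may verify tokens differently:

```python
# Verify a bearer token's signature and expiry against the Clerk JWKS.
import jwt
from jwt import PyJWKClient

CLERK_JWKS_URL = "https://your-clerk-instance.clerk.accounts.dev/.well-known/jwks.json"
_jwks_client = PyJWKClient(CLERK_JWKS_URL)

def verify_clerk_token(token: str) -> dict:
    """Return the token claims if the signature and expiry check out."""
    signing_key = _jwks_client.get_signing_key_from_jwt(token)
    # Audience/authorized-party checks are omitted in this sketch.
    return jwt.decode(token, signing_key.key, algorithms=["RS256"])
```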

Data Security

  • File Validation: Strict PDF-only upload enforcement
  • Size Limits: 10MB maximum file size
  • Content Sanitization: Safe text processing
  • Secure Storage: Encrypted MongoDB Atlas storage

Infrastructure Security

  • HTTPS Enforcement: SSL/TLS encryption
  • Environment Variables: Secure configuration management
  • API Rate Limiting: Protection against abuse
  • Error Handling: Secure error message disclosure

📊 System Status

✅ Backend API: Operational (99.9% uptime)
✅ Frontend App: Deployed & Responsive
✅ Database: MongoDB Atlas Connected (42 documents)
✅ AI Services: Sarvam + Gemini Operational
✅ Authentication: Clerk Integration Active
✅ File Processing: PDF Upload & Analysis Working
✅ Mobile Support: Detection & Redirection Active
✅ Error Handling: 404 Pages & Recovery Implemented

🎯 Project Highlights

  • Production-Ready: Fully deployed and operational system
  • Enterprise-Grade: Professional UI/UX with comprehensive error handling
  • AI-Specialized: Optimized for resume analysis with fallback intelligence
  • Scalable Architecture: Serverless deployment with auto-scaling capabilities
  • Modern Tech Stack: Latest frameworks and best practices implementation
  • Security-First: Comprehensive authentication and data protection
  • Performance-Optimized: Fast loading times and efficient processing

Built with modern web technologies for professional document analysis workflows. Deployed and operational at summary.techycsr.me.


## 🚀 Deployment

### Prerequisites for Deployment

1. **Vercel Account** - For hosting both frontend and backend
2. **MongoDB Atlas** - Cloud database
3. **Clerk Account** - Authentication service
4. **Domain** (optional) - For custom domain

### Backend Deployment (Vercel)

1. **Connect to Vercel**

```bash
cd backend
npm i -g vercel # Install Vercel CLI
vercel # Follow the prompts
```

2. **Set Environment Variables** - Go to your Vercel dashboard and add:

  • MONGODB_URI
  • CLERK_SECRET_KEY
  • SARVAM_API_KEY
  • GEMINI_API_KEY
  • ENVIRONMENT=production
  • DEBUG=False
  • ALLOWED_ORIGINS=https://your-frontend-domain.vercel.app
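
Deploying a FastAPI app on Vercel usually also needs a vercel.json in backend/. A hedged sketch, assuming the app/main.py entrypoint used by the uvicorn command earlier; the project's actual config may differ:

```json
{
  "builds": [{ "src": "app/main.py", "use": "@vercel/python" }],
  "routes": [{ "src": "/(.*)", "dest": "app/main.py" }]
}
```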

Frontend Deployment (Vercel)

  1. Connect to Vercel

cd frontend
vercel # Follow the prompts

  2. Set Environment Variables - Add to Vercel dashboard:

  • VITE_CLERK_PUBLISHABLE_KEY
  • VITE_API_BASE_URL=https://your-backend-domain.vercel.app/api/v1

Post-Deployment Configuration

  1. Update Clerk Settings

    • Add your production domains to allowed origins
    • Update redirect URLs
  2. Update API CORS

    • Add production frontend URL to backend CORS settings
  3. Test the Deployment

    • Verify authentication flow
    • Test file upload functionality
    • Check AI provider integration

👨‍💻 Developer

Built with ❤️ by @TechyCSR

Professional full-stack developer specializing in AI-powered applications and modern web technologies.


© 2025 TechyCSR • AI-Powered Document Analysis Platform • summary.techycsr.me
