Built with Next.js · TypeScript · Ollama · ChromaDB · ElevenLabs · Tailwind CSS
A comprehensive document processing and chat system that combines local AI processing with premium cloud TTS. Upload documents in any supported format, extract content with OCR, and hold natural conversations powered by local AI models and professional voice synthesis.
- PDF, DOCX, DOC - Word processors and PDFs with full text extraction
- TXT, MD, HTML - Text and markup formats
- CSV, XLSX, XLS - Spreadsheets and data files with table understanding
- JSON, XML - Structured data formats
- RTF - Rich text format support
- Images - JPG, PNG, TIFF, BMP, WebP with advanced OCR
- Local LLM (Ollama) - Private document analysis with tinyllama model
- ChromaDB Vector Store - Advanced semantic search and similarity matching
- Smart Text Processing - Automatic content cleaning and optimization
- Context-Aware Responses - AI maintains conversation history and document understanding
- Complete Document Privacy - All document processing happens locally
- ElevenLabs TTS - Professional, natural-sounding voice synthesis
- Conversational Voices - 5 optimized voices for different contexts:
- Rachel - Warm, natural female voice (perfect for conversations)
- Adam - Deep, engaging male voice (professional content)
- Antoni - Clear, articulate voice (document narration)
- Sam - Casual, friendly voice (relaxed interactions)
- Bella - Expressive, dynamic voice (engaging content)
- Advanced Voice Controls - Stability, similarity boost, expression tuning
- Smart Request Management - Automatic rate limiting and queue handling
- Auto-Play Support - Seamless voice responses for chat messages
- Document-Specific Knowledge - AI understands your document content
- Real-Time Streaming - Fast response generation
- Smart Commands - Extract dates, people, places, action items
- Context Preservation - Maintains conversation flow across sessions
- Multi-Format Understanding - Adapts responses based on document type
- Hierarchical Organization - Folder structure with nested categories
- Advanced Search - Semantic search across all documents
- Tag System - Custom tagging and categorization
- Processing Analytics - Track usage and performance metrics
- Real-Time Status - Live processing updates and health monitoring
- Responsive Design - Optimized for desktop, tablet, and mobile
- Glass Morphism UI - Beautiful, modern interface design
- Dark/Light Themes - Customizable appearance
- Real-Time Updates - Live document processing and chat updates
- Progressive Enhancement - Works offline with cached content
- Next.js 15.1.7 - React framework with App Router and RSC
- TypeScript 5.x - Full type safety and IntelliSense
- Tailwind CSS 3.4.17 - Utility-first styling with custom components
- Framer Motion - Smooth animations and transitions
- Radix UI - Accessible, unstyled component primitives
- Ollama (tinyllama) - Local LLM for document analysis
- LangChain - AI application framework and prompt management
- ChromaDB - Vector database for semantic search
- ElevenLabs API - Premium text-to-speech synthesis
- Pinecone - Cloud vector database for embeddings
- Clerk - User authentication and session management
- Firebase - Document metadata and user data storage
- Hybrid Document Store - In-memory + persistent storage strategy
- Mammoth.js - Advanced DOCX processing and conversion
- PDF-Parse - PDF text extraction and metadata
- Sharp - High-performance image processing
- XLSX - Excel file parsing and data extraction
- Cheerio - HTML parsing and content extraction
```bash
node --version  # v18+ required
npm --version   # v9+ required
```
```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows - Download from: https://ollama.com/download/windows
```
```bash
# Using pip (recommended)
pip install chromadb

# Using conda
conda install -c conda-forge chromadb

# Using Docker
docker run -p 8000:8000 chromadb/chroma
```
```bash
# Clone the repository
git clone https://github.com/your-username/ai-challenge.git
cd ai-challenge

# Install all dependencies
npm install

# Set up environment variables
cp .env.local.example .env.local
# Edit .env.local with your configuration
```
Create `.env.local` with the following configuration:
```bash
# Authentication (Required)
CLERK_SECRET_KEY=your_clerk_secret_key
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_publishable_key

# Firebase Configuration (Required)
FIREBASE_PROJECT_ID=your_firebase_project_id
FIREBASE_PRIVATE_KEY="your_firebase_private_key"
FIREBASE_CLIENT_EMAIL=your_firebase_client_email

# AI Configuration (Required)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=tinyllama:latest

# Vector Database (Required)
PINECONE_API_KEY=your_pinecone_api_key

# Premium TTS (Required for voice features)
ELEVENLABS_API_KEY=your_elevenlabs_api_key

# Optional: Payment Integration
STRIPE_SECRET_KEY=your_stripe_secret_key
NEXT_PUBLIC_SCHEMATIC_PUBLISHABLE_KEY=your_schematic_key
```
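A startup check can surface missing variables before any service call fails at runtime. This is a minimal sketch — the `missingEnvVars` helper and the abbreviated variable list mirror the configuration above but are not part of the actual codebase:

```typescript
// Hypothetical startup check: report required variables that are unset or blank.
const REQUIRED_VARS = [
  "CLERK_SECRET_KEY",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "FIREBASE_PROJECT_ID",
  "OLLAMA_BASE_URL",
  "PINECONE_API_KEY",
  "ELEVENLABS_API_KEY",
] as const;

// Returns the names of missing variables (empty array when the env is complete).
export function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => !env[name] || env[name]!.trim() === "");
}

// Example usage at startup:
// const missing = missingEnvVars(process.env);
// if (missing.length) console.warn(`Missing env vars: ${missing.join(", ")}`);
```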
Open 3 separate terminals:
```bash
# Terminal 1: Start Ollama Service
ollama serve

# Terminal 2: Start ChromaDB Server
chroma run --host localhost --port 8000

# Terminal 3: Start Next.js Application
npm run dev
```
```bash
# Install the recommended model
ollama pull tinyllama:latest

# Verify installation
ollama list
ollama run tinyllama:latest
```
- Main Application: http://localhost:3001
- Dashboard: http://localhost:3001/dashboard
- System Health: http://localhost:3001/api/system-status
- TTS Test: http://localhost:3001/api/tts
- Navigate to Upload: Go to `/dashboard/upload`
- Select Documents: Drag & drop or browse for files
- Choose Options: Enable OCR for scanned documents
- Monitor Progress: Watch real-time processing status
- Verify Completion: Ensure "completed" status before chatting
- Access Document: Go to `/dashboard/files/[document-id]`
- Enable Voice: Click the speaker icon for TTS responses
- Natural Conversation: Ask questions in plain English
- Smart Commands:
  - "Summarize this document"
  - "Extract all dates mentioned"
  - "Find people and organizations"
  - "What are the key action items?"
  - "Compare this with my other documents"
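One way smart commands like these can be wired up is by mapping each command to a prompt template and prepending the retrieved document context. The sketch below is purely illustrative — the command names and `buildPrompt` helper are assumptions, not part of the actual codebase:

```typescript
// Hypothetical mapping from smart commands to structured LLM prompts.
type SmartCommand = "summarize" | "dates" | "entities" | "actions";

const PROMPTS: Record<SmartCommand, string> = {
  summarize: "Summarize this document in a few sentences.",
  dates: "Extract all dates mentioned, with surrounding context.",
  entities: "List the people and organizations referenced.",
  actions: "What are the key action items?",
};

// Combine a command with retrieved document context into one prompt string.
export function buildPrompt(command: SmartCommand, context: string): string {
  return `Context:\n${context}\n\nTask: ${PROMPTS[command]}`;
}
```

The resulting string would then be sent to the local model as a single turn, keeping the document context and the task instruction together.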
- Enable TTS: Click speaker button in chat interface
- Choose Voice: Select from 5 conversational voices
- Adjust Settings: Fine-tune stability, similarity, expression
- Auto-Play: Enable automatic voice responses
- Advanced Controls: Access streaming and quality settings
- Folders: Create hierarchical folder structures
- Tags: Add custom tags for categorization
- Search: Use semantic search across all documents
- Filtering: Filter by date, type, tags, or content
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Next.js App   │    │     Ollama      │    │    ChromaDB     │
│   (Frontend)    │◄──►│   (Local AI)    │    │  (Vector Store) │
│   Port: 3001    │    │  Port: 11434    │    │   Port: 8000    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│      Clerk      │    │   ElevenLabs    │    │    Pinecone     │
│     (Auth)      │    │  (Premium TTS)  │    │  (Embeddings)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Firebase     │    │    Document     │    │  Hybrid Store   │
│   (Metadata)    │    │   Processing    │    │   (In-Memory)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
- Upload → Document parsing and text extraction
- Processing → Content chunking and embedding generation
- Storage → Vector storage in ChromaDB + metadata in Firebase
- Query → User question → Vector similarity search
- AI Response → Context-aware response via Ollama
- TTS → Natural voice synthesis via ElevenLabs
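The flow above can be sketched as a simple orchestration. Each async stub below stands in for a real service call (parser, embedder, vector search, Ollama); none of these function names come from the codebase, and the TTS step is omitted:

```typescript
// Hypothetical end-to-end sketch of the data flow; stubs replace real services.
type Answer = { text: string };

async function parse(file: string): Promise<string> { return `text of ${file}`; }
async function embedAndStore(_text: string): Promise<void> { /* ChromaDB upsert */ }
async function search(_question: string): Promise<string> { return "top-matching chunk"; }
async function askOllama(question: string, context: string): Promise<string> {
  return `answer to "${question}" using ${context}`;
}

export async function askDocument(file: string, question: string): Promise<Answer> {
  const text = await parse(file);          // Upload → parsing/extraction
  await embedAndStore(text);               // Processing + Storage → embeddings
  const context = await search(question);  // Query → similarity search
  const reply = await askOllama(question, context); // AI response with context
  return { text: reply };
}
```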
```
ai-challenge/
├── app/                              # Next.js App Router
│   ├── api/                          # API Routes
│   │   ├── tts/                      # Text-to-speech endpoints
│   │   ├── pinecone/                 # Vector database operations
│   │   ├── upload-document/          # Document upload handling
│   │   ├── system-status/            # Health monitoring
│   │   └── realtime/                 # Real-time updates
│   ├── dashboard/                    # Main application interface
│   │   ├── files/[id]/               # Document viewer and chat
│   │   ├── upload/                   # Document upload interface
│   │   └── page.tsx                  # Dashboard home
│   └── globals.css                   # Global styles
├── components/                       # React Components
│   ├── ui/                           # Shadcn/ui components
│   ├── PDFChatInterface.tsx          # Main chat interface
│   ├── UniversalDocumentViewer.tsx   # Document display
│   ├── EnhancedTTSControls.tsx       # Voice controls
│   └── PineconeDocumentPage.tsx      # Document management
├── lib/                              # Core Libraries
│   ├── hybrid-tts-service.ts         # ElevenLabs TTS integration
│   ├── elevenlabs-client.ts          # ElevenLabs API wrapper
│   ├── hybrid-chat-service.ts        # AI conversation logic
│   ├── hybrid-document-store.ts      # Document storage layer
│   ├── pinecone-client.ts            # Vector database client
│   ├── ollama-client.ts              # Local AI integration
│   └── pinecone-embeddings.ts        # Embedding management
├── styles/                           # Tailwind CSS
├── public/                           # Static assets
└── Configuration files               # Next.js, TypeScript, etc.
```
```
POST /api/upload-document
Content-Type: multipart/form-data

Body: file (any supported format)
Response: { id, fileName, status, processingTime }
```

```
GET /api/files/{userId}_{timestamp}_{hash}?includeContent=true

Response: { document, content, metadata }
```

```
POST /api/pinecone/embed
Content-Type: application/json

{
  "text": "document content",
  "documentId": "doc_id",
  "fileName": "document.pdf"
}
```

```
POST /api/pinecone/chat
Content-Type: application/json

{
  "question": "What is this document about?",
  "documentId": "doc_id",
  "fileName": "document.pdf",
  "history": [...]
}
```
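A client-side call to the chat endpoint can be assembled as shown below. The `buildChatRequest` helper is an illustrative assumption, not part of the codebase; only the endpoint path and body fields come from the API description above:

```typescript
// Hypothetical helper that builds the request for POST /api/pinecone/chat.
export interface ChatTurn { role: "user" | "assistant"; content: string }

export function buildChatRequest(
  question: string,
  documentId: string,
  fileName: string,
  history: ChatTurn[] = [],
): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question, documentId, fileName, history }),
  };
}

// Usage (assumes the dev server from the Quick Start is running):
// const res = await fetch("/api/pinecone/chat",
//   buildChatRequest("What is this document about?", "doc_id", "document.pdf"));
// const answer = await res.json();
```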
```
POST /api/tts
Content-Type: application/json

{
  "text": "Hello, this is a test",
  "voice_id": "EXAVITQu4vr4xnSDxMaL",
  "stability": 0.75,
  "similarity_boost": 0.85,
  "style": 0.2
}
```

```
GET /api/tts

Response: { provider: "elevenlabs", available: true, conversationalVoices: {...} }
```

```
GET /api/system-status

Response: {
  ollama: { available, models },
  chromadb: { available, collections },
  elevenlabs: { available, voices, usage }
}
```
- RAM: 8GB+ recommended (16GB for optimal performance)
- Storage: SSD recommended (faster vector operations)
- CPU: Multi-core processor (AI inference benefits from more cores)
- Network: Stable internet for ElevenLabs TTS (optional for local-only mode)
```bash
# Use tinyllama for fastest responses
ollama pull tinyllama:latest

# Monitor model memory usage
ollama ps

# Optimize model parameters
curl http://localhost:11434/api/chat -X POST -d '{
  "model": "tinyllama:latest",
  "options": { "temperature": 0.7, "top_p": 0.9, "num_predict": 256 }
}'
```
- Chunk Size: 1000 characters optimal for balance of context and speed
- Overlap: 200 characters for better context preservation
- Batch Processing: Process multiple documents in parallel
- Memory Management: Clear unused embeddings periodically
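The chunking parameters above (1000-character chunks, 200-character overlap) can be sketched as a sliding window. The helper below is illustrative and not the actual implementation in `lib/`:

```typescript
// Sliding-window chunker: fixed-size chunks, each overlapping the previous
// one by `overlap` characters to preserve context across boundaries.
export function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size <= overlap) throw new Error("chunk size must exceed overlap");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
    start += size - overlap; // step back by the overlap for the next window
  }
  return chunks;
}
```

Each chunk would then be embedded individually, so a question that lands near a chunk boundary still retrieves the surrounding context from the overlapping region.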
- Request Queuing: Automatic handling of concurrent request limits
- Voice Caching: Reuse voice settings for consistent performance
- Streaming: Use streaming mode for longer texts
- Rate Limiting: Built-in 100ms delay between requests
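The queuing behavior described above can be approximated with a small concurrency gate: at most two requests in flight, and a short delay before the next queued request is released. The class below is a sketch of the idea, not the actual `hybrid-tts-service` implementation (the cap and delay values mirror the text):

```typescript
// Minimal request queue: caps concurrency and spaces out dequeues.
export class RequestQueue {
  private active = 0;
  private pending: (() => void)[] = [];

  constructor(private maxConcurrent = 2, private delayMs = 100) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // Wait until a finished task releases a slot.
      await new Promise<void>((resolve) => this.pending.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      // Release the next waiter after the configured spacing delay.
      setTimeout(() => this.pending.shift()?.(), this.delayMs);
    }
  }
}
```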
- ✅ Local AI Processing - Documents analyzed on your machine
- ✅ Secure Authentication - Clerk-based user management
- ✅ Encrypted Storage - Firebase security rules and encryption
- ✅ API Security - Rate limiting and input validation
- ✅ Data Isolation - User-specific document access controls
- Environment Variables - Secure credential management
- CORS Protection - Restricted cross-origin requests
- Input Sanitization - All user inputs validated and cleaned
- Error Handling - No sensitive information in error messages
- Audit Logging - Track document access and processing
- GDPR Ready - User data deletion and export capabilities
- SOC 2 Compatible - Security controls and monitoring
- Enterprise Security - Role-based access control foundation
```bash
# Check Ollama status
ollama list

# Start Ollama if not running
ollama serve

# Verify model installation
ollama pull tinyllama:latest
ollama run tinyllama:latest

# Test API directly
curl http://localhost:11434/api/version
```

```bash
# Check ChromaDB status
curl http://localhost:8000/api/v1/heartbeat

# Start ChromaDB
chroma run --host localhost --port 8000

# Alternative with Docker
docker run -p 8000:8000 chromadb/chroma

# Check for port conflicts
lsof -i :8000
```

```bash
# Check TTS system status
curl http://localhost:3001/api/tts

# System automatically handles:
# - Request queuing (max 2 concurrent)
# - Exponential backoff retry (1s, 2s, 5s)
# - User-friendly error messages

# Verify API key
echo $ELEVENLABS_API_KEY
```
- ✅ File Format: Ensure a supported format (PDF, DOCX, TXT, etc.)
- ✅ File Size: Keep under 25MB for optimal performance
- ✅ Encoding: Use UTF-8 encoding for text files
- ✅ Permissions: Verify file read permissions
- ✅ Service Health: Check `/api/system-status`
```bash
# Complete system health check
curl http://localhost:3001/api/system-status | jq

# Test individual components
curl http://localhost:11434/api/version      # Ollama
curl http://localhost:8000/api/v1/heartbeat  # ChromaDB
curl http://localhost:3001/api/tts           # TTS System

# View application logs
npm run dev  # Check console output

# Clear application cache
rm -rf .next
npm run dev
```
```bash
# Build for production
npm run build

# Start production server
npm start

# Or deploy to Vercel
npx vercel --prod
```
```bash
# Ollama production setup
ollama serve --host 0.0.0.0 --port 11434

# ChromaDB production setup
chroma run --host 0.0.0.0 --port 8000 --log-level INFO

# Environment variables for production
NODE_ENV=production
OLLAMA_BASE_URL=http://your-ollama-server:11434
```
```dockerfile
# Dockerfile example
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
```
- Load Balancing: Multiple Next.js instances behind reverse proxy
- Database Scaling: ChromaDB cluster setup for high availability
- Caching: Redis for session and response caching
- CDN: Static asset optimization and global distribution
- Fork & Clone: Create your own copy of the repository
- Branch: Create a feature branch: `git checkout -b feature/amazing-feature`
- Develop: Make changes following code style guidelines
- Test: Ensure all functionality works as expected
- Lint: Run `npm run lint` and fix any issues
- Commit: Use conventional commits: `git commit -m 'feat: add amazing feature'`
- Pull Request: Submit a PR with a detailed description
- TypeScript: Strict mode enabled, full type coverage
- ESLint: Next.js configuration with custom rules
- Prettier: Consistent code formatting
- Conventional Commits: Standardized commit messages
- Component Structure: Functional components with hooks
- API Design: RESTful endpoints with proper error handling
- Unit Tests: Test utility functions and API endpoints
- Integration Tests: Test document processing workflows
- UI Tests: Verify component functionality across devices
- Performance Tests: Monitor AI response times and memory usage
| Format | Extension | Processing | OCR Support |
|---|---|---|---|
| PDF | `.pdf` | ✅ Text + Images | ✅ Scanned PDFs |
| Word | `.docx`, `.doc` | ✅ Full formatting | ❌ N/A |
| Excel | `.xlsx`, `.xls` | ✅ All sheets | ❌ N/A |
| Text | `.txt`, `.md` | ✅ UTF-8 | ❌ N/A |
| Web | `.html`, `.xml` | ✅ Clean extraction | ❌ N/A |
| Data | `.json`, `.csv` | ✅ Structured parsing | ❌ N/A |
| Images | `.jpg`, `.png`, `.tiff` | ✅ OCR processing | ✅ Text extraction |
| Model | Size | Speed | Quality | Memory | Use Case |
|---|---|---|---|---|---|
| `tinyllama:latest` | 637MB | ⚡ Fast | Good | 2GB | Recommended - Production |
| `gemma2:2b` | 1.6GB | Fast | High | 4GB | Balanced performance |
| `qwen2.5:3b` | 1.9GB | Medium | High | 6GB | Advanced analysis |
| `llama3.2` | 4.7GB | Medium | Highest | 8GB | Complex reasoning |
| Voice | ID | Gender | Style | Best For |
|---|---|---|---|---|
| Rachel | `EXAVITQu4vr4xnSDxMaL` | Female | Conversational | Chat responses |
| Adam | `pNInz6obpgDQGcFmaJgB` | Male | Professional | Document reading |
| Antoni | `ErXwobaYiN019PkySvjV` | Male | Clear | Technical content |
| Sam | `yoZ06aMxZJJ28mfd3POQ` | Male | Casual | Friendly interactions |
| Bella | `EXAVITQu4vr4xnSDxMaL` | Female | Expressive | Dynamic content |
- ElevenLabs premium TTS integration
- Advanced rate limiting and error handling
- Conversational voice optimization
- Hybrid document storage system
- Real-time processing status updates
- Enhanced UI with glass morphism design
- Smart request queuing for API limits
- Multi-Document Chat - Cross-document conversations and analysis
- Advanced OCR - Table extraction and handwriting recognition
- Voice Input - Speech-to-text for hands-free interaction
- Mobile App - React Native companion application
- API Gateway - RESTful API for third-party integrations
- Collaborative Workspaces - Team document sharing and chat
- Document Comparison - Side-by-side analysis and diff views
- Advanced Analytics - Usage insights and optimization suggestions
- Multi-language Support - UI localization and model support
- Enterprise SSO - SAML and OIDC integration
- Workflow Automation - Zapier and webhook integrations
This project is licensed under the MIT License - see the LICENSE file for complete details.
- Ollama: Apache License 2.0
- ChromaDB: Apache License 2.0
- ElevenLabs: Commercial API service
- Next.js: MIT License
- All npm dependencies: Various open-source licenses
- Ollama - Local AI model inference engine
- ChromaDB - Vector database for semantic search
- ElevenLabs - Premium text-to-speech synthesis
- Next.js - React framework and deployment platform
- Pinecone - Managed vector database service
- Clerk - Authentication and user management
- Firebase - Document metadata storage
- LangChain - AI application framework
- Tailwind CSS - Utility-first CSS framework
- Radix UI - Accessible component primitives
- Documentation: This README and inline code comments
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Technical Support: Check the troubleshooting section first
- Pull Requests: Welcome! Please follow the contribution guidelines
- Documentation: Help improve docs and examples
- Testing: Report bugs and help with quality assurance
- Translation: Assist with internationalization efforts
Get Started • Documentation • Report Bug • Request Feature
⭐ Star this repo if you find it useful! ⭐