VisionExpo/SamvidAI

🧠 SamvidAI

Intelligent Contract Analysis Engine powered by OpticalRAG


📐 System Design:
High Level Design → docs/HLD.md


🚧 Current Project Status

SamvidAI is in active design and prototyping.

  • Core architecture and OpticalRAG pipeline are defined
  • Experiments use synthetic or publicly available documents
  • No confidential or client legal data is used
  • Benchmarks are prototype / controlled estimates
  • Legal workflow validation is ongoing via practicing professionals

The project prioritizes correctness, safety, and validation before scale.


🧭 What is SamvidAI?

SamvidAI is a next-generation legal document intelligence system built to analyze long, complex contracts (50–300+ pages) using layout-aware, vision-first retrieval.

Traditional OCR + NLP systems flatten documents into text, which discards layout structure and makes downstream answers more prone to hallucination.
SamvidAI instead introduces OpticalRAG — a multimodal retrieval-augmented architecture that preserves spatial context while reducing cost and latency.


🧠 Philosophy

We do not replace attorneys. We empower them.

SamvidAI automates extraction, retrieval, and risk flagging so legal professionals can focus on judgment, validation, and strategy.

Human-in-the-loop review is a first-class design principle.


📘 Architecture & Design Documentation

SamvidAI follows a documentation-first, system-design-driven approach.
All major architectural and design decisions are formally documented and versioned.

🔹 High Level Design (HLD)

The High Level Design document covers:

  • End-to-end system architecture
  • OpticalRAG design rationale
  • Component responsibilities
  • Model & LLM strategy (Gemini 2.5 Pro usage)
  • Data flow, security, ethics, and deployment
  • Scalability, risks, and future extensions

📄 Read on GitHub
👉 docs/HLD.md

⬇️ Download full design document (DOCX, 90+ pages)
👉 docs/HLD.docx

The DOCX version is the authoritative long-form design, suitable for deep review and offline reading.


❌ The Problem

Legal contracts are:

  • Long and dense
  • Highly structured (clauses, tables, headers)
  • Extremely risk-sensitive

Existing approaches fail because:

  • OCR destroys layout semantics
  • Full-document LLM ingestion is expensive
  • Long-context hallucinations are common
  • Clause hierarchy and visual grouping are ignored

✅ The Solution — OpticalRAG

OpticalRAG is a vision-first RAG pipeline that:

  • Treats documents as visual data
  • Retrieves only relevant regions
  • Converts to text only when necessary

Key Benefits

  • 🔻 Significant token reduction
  • ⚡ Faster inference
  • 🧭 Layout-aware reasoning
  • 📄 Scales to very long contracts
  • 🧠 Reduced hallucinations

🧠 OpticalRAG Architecture

Traditional RAG pipelines fail on massive legal documents due to lossy OCR and limited context windows.

OpticalRAG solves this by design.

PDF Contract
↓
High-Resolution Page Images (300 DPI)
↓
Layout-Aware Segmentation (LayoutLMv3)
↓
Semantic Regions (Clauses, Tables, Headers)
↓
Multimodal Embeddings (Text + Vision)
↓
Vector Retrieval (Query-Aware)
↓
LLM Reasoning on Relevant Regions Only
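
The flow above can be sketched as a small, swappable pipeline. This is a minimal illustration under stated assumptions, not the project's implementation: `Region`, `retrieve_regions`, `toy_segment`, and `keyword_score` are hypothetical names, the segmenter returns hard-coded regions instead of running LayoutLMv3, and a keyword-overlap score stands in for multimodal embeddings and vector search.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    page: int
    kind: str  # "clause", "table", or "header"
    text: str


def retrieve_regions(
    num_pages: int,
    segment: Callable[[int], List[Region]],
    score: Callable[[str, Region], float],
    query: str,
    top_k: int = 3,
) -> List[Region]:
    """Segment every page, rank all regions against the query, keep the best."""
    # Segmentation stage: each page yields its semantic regions.
    regions = [r for page in range(1, num_pages + 1) for r in segment(page)]
    # Retrieval stage: only the top_k regions would be passed on to the LLM.
    ranked = sorted(regions, key=lambda r: score(query, r), reverse=True)
    return ranked[:top_k]


# Toy stand-ins (hypothetical data, for illustration only).
TOY_CLAUSES = {
    1: "Either party may terminate this agreement with 30 days notice",
    2: "Payment is due within 45 days of invoice",
}


def toy_segment(page: int) -> List[Region]:
    return [
        Region(page, "header", f"Section {page}"),
        Region(page, "clause", TOY_CLAUSES[page]),
    ]


def keyword_score(query: str, region: Region) -> float:
    # Counts shared lowercase words; a real system would compare embeddings.
    return len(set(query.lower().split()) & set(region.text.lower().split()))


top = retrieve_regions(2, toy_segment, keyword_score,
                       "terminate agreement notice", top_k=1)
print(top[0].page, top[0].kind, top[0].text)
```

Passing each stage in as a callable keeps the sketch testable without a GPU; real models could be substituted one stage at a time.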

Why OpticalRAG Works

  • Preserves spatial and structural context
  • Mitigates lost-in-the-middle failures by sending the LLM only the retrieved regions
  • Optimized for consumer GPUs
  • LLM is used for reasoning, not retrieval

🧩 Core Capabilities

🔍 Layout-Aware Retrieval

  • Vision-first document understanding
  • Hierarchical retrieval (page → section → clause)
  • Query-aware region selection
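
Hierarchical retrieval (page → section → clause) can be illustrated with a greedy descent over a small tree. This is a sketch under assumptions: `Node`, `cosine`, and `descend` are hypothetical names, the two-dimensional vectors are made up, and a real system would likely keep several candidates per level rather than a single best match.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    vector: List[float]
    children: List["Node"] = field(default_factory=list)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def descend(query_vec: List[float], nodes: List[Node]) -> List[str]:
    """Pick the most similar node at each level and follow it downward."""
    path = []
    while nodes:
        best = max(nodes, key=lambda n: cosine(query_vec, n.vector))
        path.append(best.label)
        nodes = best.children
    return path


# Toy index with made-up embeddings (illustration only).
pages = [
    Node("page 12", [0.9, 0.1], [
        Node("Termination", [1.0, 0.0], [
            Node("12.1 Termination for convenience", [0.95, 0.05]),
            Node("12.2 Termination for cause", [0.7, 0.3]),
        ]),
        Node("Notices", [0.5, 0.5]),
    ]),
    Node("page 40", [0.1, 0.9], [Node("Payment", [0.0, 1.0])]),
]

path = descend([1.0, 0.0], pages)
print(" → ".join(path))
```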

⚠️ Risk Flagging (Prototype)

Identifies potentially risky clauses such as:

  • One-sided obligations
  • Unusual termination rights
  • Missing liability protections

Risk levels:

  • 🔴 High Risk
  • 🟠 Review Needed
  • 🟢 Standard
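
The three risk levels map naturally onto an enum, and a rule-based first pass can be sketched in a few lines. The patterns below are hypothetical examples chosen for illustration; they are not the project's rule set and are no substitute for legal review.

```python
import re
from enum import Enum


class RiskLevel(Enum):
    HIGH = "High Risk"
    REVIEW = "Review Needed"
    STANDARD = "Standard"


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    (re.compile(r"\bsole discretion\b", re.I), RiskLevel.HIGH),
    (re.compile(r"\bterminate\b.*\bwithout (notice|cause)\b", re.I), RiskLevel.HIGH),
    (re.compile(r"\bunlimited liability\b", re.I), RiskLevel.HIGH),
    (re.compile(r"\b(indemnif\w+|liabilit\w+)\b", re.I), RiskLevel.REVIEW),
]


def flag_clause(text: str) -> RiskLevel:
    """Return the level of the first matching rule, else STANDARD."""
    for pattern, level in RULES:
        if pattern.search(text):
            return level
    return RiskLevel.STANDARD


for clause in (
    "Supplier may terminate this agreement without notice.",
    "Each party's liability is capped at fees paid.",
    "This agreement is governed by the laws of Delaware.",
):
    print(flag_clause(clause).value, "|", clause)
```

In a hybrid setup, a pass like this could pre-screen clauses before an LLM is asked to explain or refine the flag.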

📝 Smart Summarization

Role-specific summaries:

  • Executive overview
  • Key obligations
  • Financial exposure
  • Termination & liability highlights

👨‍⚖️ Human-in-the-Loop Review

  • Attorneys can accept or reject AI findings
  • Feedback enables iterative improvement
  • Designed for assistive decision-making
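
One way to record attorney decisions is a small record per AI finding plus an acceptance-rate summary. This is a sketch only; `Finding`, `review`, and `acceptance_rate` are hypothetical names, and the README does not specify how feedback is stored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    clause_id: str
    ai_label: str
    accepted: Optional[bool] = None  # None means not reviewed yet
    reviewer_note: str = ""


def review(finding: Finding, accepted: bool, note: str = "") -> Finding:
    """Record the attorney's decision on a single AI finding."""
    finding.accepted = accepted
    finding.reviewer_note = note
    return finding


def acceptance_rate(findings: List[Finding]) -> Optional[float]:
    """Share of reviewed findings that were accepted; None if none reviewed."""
    reviewed = [f for f in findings if f.accepted is not None]
    if not reviewed:
        return None
    return sum(f.accepted for f in reviewed) / len(reviewed)


findings = [
    Finding("12.1", "High Risk"),
    Finding("14.3", "Review Needed"),
    Finding("20.2", "Standard"),
]
review(findings[0], accepted=True)
review(findings[1], accepted=False, note="Standard carve-out for this client")

print(acceptance_rate(findings))
```

Rejected findings with notes are the interesting ones: they are the examples a later tuning or rule-revision step would learn from.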

🛠️ Tech Stack

Core

  • Python 3.10+
  • FastAPI
  • Streamlit

Vision & Layout

  • LayoutLMv3
  • OpenCV
  • PaddleOCR

Embeddings & Retrieval

  • OpenCLIP (ViT-H/14)
  • BGE / E5
  • ChromaDB

LLMs

  • Gemini 2.5 Pro (primary, cloud reasoning)
  • Qwen2.5-7B / Mistral-7B (local fallback, quantized)

Gemini is used strictly for reasoning over retrieved regions, not full-document ingestion.
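
The primary/fallback split can be sketched as a thin wrapper that builds a prompt from retrieved regions only and switches to the local model when the cloud call fails. Everything here is an assumption for illustration: the function names are hypothetical, both models are stubbed as plain callables, and no real Gemini or Qwen client API is shown.

```python
from typing import Callable, Sequence, Tuple

LLM = Callable[[str], str]  # prompt in, answer out


def build_prompt(question: str, regions: Sequence[str]) -> str:
    # Only the retrieved regions are included, never the full document.
    context = "\n---\n".join(regions)
    return f"Context:\n{context}\n\nQuestion: {question}"


def answer_with_fallback(
    question: str, regions: Sequence[str], primary: LLM, fallback: LLM
) -> Tuple[str, str]:
    """Return (backend_used, answer), trying the primary model first."""
    prompt = build_prompt(question, regions)
    try:
        return "primary", primary(prompt)
    except Exception:
        # Broad catch for brevity; real code would catch specific client errors.
        return "fallback", fallback(prompt)


# Stubs standing in for a cloud model and a local quantized model.
def unreachable_cloud_model(prompt: str) -> str:
    raise ConnectionError("cloud endpoint unavailable")


def stub_local_model(prompt: str) -> str:
    return f"(local) saw {len(prompt)} characters"


source, answer = answer_with_fallback(
    "What are termination risks?",
    ["Either party may terminate with 30 days notice."],
    unreachable_cloud_model,
    stub_local_model,
)
print(source, answer)
```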


💻 Local Setup (Prototype)

1️⃣ Clone Repository

git clone https://github.com/VisionExpo/SamvidAI.git
cd SamvidAI

2️⃣ Create Virtual Environment

python -m venv venv
source venv/bin/activate # Linux / Mac
venv\Scripts\activate # Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

🎮 GPU Usage & Hardware

Tested Configuration

| Component | Specification |
| --- | --- |
| GPU | RTX 4060 (8 GB VRAM) |
| RAM | 24 GB |
| CPU | 12-core |
| OS | Windows / Linux |

Optimized for consumer GPUs (RTX 4060, 8 GB VRAM) using quantization.

Memory Optimization

  • 4-bit quantized LLMs
  • Batched embeddings
  • Lazy region loading

Peak VRAM Usage

  • LayoutLMv3: ~2.1 GB
  • OpenCLIP: ~1.8 GB
  • LLM (7B, 4-bit): ~3.5 GB
  • ✅ Combined: ~7.4 GB if all three models are resident at once, which fits within 8 GB of VRAM with roughly 0.6 GB of headroom
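
The VRAM budget can be checked with simple arithmetic. The figures are the ones listed above; the assumption that all three models are loaded simultaneously is a worst case made for this sketch, and `vram_headroom` is a hypothetical helper name.

```python
# Peak VRAM figures from the list above, in GB.
PEAK_VRAM_GB = {
    "LayoutLMv3": 2.1,
    "OpenCLIP": 1.8,
    "LLM (7B, 4-bit)": 3.5,
}
GPU_VRAM_GB = 8.0  # RTX 4060


def vram_headroom(usage: dict, capacity_gb: float) -> float:
    """Capacity minus the sum of all model peaks (worst case: all resident)."""
    return capacity_gb - sum(usage.values())


total = sum(PEAK_VRAM_GB.values())
headroom = vram_headroom(PEAK_VRAM_GB, GPU_VRAM_GB)
print(f"total {total:.1f} GB, headroom {headroom:.1f} GB")
```

The margin is narrow, so activations, CUDA context overhead, and batch size still need watching; loading models sequentially rather than concurrently would lower the peak.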

▶️ Run Demo Locally

Start Backend

uvicorn api.main:app --reload

Launch UI

streamlit run ui/streamlit_app.py

Demo Flow

1. Upload a contract PDF

2. Ask a question (e.g. "What are termination risks?")

3. View:

  • Highlighted contract regions
  • Risk flags
  • Explanations

4. Accept or reject AI findings


📊 Benchmarks (Internal Evaluation)

Contract Size: 120 Pages

Based on controlled experiments and synthetic contracts. Not production guarantees.

| Metric | OCR + Text RAG | SamvidAI (OpticalRAG) |
| --- | --- | --- |
| Tokens to LLM | High | Significantly Lower |
| Latency | High | Reduced |
| Layout Accuracy | Poor | High |
| Hallucination Risk | High | Lower |

💰 Cost Comparison (Per Document)

| Approach | Estimated Cost |
| --- | --- |
| Full-Text GPT-4 | ~$4.20 |
| OCR + RAG | ~$1.90 |
| SamvidAI | ~$0.65 |

➡ ~65% cost reduction versus OCR + RAG (~$1.90 → ~$0.65), and ~85% versus full-text GPT-4 (~$4.20 → ~$0.65)
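
The reduction figures follow directly from the estimated per-document costs. A short check, using only the numbers in the cost comparison (`reduction_pct` is a hypothetical helper name):

```python
# Estimated per-document costs in USD, taken from the comparison above.
COST_USD = {
    "Full-Text GPT-4": 4.20,
    "OCR + RAG": 1.90,
    "SamvidAI": 0.65,
}


def reduction_pct(baseline: float, new: float) -> float:
    """Percentage saved when moving from `baseline` to `new`."""
    return (baseline - new) / baseline * 100


vs_ocr_rag = reduction_pct(COST_USD["OCR + RAG"], COST_USD["SamvidAI"])
vs_full_text = reduction_pct(COST_USD["Full-Text GPT-4"], COST_USD["SamvidAI"])
print(f"vs OCR + RAG: {vs_ocr_rag:.1f}%")
print(f"vs Full-Text GPT-4: {vs_full_text:.1f}%")
```

These are estimates from controlled experiments, so the percentages inherit the same caveat as the underlying costs.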


📂 Project Structure

SamvidAI/
│
├── README.md                   # Product-facing overview (FIRST IMPRESSION)
├── WEBSITE.md                  # Landing page copy
├── DEMO.md                     # Demo links + walkthrough
│
├── docs/                       # SYSTEM & ENGINEERING (AUTHORITATIVE DESIGN)
│   ├── HLD.md                  # High-Level Design
│   ├── HLD.docx                # High-Level Design (Long-form, downloadable)
│   ├── LLD.md                  # Low-Level Design
│   ├── ARCHITECTURE.md         # Component & deployment architecture
│   ├── PIPELINE.md             # End-to-end data & inference pipeline
│   ├── DATA_REPORTS.md         # Metrics, charts, evaluations
│   ├── EXPERIMENTS.md          # Ablations, experiments
│   ├── BENCHMARKS.md           # Performance comparisons
│   ├── SECURITY.md             # Security considerations
│   └── ETHICS.md               # Ethics & safety
│
├── research/                   # SCIENTIFIC THINKING
│   ├── related_work.md         # Prior research & models
│   ├── papers.md               # Paper summaries & links
│   └── findings.md             # Your insights & failures
│
├── product/                    # FOUNDER MODE
│   ├── roadmap.md              # 30-90-365 day plan
│   ├── monetization.md         # Business model
│   ├── user_personas.md        # Target users
│   └── go_to_market.md         # Distribution strategy
│
├── src/                        # CODE
│   └── samvidai/
│       ├── __init__.py
│       │
│       ├── ingestion/          # PDF → image → layout
│       │   ├── __init__.py
│       │   ├── pdf_to_image.py
│       │   └── preprocess.py
│       │
│       ├── layout/             # Layout-aware segmentation
│       │   ├── __init__.py
│       │   └── layoutlm.py
│       │
│       ├── retrieval/          # OpticalRAG core
│       │   ├── __init__.py
│       │   ├── embeddings.py
│       │   ├── vector_store.py
│       │   └── retriever.py
│       │
│       ├── risk_engine/        # Clause classification & risk scoring
│       │   ├── __init__.py
│       │   ├── classifier.py
│       │   └── scorer.py
│       │
│       ├── llm/                # LLM interfaces
│       │   ├── __init__.py
│       │   ├── prompts.py
│       │   └── inference.py
│       │
│       └── utils/
│           ├── __init__.py
│           └── logger.py
│
├── api/                        # BACKEND
│   └── main.py                 # FastAPI app
│
├── ui/                         # FRONTEND
│   └── streamlit_app.py
│
├── assets/                     # VISUALS
│   ├── images/
│   ├── videos/
│   └── diagrams/
│
├── tests/
│   └── TESTING_PLAN.md
├── docker/
│   └── Dockerfile
│
├── requirements.txt
└── .gitignore

🧪 Research Techniques Used

SamvidAI incorporates modern retrieval and LLM research, including:

  • Hierarchical RAG
  • Query-aware retrieval
  • Late chunking
  • Lost-in-the-middle mitigation
  • Contrastive multimodal embeddings
  • Hybrid rule-based + LLM reasoning
  • Human-in-the-loop active learning
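
As an illustration of lost-in-the-middle mitigation, one commonly used approach reorders retrieved chunks so the strongest matches sit at the start and end of the prompt, where long-context models tend to attend best. This is a sketch of that general technique, not necessarily the method SamvidAI uses; `reorder_for_long_context` is a hypothetical name.

```python
from typing import List, TypeVar

T = TypeVar("T")


def reorder_for_long_context(ranked: List[T]) -> List[T]:
    """Take items sorted best-first and push the weakest toward the middle.

    Even-indexed items fill the front in order; odd-indexed items fill the
    back in reverse, so the top two results end up at the two edges.
    """
    front: List[T] = []
    back: List[T] = []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


ranked_chunks = ["r1", "r2", "r3", "r4", "r5"]  # r1 is the most relevant
print(reorder_for_long_context(ranked_chunks))
```

With five chunks, the best result lands first, the second best lands last, and the least relevant chunk sits in the middle.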

🗺️ Roadmap

Phase 1 — Ingestion

  • PDF → image conversion
  • Layout segmentation

Phase 2 — OpticalRAG

  • Multimodal retrieval
  • Query-aware chunking

Phase 3 — Risk Engine

  • Clause classification
  • Red / Amber / Green scoring

Phase 4 — Review UI

  • Attorney validation
  • Feedback storage

Phase 5 — Optimization

  • Latency tuning
  • Dataset-driven improvements

🎯 Vision

SamvidAI is built with a startup-first mindset:

  • Solves a real legal pain point
  • Optimized for limited hardware
  • Open-source friendly
  • Enterprise-ready foundation

The long-term goal is to evolve SamvidAI into a full legal intelligence platform for contract review, compliance, and dispute risk forecasting.


🤝 Contributing

Contributions, ideas, and discussions are welcome.

If you're interested in:

  • Legal AI
  • Multimodal RAG
  • Human-in-the-loop systems

You’ll feel right at home here.


📜 License

MIT License

If you like this project, ⭐ star the repo and join the journey.

About

SamvidAI: Enterprise Contract Intelligence powered by OpticalRAG. A multimodal document understanding system for clause extraction, legal risk scoring, and explainable contract analysis using layout-aware RAG pipelines.
