A source attribution system that traces factual claims back to their original sources.
Given a piece of text containing a factual claim, Chase the Source will:
- Extract the factual claim
- Search for original sources
- Tell you whether each source directly states, paraphrases, or contradicts the claim
Articles and social posts often contain factual claims that are paraphrased, distorted, or fabricated. Readers can't easily trace claims back to primary sources to verify accuracy.
Chase the Source solves this by automatically finding and comparing original sources against claims.
flowchart LR
A[User pastes text<br>with a factual claim] --> B[System finds<br>original sources] --> C[System shows how<br>sources relate to claim]
| Result | Meaning |
|---|---|
| 🟢 Direct | Source states the claim verbatim or near-verbatim |
| 🟡 Paraphrase | Source conveys the same meaning in different words |
| 🔴 Contradiction | Source states the opposite or conflicts with the claim |
| ⚪ Not Found | No sources found that address this specific claim |
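For code that consumes these results, the four outcomes map naturally onto an enumeration. The sketch below is illustrative only: the `Attribution` enum and its `label` property are hypothetical names, not taken from the project's `schemas/models.py`.

```python
from enum import Enum


class Attribution(Enum):
    """Hypothetical enum mirroring the four result types in the table above."""

    DIRECT = "direct"
    PARAPHRASE = "paraphrase"
    CONTRADICTION = "contradiction"
    NOT_FOUND = "not_found"

    @property
    def label(self) -> str:
        # Human-readable form, e.g. "NOT FOUND" for NOT_FOUND.
        return self.name.replace("_", " ")

    @property
    def supports_claim(self) -> bool:
        # Only direct statements and paraphrases back the claim up.
        return self in (Attribution.DIRECT, Attribution.PARAPHRASE)


if __name__ == "__main__":
    for outcome in Attribution:
        print(outcome.label, outcome.supports_claim)
```

Keeping the outcome as an enum rather than a free-form string makes it harder for a typo in one stage to produce a result the UI does not know how to display.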
Input:
Elon's company absolutely crushed it last year, delivering a whopping 1.8 million vehicles. The stock is going to the moon!
Output:
EXTRACTED CLAIM: Tesla delivered 1.8 million vehicles in 2023.
ATTRIBUTION: 🟢 DIRECT
SUMMARY: Tesla's official Q4 2023 investor report directly states
delivery of approximately 1.81 million vehicles in 2023.
BEST SOURCE: Tesla Q4 2023 Update (primary)
URL: https://ir.tesla.com/press-release/tesla-q4-2023-update
QUOTE: "In 2023, we produced 1.85 million vehicles and delivered
over 1.8 million vehicles."
- Python 3.11+
- OpenAI API key
- Tavily API key (free tier available)
# Clone the repository
git clone <repository-url>
cd chase_source

# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate  # Linux/macOS
# OR: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env and add your API keys
python app.py
Open http://localhost:7860 in your browser.
# With docker-compose (recommended)
docker compose up

# Or build manually
docker build -t chase-the-source .
docker run -p 7860:7860 --env-file .env chase-the-source
flowchart LR
subgraph LangGraph Workflow
direction LR
CE[Claim Extractor] --> SR[(Source Retriever)]
SR --> EF[Evidence Filter]
EF --> SC[Source Comparer]
SC --> AA[[Attribution Assembler]]
CE -->|no claim found| AA
end
| Stage | Purpose | Technology |
|---|---|---|
| Claim Extractor | Extract factual sub-claim from opinionated text | GPT-5-mini |
| Source Retriever | Find relevant web sources | Tavily Search API |
| Evidence Filter | Filter for relevance, extract quotes | GPT-5-mini |
| Source Comparer | Classify: direct/paraphrase/contradiction | GPT-5-mini |
| Attribution Assembler | Produce final result with best source | GPT-5-mini |
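The control flow in the diagram and table above, including the "no claim found" shortcut to the assembler, can be sketched in plain Python. This is a simplified stand-in, not the project's LangGraph code: every function name, the `State` fields, and the stub logic (for example, treating any text with a digit as containing a claim) are assumptions made for illustration, and no LLM or search call is made.

```python
from typing import Optional, TypedDict


class State(TypedDict, total=False):
    """Hypothetical shared state passed from stage to stage."""

    text: str
    claim: Optional[str]
    sources: list[str]
    evidence: list[str]
    relation: str
    result: str


def extract_claim(state: State) -> State:
    # Placeholder heuristic (assumption): text containing a digit carries a claim.
    text = state["text"]
    claim = text if any(ch.isdigit() for ch in text) else None
    return {**state, "claim": claim}


def retrieve_sources(state: State) -> State:
    # Stub: a real retriever would query a search API here.
    return {**state, "sources": ["stub source mentioning the claim"]}


def filter_evidence(state: State) -> State:
    # Stub: keep every retrieved source as evidence.
    return {**state, "evidence": list(state["sources"])}


def compare_sources(state: State) -> State:
    # Stub: any surviving evidence counts as a direct match.
    relation = "direct" if state["evidence"] else "not_found"
    return {**state, "relation": relation}


def assemble(state: State) -> State:
    # With no claim, no relation was ever computed, so report that instead.
    return {**state, "result": state.get("relation", "no_claim")}


def run_pipeline(text: str) -> State:
    state = extract_claim({"text": text})
    if state["claim"] is None:
        # "no claim found" edge: skip retrieval and comparison entirely.
        return assemble(state)
    for stage in (retrieve_sources, filter_evidence, compare_sources):
        state = stage(state)
    return assemble(state)


if __name__ == "__main__":
    print(run_pipeline("The stock is going to the moon!")["result"])
    print(run_pipeline("They delivered 1.8 million vehicles.")["result"])
```

The early return mirrors the `CE -->|no claim found| AA` edge: when extraction yields nothing, the search and comparison stages never run, which avoids spending search and model calls on purely opinionated text.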
The system classifies sources by reliability:
| Type | Description | Examples |
|---|---|---|
| Primary | Original data or direct statements | SEC filings, press releases, official statistics, court documents |
| Original Reporting | First-party journalism | Interviews, investigations, on-scene reporting |
| Secondary | Aggregation or commentary | Wire rewrites, opinion pieces, blog posts |
Primary sources are prioritized when determining the "best source" for a claim.
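One simple way to express this prioritization is a rank lookup followed by `min`. The sketch below is an assumption about how such a selection could work, not the project's actual logic: the `Source` dataclass, the `SOURCE_RANK` values, the type strings, and `pick_best_source` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Lower rank is preferred. The order follows the table above;
# the numeric values and key spellings are illustrative assumptions.
SOURCE_RANK = {"primary": 0, "original_reporting": 1, "secondary": 2}


@dataclass
class Source:
    title: str
    source_type: str


def pick_best_source(sources: list[Source]) -> Optional[Source]:
    """Return the highest-priority source, or None if the list is empty."""
    if not sources:
        return None
    # Unknown types sort after all known ones; ties keep the earliest entry.
    return min(
        sources,
        key=lambda s: SOURCE_RANK.get(s.source_type, len(SOURCE_RANK)),
    )


if __name__ == "__main__":
    candidates = [
        Source("Blog recap", "secondary"),
        Source("Tesla Q4 2023 Update", "primary"),
    ]
    best = pick_best_source(candidates)
    print(best.title if best else "no source")
```

Because `min` returns the first minimal element, two sources of the same type are resolved in favor of whichever appeared earlier, so upstream ordering (for example, by search relevance) still matters.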
Environment variables (.env):
# Required
OPENAI_API_KEY=sk-your-key-here
TAVILY_API_KEY=tvly-your-key-here

# Optional
OPENAI_MODEL=gpt-5-mini          # LLM model to use
TAVILY_MAX_RESULTS=10            # Max search results
TAVILY_SEARCH_DEPTH=advanced     # Search depth
LOG_LEVEL=INFO                   # Logging verbosity
GRADIO_SERVER_PORT=7860          # UI port
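A minimal sketch of reading these variables from the process environment, assuming the `.env` file has already been loaded into it. The `Settings` dataclass and `load_settings` function are hypothetical and may not match the project's `config.py`; treating the optional values listed above as defaults is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    tavily_api_key: str
    openai_model: str
    tavily_max_results: int
    tavily_search_depth: str
    log_level: str
    gradio_server_port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        tavily_api_key=env["TAVILY_API_KEY"],
        openai_model=env.get("OPENAI_MODEL", "gpt-5-mini"),
        tavily_max_results=int(env.get("TAVILY_MAX_RESULTS", "10")),
        tavily_search_depth=env.get("TAVILY_SEARCH_DEPTH", "advanced"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        gradio_server_port=int(env.get("GRADIO_SERVER_PORT", "7860")),
    )


if __name__ == "__main__":
    demo = {"OPENAI_API_KEY": "sk-your-key-here", "TAVILY_API_KEY": "tvly-your-key-here"}
    print(load_settings(demo))
```

Accepting the environment as a parameter keeps the loader easy to test, since a plain dictionary can be passed in without touching real environment variables.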
chase_source/
├── app.py                        # Gradio UI
├── graph.py                      # LangGraph workflow
├── config.py                     # Settings
├── nodes/
│   ├── claim_extractor.py        # Extract claims from text
│   ├── source_retriever.py       # Search for sources
│   ├── evidence_filter.py        # Filter relevant evidence
│   ├── source_comparer.py        # Classify source-claim relationship
│   └── attribution_assembler.py  # Produce final result
├── schemas/
│   └── models.py                 # Pydantic data models
├── prompts/
│   └── templates.py              # LLM prompts
├── tests/                        # Test suite
├── docs/                         # Detailed documentation
│   ├── SETUP_GUIDE.md
│   ├── DATA_SCHEMAS.md
│   ├── PROMPTS.md
│   ├── TECHNICAL_SPEC.md
│   └── TESTING_STRATEGY.md
├── requirements.txt
├── Dockerfile
└── docker-compose.yml
# Run all unit tests
pytest -m unit

# Run with coverage
pytest --cov=nodes --cov=schemas --cov-report=html

# Run specific test
pytest tests/test_claim_extractor.py -v
- Single claim: Extracts and traces one claim at a time
- No persistence: Ephemeral, single-session (no saved history)
- Search-dependent: Quality depends on Tavily search results
- Temporal gaps: May find outdated sources for recent claims
- Language: English only
| Document | Contents |
|---|---|
| SETUP_GUIDE.md | Full installation instructions, Docker setup |
| DATA_SCHEMAS.md | Pydantic models and type definitions |
| PROMPTS.md | LLM prompts with few-shot examples |
| TECHNICAL_SPEC.md | Implementation details for each node |
| TESTING_STRATEGY.md | Test fixtures and mocking patterns |
- LangGraph - Workflow orchestration
- OpenAI API - LLM reasoning (GPT-5-mini)
- Tavily - Web search
- Gradio - Web UI
- Pydantic - Data validation
MIT