Architecture decisions, quantifiable results, and lessons learned from building AI data pipelines and developer tools.
These case studies document the engineering behind three production systems, each built from scratch and running entirely on-premise with no cloud LLM dependencies.
| # | Case Study | Domain | Key Result |
|---|---|---|---|
| 1 | Multi-Source Fact-Checking Pipeline | NLP / Information Verification | 83.6% production accuracy across 127K claims, 11 prompt iterations in 2 days |
| 2 | Zero-Framework Autonomous AI Agent | Agent Systems / Infrastructure | 3-device distributed architecture, 1,235 tests, 101s → 3s latency |
| 3 | Declarative Data Mart Automation | Data Engineering / BI | 47.4M rows, 90% Text2SQL accuracy with zero manual config |
Across all three projects, several engineering principles emerged:
- Local-first inference — All systems run on-premise (DGX Spark, consumer GPUs). No cloud API dependency, no per-token costs at scale.
- Measure, don't assume — The fact-checking pipeline's "98% accuracy" collapsed to 56.3% under ground truth. Every metric needs independent verification.
- Structural solutions over prompt tuning — The biggest accuracy gains came from architectural changes (claim 3-classification, router elimination, bipolarization engine), not prompt engineering.
- Honest failure documentation — Each study includes what went wrong, not just what worked.
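
The "measure, don't assume" principle can be illustrated with a small sketch. It compares the accuracy a pipeline appears to have under a non-independent check with its accuracy against independently labeled ground truth. The source does not say how the 98% figure was produced, so the function names (`accuracy`, `verification_gap`) and the toy verdicts are hypothetical and are not taken from the actual pipeline.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("cannot compute accuracy on an empty sample")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


def verification_gap(
    predicted: Sequence[str],
    self_assessed: Sequence[str],
    ground_truth: Sequence[str],
) -> float:
    """Self-assessed accuracy minus ground-truth accuracy.

    A large positive value suggests the internal metric is overstating
    real performance.
    """
    return accuracy(predicted, self_assessed) - accuracy(predicted, ground_truth)


# Toy data (hypothetical): four claims with pipeline verdicts.
predicted = ["true", "false", "true", "true"]
# Assumption: the internal check simply agrees with the pipeline's own output.
self_assessed = ["true", "false", "true", "true"]
# Independent labels, e.g. from human annotators.
ground_truth = ["true", "true", "false", "true"]

internal = accuracy(predicted, self_assessed)
independent = accuracy(predicted, ground_truth)
gap = verification_gap(predicted, self_assessed, ground_truth)
print(f"internal={internal:.2f} independent={independent:.2f} gap={gap:.2f}")
```

Reporting both numbers side by side, rather than only the internal one, makes a discrepancy visible early instead of after deployment.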
- QuartzUnit — 10 open-source Python packages extracted from these projects
- ArkNill — Author profile
Content in this repository is shared for portfolio and educational purposes.