AI-native development quality gates for requirements, designs, tests, and code reviews.
A PRD says: requests at or above 500 USD require manager and finance approval.
Your tests pass:
120 USD → manager approval only ✓
600 USD → finance approval required ✓
Line coverage is green. But the test at exactly 500 USD is missing — and the implementation uses > 500 instead of >= 500, so the boundary case silently escapes to the wrong path.
Qualix calls this a semantic coverage gap. Q06 reports it even when coverage tools do not:
[HIGH] Missing boundary test for exactly 500 USD.
Implementation uses > 500; threshold rule says ≥ 500.
Add: approve(Request(amount=Decimal("500.00"))) → finance_required
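The gap above can be reproduced in a few lines. This is a minimal sketch, not the demo's actual implementation: the `Request` dataclass, the `approve_buggy` variant, and the `"manager_only"` return value are hypothetical names chosen for illustration. Only `approve(Request(amount=Decimal("500.00")))` and `finance_required` come from the finding itself.

```python
from dataclasses import dataclass
from decimal import Decimal

# PRD rule: requests at or above 500 USD require manager and finance approval.
FINANCE_THRESHOLD = Decimal("500.00")


@dataclass(frozen=True)
class Request:
    amount: Decimal


def approve(request: Request) -> str:
    """Route a request; ">=" matches the PRD wording "at or above"."""
    if request.amount >= FINANCE_THRESHOLD:
        return "finance_required"
    return "manager_only"  # hypothetical label for the manager-only path


def approve_buggy(request: Request) -> str:
    """Faulty variant: ">" sends exactly 500.00 down the manager-only path."""
    if request.amount > FINANCE_THRESHOLD:
        return "finance_required"
    return "manager_only"


def test_boundary_exactly_500_requires_finance() -> None:
    # The missing test: the only input where ">" and ">=" disagree.
    assert approve(Request(amount=Decimal("500.00"))) == "finance_required"
```

Both variants agree at 120 USD and 600 USD, which is why the two existing tests pass against either one. Only the 500.00 case tells them apart.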
Qualix turns product requirements into traceable engineering checks. Instead of stopping at line coverage, it follows requirement IDs through design coverage, test intent, generated unit tests, audit reports, and review findings.
Three production Java services were run through the full Q01→Q06 pipeline:
| Project | PRD size | Q01: SE extracted | Q06: EUT audited | Weak / missing |
|---|---|---|---|---|
| Approval workflow | 67 requirements | 22 semantic expectations | 22 | 1 partial |
| Renewal service | 31 requirements | 11 semantic expectations | 90 | 0 |
| Platform service | 50 requirements | 18 semantic expectations | 103 | 16 partial, 2 missing |
In the platform service, 18 of 103 audited test targets had assertion gaps that line coverage did not flag.
For full details — what each finding looked like and why line coverage missed it — see docs/real-world-results.md.
Give Qualix a PRD, then move through small quality checkpoints:
- Turn messy requirements into explicit requirement items and business rules.
- Check whether a technical design really covers those requirements.
- Design tests around business behavior, not just lines of code.
- Audit generated tests and code review findings against the original intent.
The terms map to normal development work:
| Term | Plain meaning |
|---|---|
| Phase | One workflow step, such as structuring requirements or auditing tests |
| Gate | A pass/fail check before moving on |
| Judge / Critique | A second-pass review of the agent's own output |
| SE | A key semantic expectation the product depends on |
| EUT | A test target: behavior that should be proven by unit tests |
| RSM | A requirement-to-code/test trace map used to catch missing coverage |
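To make the RSM idea concrete, here is a rough sketch of a trace map as a plain dictionary. The structure and the `untraced` helper are assumptions for illustration only; Qualix's real RSM format is not shown in this README. The IDs simply follow the SE-/EUT- naming used here.

```python
# Hypothetical trace map: semantic expectation ID -> test target (EUT) IDs.
# The mapping below is invented; only the ID naming style comes from this README.
trace_map: dict[str, list[str]] = {
    "SE-001": ["EUT-001"],
    "SE-002": [],           # no test target proves this expectation
    "SE-003": ["EUT-002"],
}


def untraced(trace: dict[str, list[str]]) -> list[str]:
    """Return the SE IDs that no test target claims to prove."""
    return sorted(se for se, euts in trace.items() if not euts)


print(untraced(trace_map))
```

An expectation with an empty list is a candidate for missing coverage, regardless of what line coverage reports.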
You can learn the vocabulary gradually. For a first run, start with Q01 and inspect the generated requirement report.
| Problem | What Usually Happens | Qualix Approach |
|---|---|---|
| Requirement drift | PRDs lose detail as they move into design and code | Q01 extracts structured REQ/BR/SE items with traceable IDs |
| Design gaps | Technical designs are reviewed loosely | Q03/Q04 review design quality and requirement coverage |
| Shallow tests | Coverage is green but business behavior is not tested | Q05a/Q05b design and generate requirement-driven unit tests |
| Weak assertions | Tests assert calls or existence, not semantics | Q06 audits test intent, weak assertions, and coverage evidence |
| Review inconsistency | Code review depends on reviewer memory | Q07 produces structured, evidence-backed review findings |
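The "weak assertions" row is easiest to see side by side. The sketch below uses a hypothetical `submit` handler and notifier; none of these names come from Qualix. It contrasts a test that only checks that a call happened with one that pins the business outcome.

```python
from decimal import Decimal
from unittest.mock import Mock


def submit(amount: Decimal, notifier) -> str:
    """Hypothetical handler: picks an approval route and notifies the approver."""
    route = "finance_required" if amount >= Decimal("500.00") else "manager_only"
    notifier.notify(route)
    return route


def test_weak() -> None:
    notifier = Mock()
    submit(Decimal("500.00"), notifier)
    # Weak: passes for any route, so a ">" vs ">=" bug would go unnoticed.
    notifier.notify.assert_called_once()


def test_semantic() -> None:
    notifier = Mock()
    # Semantic: asserts the outcome the requirement demands at the boundary.
    assert submit(Decimal("500.00"), notifier) == "finance_required"
    notifier.notify.assert_called_once_with("finance_required")
```

Both tests execute the same lines, so line coverage cannot distinguish them. Only the second would fail if the comparison were changed to `>`.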
Qualix is early and evolving. The repository is useful for experimentation, internal quality-gate workflows, and evaluating the phase model. APIs, file formats, and phase reports may still change before a stable 1.0.0 release.
pip install qualix
qualix-run --profile python-service hello init
qualix-run ingest examples/hello-prd.md --project hello
qualix-run hello startup --json
Then run an AI-backed phase (requires a model API key):
export ANTHROPIC_API_KEY="..."  # or OPENAI_API_KEY / GEMINI_API_KEY / DASHSCOPE_API_KEY
qualix-run --profile python-service hello execute Q01 --json
qualix-run hello finalize Q01 --json
qualix-run hello approve Q01 --json
check collapses workspace init + project state creation + PRD ingest + the Q01→Q05a→Q06 semantic-coverage plan into one command. It does not run model reasoning; it prepares the workspace and returns the ordered phases for your AI coding agent to run:
qualix-run my-service check --prd docs/prd.md --code src/ --profile python-service --json
The JSON output includes profile_id, normalized code_repos, PRD ingest paths, project state_path, and a phase_plan with exact execute → finalize → approve commands. When --code is supplied, Q05a/Q06 commands include --code-repo automatically.
Phase reasoning runs inside your AI coding agent — check is the on-ramp, not a replacement for the agent loop.
▶ Watch on YouTube — tests pass, coverage is green, but the 500 USD boundary is missing.
The expense approval demo is the fastest way to see what Qualix catches. It ships with a synthetic PRD, a Python implementation with deliberate gaps, and tests that pass ordinary coverage — but miss the 500 USD boundary and idempotency rules.
Expected outputs are in examples/expense-approval/expected/: what Q01 should extract, what EUT matrix Q05a should produce, and what Q06 should report.
qualix-run --profile python-service expense-demo init
qualix-run ingest examples/expense-approval/prd.md --project expense-demo
qualix-run expense-demo startup --json
To try without private project data, start with the simpler examples/hello-prd.md.
See what Qualix output looks like without running anything (no API key needed):
qualix-run demo                           # built-in static demo
qualix-run expense-demo run-demo --json   # materialize Q01→Q05a→Q06 proof loop
./scripts/run_expense_demo.sh             # full expense-approval walkthrough
./scripts/run_expense_demo.sh --live      # live pipeline with your API key
run-demo writes precomputed Q01/Q05a/Q06 artifacts into the project output directory, builds the EvidenceGraph, and returns the proof signal as JSON:
{
"model_required": false,
"ordinary_tests": {"passed": true, "line_coverage_rate": 0.95},
"semantic_coverage": {"total_se": 5, "covered_se": 1, "missing_eut": 2},
"next_command": "qualix-run expense-demo explain SE-003 --json"
}
That last command drills into the boundary finding through the EvidenceGraph:
qualix-run expense-demo explain SE-003 --json
It shows the chain from SE-003 to EUT-002, where Q06 marks the exact 500.00 USD boundary test as MISSING.
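The run-demo JSON shown earlier can also be read programmatically. The sketch below parses that exact payload and compares the two signals. The ratio `covered_se / total_se` is a derived figure computed here for illustration, not a field Qualix emits.

```python
import json

# The proof signal from run-demo, copied verbatim from the example output.
proof_signal = json.loads("""
{
  "model_required": false,
  "ordinary_tests": {"passed": true, "line_coverage_rate": 0.95},
  "semantic_coverage": {"total_se": 5, "covered_se": 1, "missing_eut": 2},
  "next_command": "qualix-run expense-demo explain SE-003 --json"
}
""")

semantic = proof_signal["semantic_coverage"]
# Derived ratio (an assumption of this sketch): share of SEs with proof.
semantic_rate = semantic["covered_se"] / semantic["total_se"]
line_rate = proof_signal["ordinary_tests"]["line_coverage_rate"]

print(f"line coverage:     {line_rate:.0%}")      # 95%
print(f"semantic coverage: {semantic_rate:.0%}")  # 20%
```

In this demo payload, tests pass with 95% line coverage while only 1 of 5 semantic expectations is covered.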
You can also run phases manually:
qualix-run my-project execute Q01 --json
qualix-run my-project finalize Q01 --json
qualix-run my-project approve Q01 --json
→ Qualix SE Extractor — paste a PRD, get a structured SE list in your browser. No installation required. Bring your own API key.
Qualix ships with instruction files for the three most common AI coding environments:
| File | Environment |
|---|---|
| CLAUDE.md | Claude Code (Anthropic) |
| AGENTS.md | Codex, opencode, and other OpenAI-compatible agents |
| GEMINI.md | Gemini CLI |
Inside any of these agents, load the project starter to begin:
$qualix-starter
The starter walks the agent through the project's phase model, current state, and the next recommended action — without requiring the agent to read all instruction files upfront.
This is a core design goal: Qualix is built to be used by an AI coding agent, not just reviewed by one.
GitHub Actions — use the composite action after your pipeline runs phases locally or in CI:
- uses: alexangelzhang/qualix@v0.2.0a1
  with:
    project-id: my-project
    phase: Q06
    fail-on: hard
pre-commit — gate on push after running the Q01–Q06 pipeline:
repos:
  - repo: https://github.com/alexangelzhang/qualix
    rev: v0.2.0a1
    hooks:
      - id: qualix-gate
        args: [my-project, ci-gate, --fail-on, hard]
Both integrations read existing verdict files — no LLM calls at gate time.
Q01 Requirements Structuring
├── Q02 Technical Design Generation (optional)
│ └── Q03 Technical Design Quality Review
│ └── Q04 Technical Design Coverage Audit
│ └── Q07 Code Review
└── Q05a EUT Matrix Design
└── Q05b Unit Test Code Generation
└── Q06 Unit Test Coverage Audit
| Phase | Goal | Main Output |
|---|---|---|
| Q01 | Structure requirements | REQ/BR/SE/GAP/OPEN report and JSON |
| Q02 | Generate technical design | Implementation-ready design draft |
| Q03 | Review design quality | Architecture/API/data/error/performance findings |
| Q04 | Audit design coverage | Requirement-to-design coverage matrix |
| Q05a | Design executable unit-test targets | EUT matrix |
| Q05b | Generate unit-test code | Test code and execution notes |
| Q06 | Audit unit-test quality | Coverage and assertion-quality report |
| Q07 | Review code changes | Evidence-backed code review report |
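The phase tree can be sketched as a parent map, which makes it easy to list what sits upstream of a given phase. This assumes that nesting in the tree means "runs after", which this README implies but does not state. The `PARENT` table and `upstream` helper are illustrative, not part of the Qualix API.

```python
from typing import Optional

# Parent of each phase, read off the nesting in the phase tree.
PARENT: dict[str, Optional[str]] = {
    "Q01": None,
    "Q02": "Q01",   # optional phase
    "Q03": "Q02",
    "Q04": "Q03",
    "Q07": "Q04",
    "Q05a": "Q01",
    "Q05b": "Q05a",
    "Q06": "Q05b",
}


def upstream(phase: str) -> list[str]:
    """Return the phases above `phase` in the tree, root first."""
    chain: list[str] = []
    parent = PARENT[phase]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return list(reversed(chain))


print(upstream("Q06"))  # ['Q01', 'Q05a', 'Q05b']
print(upstream("Q07"))  # ['Q01', 'Q02', 'Q03', 'Q04']
```

The design branch (Q02 to Q07) and the test branch (Q05a to Q06) share only Q01.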
Every phase follows the same lifecycle:
collect evidence -> execute skill -> write report + structured JSON -> self-check -> judge/critique -> finalize -> approve
New to the terms? Read Concepts for the short version of Phase, Gate, Harness, Judge, Critique, SE, EUT, and RSM.
For a deeper explanation of why phases exist, how SE extraction works, and what the Judge/Critique isolation achieves, see How It Works.
The root install.sh installs the Python package and copies runtime resources into a user-level Qualix directory. Development mode keeps those resources symlinked to this repository:
./install.sh --dev
For a lighter editable install:
python -m pip install -e '.[dev]'
Optional extras:
python -m pip install -e '.[tree-sitter]'
python -m pip install -e '.[feishu]'
python -m pip install -e '.[vlm]'
python -m pip install -e '.[deepeval]'
Tree-sitter adds file-local symbol extraction and parse diagnostics for Java, TypeScript, Go, and Python.
Q01 document ingest works with local Markdown/text/html files today:
qualix-run ingest docs/prd.md --project my-project
Enterprise document URLs are handled through the provider-based ingest layer. DingTalk and Feishu/Lark URLs are recognized as optional integrations; if a connector is not configured, Qualix explains the missing setup instead of starting OAuth automatically. For a first run, export the document from your browser or use a local Markdown/text/html file.
For model-provider configuration, see Model Setup.
For ecosystem maturity, see Language Support. Java is the deepest path today; TypeScript, Go, and Python have built-in providers for detection and basic quality gates, plus optional Tree-sitter code intelligence.
Global commands:
qualix init
qualix dashboard start
qualix version
Project commands:
qualix-run <project_id> init
qualix-run <project_id> check --prd <path> [--code <dir>] [--profile <p>] --json
qualix-run <project_id> startup --json
qualix-run <project_id> status --json
qualix-run <project_id> execute <phase_id> --json
qualix-run <project_id> finalize <phase_id> --json
qualix-run <project_id> approve <phase_id> --json
qualix-run <project_id> doctor
qualix/
├── src/qualix/ # Python package and CLI/runtime implementation
├── skills/ # Phase skills and workflow prompts
├── references/ # Report templates and risk catalogs
├── profiles/ # Language/domain profiles (versioned with @v1 suffix)
├── regression/ # Regression cases and failure-library examples
├── examples/ # Synthetic input examples (Python, TypeScript, Go)
├── benchmarks/ # Benchmark cases
│ ├── semantic-coverage/ # SC-001–SC-016 seed cases
│ ├── phase-failure-patterns/ # Per-phase failure pattern catalog
│ └── model-comparison/ # Multi-model Q06 audit comparison framework
├── vscode-extension/ # VS Code extension (preview): sidebar phase status
├── github-app/ # GitHub App (preview): zero-config PR gate comments
├── docs/ # User and architecture docs
│ └── sandbox/ # Online SE extractor (GitHub Pages)
├── tests/ # pytest suite
├── AGENTS.md # Codex/opencode instructions
├── CLAUDE.md # Claude Code instructions
├── GEMINI.md # Gemini CLI instructions
└── install.sh # Local installer
ruff check src/ tests/
pytest tests/ -q
For a narrower smoke test after install changes:
python -m pytest tests/test_version.py tests/test_install_sh.py -q
python scripts/check_installed_wheel_smoke.py
The installed-wheel smoke test builds the package, installs it into a temporary virtualenv outside the source tree, and verifies qualix-run <project_id> check --prd ... --json plus qualix-run expense-demo run-demo --json from the real wheel entry point.
The public repository should contain only synthetic or sanitized regression examples. Real enterprise failure libraries, customer requirements, and private review data should stay outside the public repo or be distributed under a separate commercial data license.
Public benchmark seeds live in benchmarks/semantic-coverage. They are small, synthetic cases for inspecting semantic coverage failures by hand.
Phase-specific failure patterns live in benchmarks/phase-failure-patterns. They connect each public failure-library case to the Qualix phase that failed and can be validated with python scripts/check_phase_failure_patterns.py.
Qualix is adjacent to AI PR reviewers, test-generation tools, and coding-agent workflows, but its core differentiator is requirement-semantic traceability rather than line coverage or generic diff review. See Comparison for details.
- Contributing guide: CONTRIBUTING.md
- Changelog: CHANGELOG.md
- Roadmap: ROADMAP.md
- Security policy: SECURITY.md
- How it works: docs/how-it-works.md
- Concepts: docs/concepts.md
- Model setup: docs/model-setup.md
- Language support: docs/language-support.md
- Comparison: docs/comparison.md
- Benchmark seed cases: benchmarks/semantic-coverage
- Phase failure patterns: benchmarks/phase-failure-patterns
- Synthetic starter input: examples/hello-prd.md
- Expense approval demo (Python): examples/expense-approval
- Rate limiter demo (TypeScript/Jest): examples/rate-limiter
- Order status demo (Go/testify): examples/order-status
Apache License 2.0. See LICENSE.
The core phases (Q01–Q07), CLI, all language profiles, all skill files, and all benchmark cases are Apache 2.0 and will remain so. Any future commercial offerings would focus on hosted services and team collaboration tooling — not on gating or restricting the open-source core.