Agent = Model + Harness. This is the harness.
Sleipnir is a minimal, opinionated framework for disciplined AI-assisted development with Claude Code. Four files, two agents, one non-negotiable principle: every change must pass a contract before it counts as done.
In Norse mythology, Sleipnir is Odin's eight-legged horse — the only steed that can ride through all nine realms, from Midgard to Hel and back. Three things make the name fit.
First, literally: Sleipnir is the horse that wears the harness. The name describes what the framework does.
Second, structurally: eight legs instead of four. A normal horse trades stability for speed, or speed for stability; Sleipnir has enough ground contact to get both at once. That is the promise of this framework — AI assistance that makes you faster without letting you drift.
Third, narratively: Sleipnir carries Odin between worlds. The harness carries the developer between the worlds of intent (spec.md), truth (contract.md), state (progress.md), and verdict (evaluation.md). The horse is not Odin; the horse is what makes Odin effective.
Odin sent two ravens, Huginn ("Thought") and Muninn ("Memory"), across the nine realms each day and listened to what they brought back. Sleipnir uses both.
Huginn is the Implementer subagent. It reads the spec and contract, writes code, and updates progress. It never marks a task done.
Muninn is the Evaluator subagent. It holds the contracts in memory, runs the sensors (tests, lint, typecheck), and produces a verdict with file-and-line evidence. It never writes code.
In the Grímnismál of the Poetic Edda, Odin says he fears losing Muninn more than Huginn. Thought without memory becomes hallucination. That is the thesis of this framework.
Sleipnir runs on four markdown files per feature, plus an AGENTS.md at the repo root. Each one has a single owner and a single purpose.
spec.md holds the intent: what is being built, what is in scope, and the acceptance criteria. It is written by a human and reviewed by Muninn but never modified by Huginn.
contract.md holds the definition of "correct": verifiable rules, API shapes, schemas, invariants. Every assertion in this file must be checkable either by a sensor (test, linter, type checker) or by Muninn with evidence. This file is the framework's identity — without it, Sleipnir is just a wrapper for Claude Code.
progress.md holds the state of work in flight: current task, recently completed tasks, files touched this session, blockers, and the next step. Huginn updates it after every implementation run. This is what lets a fresh session pick up where the last one left off without re-learning the whole project.
evaluation.md holds Muninn's latest verdict: pass or fail per rubric dimension, each score backed by file-and-line evidence or command output. It is overwritten on every evaluation; previous verdicts live in git history.
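As a rough illustration of the four-file rule, the sketch below lists which of the four files are missing from a feature folder. The function names are hypothetical and not part of Sleipnir, and the specs/<feature>/ layout is an assumption based on the seed folder name in the template.

```shell
# Hypothetical helper, not shipped with Sleipnir: print the name of each
# per-feature file that is missing from the given folder, one per line.
missing_feature_files() {
  local dir="$1" file
  for file in spec.md contract.md progress.md evaluation.md; do
    [ -f "$dir/$file" ] || printf '%s\n' "$file"
  done
}

# Succeeds (exit 0) only when all four files are present.
feature_is_complete() {
  [ -z "$(missing_feature_files "$1")" ]
}
```

For example, `missing_feature_files specs/_example` would print nothing for a fully scaffolded folder, assuming that is where the feature lives.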
   You write
spec + contract
       │
       ▼
┌──────────────┐
│    Huginn    │  ◄─────────────┐
│  implements  │                │
│   updates    │                │
│   progress   │                │
└──────┬───────┘                │
       │                        │
       ▼                        │
┌──────────────┐                │
│    Muninn    │                │
│  evaluates   │                │
│    writes    │                │
│  evaluation  │                │
└──────┬───────┘                │
       │                        │
  ┌────┴────┐                   │
  │  PASS?  │                   │
  └─┬─────┬─┘                   │
    │ no  │                     │
    └─────┼─────────────────────┘
          │ yes
          ▼
     Task done.
       Commit.
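The loop above can be sketched as a small shell function. This is only an illustration under stated assumptions: `run_huginn` and `run_muninn` are hypothetical stand-ins for invoking the two subagents, and the cap on rounds is an addition of this sketch, not something the diagram specifies.

```shell
# Hypothetical stand-ins. In a real setup these would invoke the subagents;
# here they are stubs so the control flow can be read and run in isolation.
run_huginn() { :; }          # would implement the task and update progress.md
run_muninn() { return 0; }   # would write evaluation.md; exit 0 means PASS

# Repeat implement -> evaluate until Muninn passes or the round cap is hit.
# The cap (default 5) is an assumption added to keep the sketch from looping forever.
harness_loop() {
  local max_rounds="${1:-5}" round=1
  while [ "$round" -le "$max_rounds" ]; do
    run_huginn
    if run_muninn; then
      echo "PASS after $round round(s): task done, commit."
      return 0
    fi
    round=$((round + 1))
  done
  echo "No PASS after $max_rounds round(s): stopping for human review."
  return 1
}
```

The point of the sketch is the ordering: nothing is reported as done until the evaluator step exits successfully, and a failed verdict always routes back to the implementer.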
Sleipnir supports two modes, picked during init. The core loop is identical in both — what changes is how much ceremony each artifact carries.
Quick mode is for solo projects and maintenance-squad work. A task is a sentence. A spec is five lines. A contract is three to five assertions. Turnaround is minutes. Designed for the developer who already knows what to do and wants the contract-evaluator discipline without the process overhead.
Full mode is for teams that run sprint ceremonies, write user stories, care about CMMI alignment, or ship to stakeholders who read specs. User stories go in classic format, acceptance criteria in EARS notation, contracts as OpenAPI or JSON Schema where possible, and an extra planner step decomposes the spec into numbered tasks.
The rule that does not move: in both modes, contract.md exists and Muninn runs against it. Even three lines of contract is enough to turn "Claude made a change" into "the change passed a verifiable bar." That is the framework's identity and it does not negotiate with the mode.
⚠️ init.sh is the bootstrapping tool. Other deliverables (subagents, slash commands, templates) land in subsequent steps; init.sh handles their absence gracefully.
Clone Sleipnir somewhere once:
git clone https://github.com/meccin/sleipnir.git ~/.sleipnir
Then from any project root:
# Interactive: recommended for first-time use.
# Five questions plus project metadata, then scaffolds.
~/.sleipnir/init.sh
# Non-interactive: accept all defaults (mode=quick, automation=semi-auto,
# language=en, team=small) and let stack auto-detection do the rest.
~/.sleipnir/init.sh -y
# Print help with all flags and examples.
~/.sleipnir/init.sh --help
The script auto-detects your stack from package.json, composer.json, pom.xml, or build.gradle, and offers it as the default in question 2.
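Detection of this kind usually comes down to checking which manifest file exists. The sketch below is not the code in init.sh: the function name, the stack labels it prints, the order of precedence, and the "custom" fallback are all assumptions made for illustration.

```shell
# Hypothetical sketch of manifest-based stack detection (not taken from init.sh).
# Prints one label for the first manifest found in the given project root.
detect_stack() {
  local root="${1:-.}"
  if   [ -f "$root/package.json" ];  then echo "node"
  elif [ -f "$root/composer.json" ]; then echo "php"
  elif [ -f "$root/pom.xml" ];       then echo "maven"
  elif [ -f "$root/build.gradle" ];  then echo "gradle"
  else echo "custom"   # assumed fallback when no known manifest is present
  fi
}
```

Calling `detect_stack .` from a project root would print the label that an interactive prompt could then offer as its default.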
sleipnir/
├── README.md                 # You are here
├── LICENSE                   # MIT
├── CHANGELOG.md              # Versioned record of framework changes
├── init.sh                   # Bootstrap script (next delivery)
├── template/                 # Files copied into a project by init.sh
│   ├── AGENTS.md             # Template with placeholders
│   ├── .claude/
│   │   ├── settings.json     # Hooks + permissions
│   │   ├── commands/         # Slash commands (harness-plan, -implement, -evaluate)
│   │   └── agents/           # Huginn and Muninn subagents
│   ├── specs/_example/       # Seed feature folder
│   ├── docs/adr/             # Architecture Decision Records
│   └── harness/              # config.yml (mode, stack, evaluator rubric)
├── modes/
│   ├── quick/                # Quick-mode template overrides
│   └── full/                 # Full-mode template overrides
├── stacks/                   # Per-stack sensor configurations
└── docs/
    ├── philosophy.md         # The "why" behind every design choice
    └── progression.md        # Adoption path: manual → orchestrated
Sleipnir deliberately does not try to be:
A planner AI. Claude Code already has plan mode, and your brain is better at scoping than any model.
A replacement for your team's task board. Sleipnir's progress.md tracks the current work session, not the backlog. Your Jira, Linear, or Trello stays where it is.
A CI/CD tool. Sensors run locally and in pre-commit hooks; CI is your existing pipeline.
An AGENTS.md generator. Those should be written by humans, kept short, and pruned often. Auto-generated context files have been shown to reduce task success rates.
This repository is being built iteratively and in public. Each delivery is small enough to review in one sitting.
- Step 1: Skeleton + README
- Step 2: init.sh with interactive questionnaire
- Step 3: spec.md, contract.md, progress.md, evaluation.md templates (Quick + Full)
- Step 4: Template AGENTS.md
- Step 5: Huginn and Muninn subagents
- Step 6: Slash commands (/harness-plan, /harness-implement, /harness-evaluate)
- Step 7: Stack configurations (node-ts, react-playwright, custom; with package manager support for pnpm/npm/yarn/bun)
- Step 8 (optional): Full-auto mode refinement once Stage 3 of the progression is reached
- Step 9 (optional): Migration to a Claude Code plugin
MIT. See LICENSE.