walkinglabs/learn-harness-engineering


English 简体中文 繁體中文 日本語 한국어 Español Français Русский Deutsch العربية Tiếng Việt Oʻzbekcha Türkçe Português-BR

Learn Harness Engineering

A project-based course on building the environment, state management, verification, and control mechanisms that make AI coding agents work reliably.

12 Lectures 6 Projects 14 Languages MIT License

🌐 This course is available in 14 languages: English, 简体中文, 繁體中文, 日本語, 한국어, Español, Français, Русский, Deutsch, العربية, Tiếng Việt, Oʻzbekcha, Türkçe, Portuguese (BR). Choose your language from the badges above.

Learn Harness Engineering is a course dedicated to the engineering of AI coding agents. We have deeply studied and synthesized the most advanced Harness Engineering theories and practices in the industry. Our core references include:

Quick start? The skills/harness-creator/ skill can help you scaffold a production-grade harness (AGENTS.md, feature lists, init.sh, verification workflows) for your own project in minutes.




✨ Visual Preview

🏠 Course Homepage

A comprehensive course outline and introduction to core philosophies, providing a clear path to get started.

Course homepage preview

📖 Immersive Lectures

Deep dives into real-world pain points and hands-on projects (like Project 01) for an immersive learning experience.

Course lecture preview

🗂️ Ready-to-Use Resource Library

Templates and reference configurations designed to solve common pitfalls in multi-turn AI agent development, such as context loss and premature task completion.

Resource library preview

PDF Coursebooks

The repository now includes a PDF build pipeline for the course content.

  • Run npm run pdf:build to generate the currently configured PDF coursebooks locally.
  • Output files are written to artifacts/pdfs/.
  • Run npm run screenshots:readme if you want to refresh the README preview images.
  • GitHub Actions workflow release-course-pdfs.yml can build the PDFs and publish them to GitHub Releases.

The Model Is Smart, The Harness Makes It Reliable

There's a hard truth most people learn the hard way: the strongest model in the world will still fail on real engineering tasks if you don't build a proper environment around it.

You've probably seen this yourself. You give Claude or GPT a task in your repo. It starts well: it reads files, writes code, looks productive. Then something goes wrong. It skips a step. It breaks a test. It says "done" but nothing actually works. You spend more time cleaning up than if you'd done it yourself.

This isn't a model problem. It's a harness problem.

The evidence is clear. Anthropic ran a controlled experiment: same model (Opus 4.5), same prompt ("build a 2D retro game editor"). Without a harness, it spent $9 in 20 minutes and produced something that didn't work. With a full harness (planner + generator + evaluator), it spent $200 in 6 hours and built a game you could actually play. The model didn't change. The harness did.

OpenAI reported the same thing with Codex: in a well-harnessed repository, the same model goes from "unreliable" to "reliable." Not a marginal improvement, but a qualitative shift.

This course teaches you how to build that environment.

 THE HARNESS PATTERN
 ====================
 You --> give task --> Agent reads harness files --> Agent executes
 |
 harness governs every step:
 |
 +--> Instructions: what to do, in what order
 +--> Scope: one feature at a time, no overreach
 +--> State: progress log, feature list, git history
 +--> Verification: tests, lint, type-check, smoke runs
 +--> Lifecycle: init at start, clean state at end
 |
 v
 Agent stops only when
 verification passes

What Harness Engineering Actually Means

Harness engineering is about building a complete working environment around the model so it produces reliable results. It's not about writing better prompts. It's about designing the system the model operates inside.

A harness has five subsystems:

 ┌────────────────────────────────────────────────────────────────┐
 │                          THE HARNESS                           │
 │                                                                │
 │  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐    │
 │  │ Instructions │  │ State        │  │ Verification       │    │
 │  │              │  │              │  │                    │    │
 │  │ AGENTS.md    │  │ progress.md  │  │ tests + lint       │    │
 │  │ CLAUDE.md    │  │ feature_list │  │ type-check         │    │
 │  │ feature_list │  │ git log      │  │ smoke runs         │    │
 │  │ docs/        │  │ session hand │  │ e2e pipeline       │    │
 │  └──────────────┘  └──────────────┘  └────────────────────┘    │
 │                                                                │
 │  ┌──────────────┐  ┌──────────────────────────────────────┐    │
 │  │ Scope        │  │ Session Lifecycle                    │    │
 │  │              │  │                                      │    │
 │  │ one feature  │  │ init.sh at start                     │    │
 │  │ at a time    │  │ clean-state checklist at end         │    │
 │  │ definition   │  │ handoff note for next session        │    │
 │  │ of done      │  │ commit only when safe to resume      │    │
 │  └──────────────┘  └──────────────────────────────────────┘    │
 │                                                                │
 └────────────────────────────────────────────────────────────────┘
 The MODEL decides what code to write.
 The HARNESS governs when, where, and how it writes it.
 The harness doesn't make the model smarter.
 It makes the model's output reliable.

Each subsystem has one job:

  • Instructions: Tell the agent what to do, in what order, and what to read before starting. Not one giant file; a progressive disclosure structure the agent navigates on demand.
  • State: Track what's been done, what's in progress, and what's next. Persisted to disk so the next session picks up exactly where the last one left off.
  • Verification: Only a passing test suite counts as evidence. The agent cannot declare victory without runnable proof.
  • Scope: Constrain the agent to one feature at a time. No overreach. No half-finishing three things. No rewriting the feature list to hide unfinished work.
  • Session Lifecycle: Initialize at the start. Clean up at the end. Leave a clean restart path for the next session.
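
The Scope rule above ("no rewriting the feature list to hide unfinished work") is the kind of constraint a harness can check mechanically. The following is a minimal sketch, not the course's implementation: the `ScopedFeature` shape and the `findScopeViolations` function are hypothetical names introduced for illustration, and the real feature_list.json schema may differ.

```typescript
// Hypothetical shape: the real feature_list.json schema may differ.
interface ScopedFeature {
  id: string;
  done: boolean;
}

// Compare the feature list before and after a session and report
// changes that would hide unfinished work or exceed the session's scope.
function findScopeViolations(
  before: ScopedFeature[],
  after: ScopedFeature[],
): string[] {
  const violations: string[] = [];
  const isListedAfter = (id: string): boolean =>
    after.some((f) => f.id === id);
  const isDoneAfter = (id: string): boolean =>
    after.some((f) => f.id === id && f.done);

  const unfinishedBefore = before.filter((f) => !f.done);

  // Rule 1: an unfinished feature must not disappear from the list.
  for (const f of unfinishedBefore) {
    if (!isListedAfter(f.id)) {
      violations.push(`unfinished feature removed: ${f.id}`);
    }
  }

  // Rule 2: at most one feature may go from unfinished to done per session.
  const newlyDone = unfinishedBefore.filter((f) => isDoneAfter(f.id));
  if (newlyDone.length > 1) {
    const ids = newlyDone.map((f) => f.id).join(", ");
    violations.push(`more than one feature completed: ${ids}`);
  }
  return violations;
}
```

A harness could run a check like this at the end of a session and refuse to commit when the returned list is non-empty.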

Why This Course Exists

The question isn't "can models write code?" They can. The question is: can they reliably complete real engineering tasks inside real repositories, over multiple sessions, without constant human supervision?

Right now, the answer is: not without a harness.

 WITHOUT HARNESS                               WITH HARNESS
 ===============                               ============
 Session 1: agent writes code                  Session 1: agent reads instructions
            agent breaks tests                            agent runs init.sh
            agent says "done"                             agent works on one feature
            you fix it manually                           agent verifies before claiming done
                                                          agent updates progress log
 Session 2: agent starts fresh                            agent commits clean state
            agent has no memory
            of what happened before            Session 2: agent reads progress log
            agent re-does work                            agent picks up exactly where it left off
            or does something else entirely               agent continues the unfinished feature
            you fix it again                              you review, not rescue

 Result: you spend more time                   Result: agent does the work,
         cleaning up than if you                       you verify the result
         did it yourself
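
The right-hand column depends on a progress log that the next session can read. As a rough sketch of what writing one entry could look like: the `SessionHandoff` fields and the markdown layout below are assumptions made for illustration, not the course's template.

```typescript
// Hypothetical fields for one session's handoff note.
interface SessionHandoff {
  sessionLabel: string;  // e.g. "Session 1"
  featureId: string;     // the single feature worked on
  verified: boolean;     // whether verification passed
  stillBroken: string[]; // known problems left behind
  nextStep: string;      // where the next session should resume
}

// Render one entry as plain markdown text to append to the progress log.
function renderProgressEntry(h: SessionHandoff): string {
  const lines: string[] = [
    `## ${h.sessionLabel}`,
    `- Feature: ${h.featureId}`,
    `- Verified: ${h.verified ? "yes" : "no"}`,
  ];
  if (h.stillBroken.length > 0) {
    lines.push(`- Still broken: ${h.stillBroken.join("; ")}`);
  } else {
    lines.push("- Still broken: nothing known");
  }
  lines.push(`- Next step: ${h.nextStep}`);
  return lines.join("\n");
}
```

Keeping the entry short and structured is a design choice: the next session only needs to know what was worked on, whether it was verified, and where to resume.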

The questions this course actually cares about:

  • Which harness designs improve task completion rates?
  • Which designs reduce rework and incorrect completions?
  • Which mechanisms keep long-running tasks progressing steadily?
  • Which structures keep the system maintainable after multiple agent runs?
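
These questions become measurable once each agent run is recorded. A minimal sketch of two such measurements follows; the `AgentRunRecord` shape and the metric names are hypothetical, chosen here only to illustrate the comparison between harness designs.

```typescript
// Hypothetical record of one agent run on one task.
interface AgentRunRecord {
  claimedDone: boolean;   // the agent said the task was finished
  actuallyWorks: boolean; // independent verification passed
}

interface HarnessMetrics {
  completionRate: number;          // runs that actually work / all runs
  incorrectCompletionRate: number; // runs claimed done but not working / all runs
}

// Summarize a batch of runs made under one harness design.
function summarizeRuns(runs: AgentRunRecord[]): HarnessMetrics {
  if (runs.length === 0) {
    return { completionRate: 0, incorrectCompletionRate: 0 };
  }
  const working = runs.filter((r) => r.actuallyWorks).length;
  const falseDone = runs.filter(
    (r) => r.claimedDone && !r.actuallyWorks,
  ).length;
  return {
    completionRate: working / runs.length,
    incorrectCompletionRate: falseDone / runs.length,
  };
}
```

Running the same tasks under two harness variants and comparing the two summaries gives a simple before/after view.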

Course Curriculum & Documentation

For the full course materials, please visit the Documentation Website .

The curriculum is divided into three parts:

  1. Lectures: 12 conceptual units explaining the theory behind harness engineering.
  2. Projects: 6 hands-on projects where you build an agentic workspace from scratch.
  3. Resource Library: Copy-ready templates (AGENTS.md, feature_list.json, init.sh, etc.) to use in your own repositories today.

Quick Start: Improve Your Agent Today

You don't need to read all 12 lectures before you start getting value. If you're already using a coding agent on a real project, here's how to improve it right now.

The idea is simple: instead of just writing prompts, give your agent a set of structured files that define what to do, what's been done, and how to verify the work. These files live inside your repo, so every session starts from the same state.

 YOUR PROJECT ROOT
 ├── AGENTS.md           <-- the agent's operating manual
 ├── CLAUDE.md           <-- (alternative, if using Claude Code)
 ├── init.sh             <-- runs install + verify + start
 ├── feature_list.json   <-- what features exist, which are done
 ├── claude-progress.md  <-- what happened each session
 └── src/                <-- your actual code

Grab the starter templates from the Resource Library and drop them into your project. That's it. Four files, and your agent sessions will already be significantly more stable than running on prompts alone.
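
To make the role of feature_list.json concrete, here is a small sketch of how a harness script might read it and choose the next piece of work. The `FeatureEntry` fields (`id`, `description`, `passes`), the `pickNextFeature` function, and the example values are assumptions for illustration; the Resource Library template may use a different schema.

```typescript
// Hypothetical entry shape; the Resource Library template may use other fields.
interface FeatureEntry {
  id: string;
  description: string;
  passes: boolean; // set to true only after verification has passed
}

// Parse the text of a feature list and return the first unfinished entry,
// or null when every feature already passes.
function pickNextFeature(featureListJson: string): FeatureEntry | null {
  const parsed: unknown = JSON.parse(featureListJson);
  if (!Array.isArray(parsed)) {
    throw new Error("feature list must be a JSON array");
  }
  for (const entry of parsed as FeatureEntry[]) {
    if (!entry.passes) {
      return entry;
    }
  }
  return null;
}

// Example content for feature_list.json (illustrative values only).
const exampleFeatureListJson = JSON.stringify([
  { id: "import-documents", description: "Import local documents", passes: true },
  { id: "index-documents", description: "Process and index documents", passes: false },
  { id: "grounded-qa", description: "Return grounded answers with citations", passes: false },
]);
```

With the example list, `pickNextFeature(exampleFeatureListJson)` returns the `index-documents` entry, which is the one feature the session would then work on.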


Capstone Project: A Real App

All six course projects revolve around the same product: an Electron-based personal knowledge base desktop app.

 ┌───────────────────────────────────────────────────────┐
 │              Knowledge Base Desktop App               │
 │                                                       │
 │  ┌──────────────┐  ┌───────────────────────────────┐  │
 │  │ Document List│  │ Q&A Panel                     │  │
 │  │              │  │                               │  │
 │  │ doc-001.md   │  │ Q: What is harness eng?       │  │
 │  │ doc-002.md   │  │ A: The environment built      │  │
 │  │ doc-003.md   │  │    around an agent model...   │  │
 │  │ ...          │  │    [citation: doc-002.md]     │  │
 │  └──────────────┘  └───────────────────────────────┘  │
 │                                                       │
 │  ┌─────────────────────────────────────────────────┐  │
 │  │ Status Bar: 42 docs | 38 indexed | last sync 3m │  │
 │  └─────────────────────────────────────────────────┘  │
 └───────────────────────────────────────────────────────┘

 Core features:
 ├── Import local documents
 ├── Manage a document library
 ├── Process and index documents
 ├── Run AI-powered Q&A over imported content
 └── Return grounded answers with citations

This project was chosen because it combines strong practical value, enough real-world product complexity, and a good setting for observing before/after harness improvements.

Each course project's starter/solution is a complete copy of this Electron app at that evolutionary stage. P(N+1)'s starter is derived from P(N)'s solution, so the app evolves as your harness skills grow.


Learning Path

The course is designed to be done in order. Each phase builds on the last.

 Phase 1: SEE THE PROBLEM               Phase 2: STRUCTURE THE REPO
 ========================               ===========================
 L01 Strong models ≠ reliable           L03 Repository as single
     execution                              source of truth
 L02 What harness actually means
                                        L04 Split instructions across
          |                                 files, not one giant file
          v
 P01 Prompt-only vs.                             |
     rules-first comparison                      v
                                        P02 Agent-readable workspace

 Phase 3: CONNECT SESSIONS              Phase 4: FEEDBACK & SCOPE
 =========================              =========================
 L05 Keep context alive                 L07 Draw clear task boundaries
     across sessions
                                        L08 Feature lists as harness
 L06 Initialize before every                primitives
     agent session
                                                 |
          |                                      v
          v                             P04 Runtime feedback to
 P03 Multi-session continuity               correct agent behavior

 Phase 5: VERIFICATION                  Phase 6: PUT IT ALL TOGETHER
 =====================                  ============================
 L09 Stop agents from                   L11 Make agent's runtime
     declaring victory early                observable
 L10 Full-pipeline run =                L12 Clean handoff at end of
     real verification                      every session
          |                                      |
          v                                      v
 P05 Agent verifies its own work        P06 Build a complete harness
                                            (capstone project)

Each phase takes about a week if you're going part-time. If you want to go faster, phases 1-3 can be done in a long weekend.


Syllabus

Lectures: 12 conceptual units, each answering one core question

Read the full text for each lecture on the Documentation Website.

| Session | Question | Core Idea |
|---------|----------|-----------|
| L01 | Why do strong models still fail on real tasks? | The capability gap between benchmarks and real engineering |
| L02 | What does "harness" actually mean? | Five subsystems: instructions, state, verification, scope, lifecycle |
| L03 | Why must the repo be the single source of truth? | If the agent can't see it, it doesn't exist |
| L04 | Why does one giant instruction file fail? | Progressive disclosure: give a map, not an encyclopedia |
| L05 | Why do long-running tasks lose continuity? | Persist progress to disk; pick up where you left off |
| L06 | Why does initialization need its own phase? | Verify the environment is healthy before the agent starts work |
| L07 | Why do agents overreach and under-finish? | One feature at a time; explicit definition of done |
| L08 | Why are feature lists harness primitives? | Machine-readable scope boundaries the agent can't ignore |
| L09 | Why do agents declare victory too early? | Verification gaps: confidence ≠ correctness |
| L10 | Why does end-to-end testing change results? | Only a full-pipeline run counts as real verification |
| L11 | Why does observability belong inside the harness? | If you can't see what the agent did, you can't fix what it broke |
| L12 | Why must every session leave a clean state? | The next session's success depends on this session's cleanup |

Projects: 6 hands-on projects applying lecture methods to the same Electron app

| Project | What You Do | Harness Mechanism |
|---------|-------------|-------------------|
| P01 | Run the same task twice: prompt-only vs. rules-first | Minimal harness: AGENTS.md + init.sh + feature_list.json |
| P02 | Restructure the repo so the agent can read it | Agent-readable workspace + persistent state files |
| P03 | Make the agent pick up from where it left off | Progress log + session handoff + multi-session continuity |
| P04 | Stop the agent from doing too much or too little | Runtime feedback + scope control + incremental indexing |
| P05 | Make the agent verify its own work | Self-verification + grounded Q&A + evidence-based completion |
| P06 | Build a complete harness from scratch (capstone) | Full harness: all mechanisms + observability + ablation study |

 PROJECT EVOLUTION
 =================
 P01 Prompt-only vs. rules-first      You see the problem
      |
      v
 P02 Agent-readable workspace         You restructure the repo
      |
      v
 P03 Multi-session continuity         You connect sessions
      |
      v
 P04 Runtime feedback & scope         You add feedback loops
      |
      v
 P05 Self-verification                You make the agent check itself
      |
      v
 P06 Complete harness (capstone)      You build the full system

 Each project's solution becomes the next project's starter.
 The app evolves. Your harness skills grow with it.

Resource Library

  • English: templates, checklists, and method references
  • 简体中文: 中文模板、清单和方法参考
  • 繁體中文: 繁體中文範本、清單和方法參考
  • 日本語: テンプレート、チェックリスト、方法リファレンス
  • 한국어: 템플릿, 체크리스트, 방법 참고 자료
  • Español: plantillas, listas de verificación y referencias
  • Français: modèles, listes de contrôle et références
  • Русский: шаблоны, чек-листы и справочники
  • Deutsch: Vorlagen, Checklisten und Referenzen
  • العربية: قوالب، قوائم تحقق ومراجع
  • Tiếng Việt: mẫu, danh sách kiểm tra và tài liệu tham khảo
  • Oʻzbekcha: andozalar, tekshiruv roʻyxatlari va maʼlumotnomalar
  • Türkçe: şablonlar, kontrol listeleri ve referanslar
  • Português (BR): modelos, listas de verificação e referências de métodos

The Agent Session Lifecycle

One of the core ideas in this course: the agent's session should follow a structured lifecycle, not a free-for-all. Here's what that looks like:

 AGENT SESSION LIFECYCLE
 =======================
 ┌──────────────────────────────────────────────────────────────────┐
 │  START                                                           │
 │                                                                  │
 │   1. Agent reads AGENTS.md / CLAUDE.md                           │
 │   2. Agent runs init.sh (install, verify, health check)          │
 │   3. Agent reads claude-progress.md (what happened last time)    │
 │   4. Agent reads feature_list.json (what's done, what's next)    │
 │   5. Agent checks git log (recent changes)                       │
 │                                                                  │
 │  SELECT                                                          │
 │                                                                  │
 │   6. Agent picks exactly ONE unfinished feature                  │
 │   7. Agent works only on that feature                            │
 │                                                                  │
 │  EXECUTE                                                         │
 │                                                                  │
 │   8. Agent implements the feature                                │
 │   9. Agent runs verification (tests, lint, type-check)           │
 │  10. If verification fails: fix and re-run                       │
 │  11. If verification passes: record evidence                     │
 │                                                                  │
 │  WRAP UP                                                         │
 │                                                                  │
 │  12. Agent updates claude-progress.md                            │
 │  13. Agent updates feature_list.json                             │
 │  14. Agent records what's still broken or unverified             │
 │  15. Agent commits (only when safe to resume)                    │
 │  16. Agent leaves clean restart path for next session            │
 │                                                                  │
 └──────────────────────────────────────────────────────────────────┘
 The harness governs every transition in this lifecycle.
 The model decides what code to write at each step.
 Without the harness, step 9 becomes "agent says it looks fine."
 With the harness, step 9 is "tests pass, lint is clean, types check."
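
Steps 8 to 11 of the lifecycle can be read as a bounded loop. The sketch below is one way to express it, under stated assumptions: `implement` and `verify` are hypothetical callbacks that a harness would supply (for example, by invoking the agent and running the test, lint, and type-check commands), and the `maxAttempts` cap is an addition of this sketch rather than something the course prescribes.

```typescript
// Result of one verification command (tests, lint, type-check).
interface CheckResult {
  checkName: string;
  passed: boolean;
}

interface FeatureOutcome {
  done: boolean;           // true only if the final verification run passed
  attempts: number;        // how many implement/verify rounds were made
  evidence: CheckResult[]; // results of the last verification run
}

// Steps 8-11: implement, verify, fix and re-run, record evidence.
// `implement` and `verify` are hypothetical callbacks supplied by the harness.
function runFeatureLoop(
  implement: (attempt: number) => void,
  verify: () => CheckResult[],
  maxAttempts: number,
): FeatureOutcome {
  let evidence: CheckResult[] = [];
  let attemptsMade = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    attemptsMade = attempt;
    implement(attempt);
    evidence = verify();
    // Done only when at least one check ran and every check passed.
    const allPassed =
      evidence.length > 0 && evidence.every((c) => c.passed);
    if (allPassed) {
      return { done: true, attempts: attemptsMade, evidence };
    }
  }
  return { done: false, attempts: attemptsMade, evidence };
}
```

Note that an empty list of check results never counts as success here: "nothing was run" is treated the same as "verification failed," which matches the idea that only runnable proof counts as evidence.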

Who This Is For

This course is for:

  • Engineers already using coding agents who want better stability and quality
  • Researchers or builders who want a systematic understanding of harness design
  • Tech leads who need to understand how environment design affects agent performance

This course is not for:

  • People looking for a zero-code AI introduction
  • People who only care about prompts and don't plan to build real implementations
  • Learners not prepared to let agents work inside real repositories

Requirements

This is a course where you actually run coding agents.

You need at least one of these tools:

  • Claude Code
  • Codex
  • Another IDE or CLI coding agent that supports file editing, command execution, and multi-step tasks

The course assumes you can:

  • Open a local repository
  • Allow the agent to edit files
  • Allow the agent to run commands
  • Inspect output and re-run tasks

If you don't have such a tool, you can still read the course content, but you won't be able to complete the projects as intended.


Local Preview

This repository uses VitePress as a documentation viewer.

npm install
npm run docs:dev # Dev server with hot reload
npm run docs:build # Production build
npm run docs:preview # Preview built site

Then open the local URL that VitePress outputs in your browser.


Prerequisites

Required:

  • Familiarity with the terminal, git, and local development environments
  • Ability to read and write code in at least one common application stack
  • Basic software debugging experience (reading logs, tests, and runtime behavior)
  • Enough time to commit to implementation-focused coursework

Helpful but not required:

  • Experience with Electron, desktop apps, or local-first tools
  • Background in testing, logging, or software architecture
  • Prior exposure to Codex, Claude Code, or similar coding agents

Core References

Primary:

See the full layered reference list in docs/en/resources/reference/.


Repository Structure

learn-harness-engineering/
├── docs/                      # VitePress documentation site
│   ├── lectures/              # 12 lectures (index.md + code/ examples)
│   │   ├── lecture-01-*/
│   │   └── ... (12 total)
│   ├── projects/              # 6 project descriptions
│   │   ├── project-01-*/
│   │   └── ... (6 total)
│   └── resources/             # Multilingual templates & references (14 languages)
│       ├── en/
│       └── ... (14 total)
├── projects/
│   ├── shared/                # Shared Electron + TypeScript + React foundation
│   └── project-NN/            # Per-project starter/ and solution/ directories
├── skills/                    # Reusable AI agent skills
│   └── harness-creator/       # Harness engineering skill
├── package.json               # VitePress + dev tooling
└── CLAUDE.md                  # Claude Code instructions for this repo

How the Course Is Organized

  • Each lecture focuses on one question
  • The course includes 6 projects
  • Every project requires the agent to do real work
  • Every project compares weak vs. strong harness results
  • What matters is the measured difference, not how many docs were written

Skills

This repository also includes reusable AI agent skills that you can install directly into your IDE or agent workspace.

  • harness-creator: A skill that helps you scaffold a production-grade harness for your own project in minutes.

Other Courses

Our team has also created other courses! Check them out:

Hands-on Modern RL

Hands-on Modern RL: An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems.

Modern LLM Notebook

Modern LLM Notebook: A hands-on course for building modern LLMs from scratch in PyTorch, with 23 runnable Jupyter Notebooks covering tokenizers, attention, MoE, RLHF, inference, evaluation, and distillation.




Acknowledgments

This course was inspired by and draws ideas from learn-claude-code, a progressive guide to building an agent from scratch, from a single loop to isolated autonomous execution.
