An agent-operable mechanistic-interpretability lab on the new Google Colab CLI. Replicate real interpretability papers with one command. Teach your coding agent to do mech-interp.
```bash
uv tool install google-colab-cli && colab status   # auth once
pip install git+https://github.com/OpenInterpretability/openinterp-lab
oilab replicate tool-entropy                       # FREE, CPU, ~1 min -> REPLICATION PASS
oilab replicate lever-is-late -y                   # rents an A100, replicates a causal-steering paper, tears down
```
The Colab CLI (June 2026) lets a terminal — or an AI agent — provision GPUs and run code on them.
openinterp-lab builds the research layer on top:
- `oilab replicate <paper>`: one-command, auto-verified replication of published interpretability experiments (fetch notebook → provision GPU → execute → pull results JSON → compare against published numbers → `PASS` / `DIVERGENT` verdict → tear down).
- `oilab run <notebook> --gpu A100`: run your own experiment with the hardened flow (proper timeouts, ephemeral-disk-safe result capture, token injection without echoing, auto-teardown).
- `skills/openinterp-lab/SKILL.md`: a skill file that teaches Claude Code, Codex, or any other agent to operate the whole stack, covering 5 research loops, the verified asset registry, and every operational gotcha we hit so your agent doesn't have to.
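The "compare against published numbers → `PASS` / `DIVERGENT`" step can be pictured as a tolerance check of the pulled results JSON. The sketch below is hypothetical: the flat `{metric: value}` layout, the `verdict` / `verdict_from_file` helpers, and the 5% relative tolerance are assumptions for illustration, not the tool's actual schema or thresholds. The target value is the AUROC reported in the `tool-entropy` row.

```python
import json
import math

# Illustrative target: the AUROC reported for tool-entropy (real registry format not shown).
PUBLISHED = {"auroc": 0.887}

def verdict(results: dict, published: dict, rel_tol: float = 0.05) -> str:
    """Hypothetical check: "PASS" if every published metric is reproduced within rel_tol.

    Assumes results and published are flat {metric_name: float} mappings;
    a metric missing from results counts as divergent.
    """
    for name, expected in published.items():
        got = results.get(name)
        if got is None or not math.isclose(got, expected, rel_tol=rel_tol):
            return "DIVERGENT"
    return "PASS"

def verdict_from_file(path: str, published: dict = PUBLISHED) -> str:
    # Read the results JSON that was pulled back from the runtime before teardown.
    with open(path) as f:
        return verdict(json.load(f), published)

print(verdict({"auroc": 0.879}, PUBLISHED))  # PASS (within 5% of 0.887)
print(verdict({"auroc": 0.71}, PUBLISHED))   # DIVERGENT
```

Pulling the JSON before teardown matters because the runtime's disk is ephemeral; a `DIVERGENT` verdict is still a result worth reporting.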
```bash
# give the skill to Claude Code:
cp -r skills/openinterp-lab ~/.claude/skills/
# then just ask: "replicate the lever-is-late paper and explain what it shows"
```
| key | claim under test | hardware |
|---|---|---|
| `tool-entropy` | tool-use entropy collapse separates WANDERING agents (AUROC 0.887); DOI 10.5281/zenodo.20368600 | CPU, free |
| `lever-is-late` | the termination decision of a 27B agent is causally writable only in a late action-commitment block (a task-matched donor flips real generations 42%, p=0.031); DOI 10.5281/zenodo.20534219 | Colab A100 |
| `commitment-lever` | (pre-registered, in flight) does that late lever generalize to a second committal action? | Colab A100 |
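The `tool-entropy` claim rests on two quantities: the Shannon entropy of which tools an agent calls, and an AUROC measuring how well a score separates WANDERING from non-WANDERING trajectories. A minimal sketch, assuming each trajectory is a list of tool names and that low entropy (collapse onto a few tools) is the WANDERING signal; the helper names, the toy trajectories, and the score direction are assumptions, not the paper's exact recipe.

```python
import math
from collections import Counter

def tool_entropy(tool_calls: list[str]) -> float:
    """Shannon entropy (bits) of the tool-name distribution in one trajectory window."""
    if not tool_calls:
        return 0.0
    n = len(tool_calls)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tool_calls).values())

def auroc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUROC as P(pos score > neg score), ties counted as 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy trajectories, invented for illustration (not from the public dataset).
wandering = [["grep", "grep", "grep", "grep"], ["cat", "cat", "grep", "cat"]]
on_track = [["grep", "cat", "edit", "test"], ["ls", "cat", "edit", "test"]]

# Assumption: entropy collapse marks WANDERING, so the score is negative entropy.
pos = [-tool_entropy(t) for t in wandering]
neg = [-tool_entropy(t) for t in on_track]
print(auroc(pos, neg))  # 1.0 on this toy data
```

The pairwise loop is O(n·m), which is fine for a 99-trajectory dataset; a rank-based formula gives the same value for larger sets.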
Replication divergence is a finding, not a failure. Open an issue with your `results.json`.
- Replicate a paper (above).
- Locate & steer a decision lever on any open model: `decision-locator`.
- Probe with the causal step enforced: the report always answers *predicts?* AND *controls?* separately (the arc's core lesson: an AUROC-0.91 feature can be causally inert).
- SAE features on Qwen3.6-27B with the pretrained 11-layer full-stack SAE.
- Honest-research pipeline: PREREG → run → adversarial EVAL → Zenodo DOI. Nulls included.
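The *predicts?* / *controls?* split can be made concrete: *predicts* is a correlational score (AUROC of a feature against labels), while *controls* asks whether intervening on the feature changes behavior. The sketch below is hypothetical; it assumes per-example feature activations with labels, plus a count of generations flipped under steering, and it swaps in a one-sided exact binomial test against a control flip rate. The source does not say which test produced its p-values, and `report` is not the lab's actual report format.

```python
import math

def auroc(pos, neg):
    """P(activation on positives > activation on negatives), ties counted as 1/2."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): exact one-sided upper tail."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def report(act_pos, act_neg, flips, n_steered, control_rate, alpha=0.05):
    """Hypothetical report: answer 'predicts?' and 'controls?' as separate questions."""
    p_value = binom_sf(flips, n_steered, control_rate)
    return {
        "predicts_auroc": round(auroc(act_pos, act_neg), 3),
        "controls_flip_rate": flips / n_steered,
        "controls_p": p_value,
        "controls": p_value < alpha,
    }

# Toy numbers: a feature that separates the labels well but barely moves generations.
r = report([0.9, 0.8, 0.7], [0.2, 0.3, 0.75], flips=1, n_steered=20, control_rate=0.05)
print(r["predicts_auroc"], r["controls"])  # 0.889 False
```

Keeping the two answers as separate fields, rather than folding them into one score, is what stops a high AUROC from being read as evidence of a causal lever.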
It's extracted from a real research program — the WANDERING arc (6 papers + a tool, all open access with permanent DOIs, including the honest nulls and the corrected claims). The data is public (99 labeled SWE-bench Pro agent trajectories), the notebooks are public, and the wrapper exists because we lost a GPU run to every gotcha it now guards against.
Early external detection of agent WANDERING is unsolved: we tested 6 cheap methods (tool-entropy variants, repeated actions, information gain, reasoning-text signals, fused classifiers, an LLM judge) and none beat chance at early turns. The labeled dataset is public. If you can detect WANDERING by turn 15 at a false-positive rate under 5%, that's a paper. Baselines to beat are in SKILL.md.
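The challenge criterion amounts to fixing a score threshold on non-WANDERING trajectories so that at most 5% of them are flagged, then measuring recall on WANDERING trajectories using only the first 15 turns. A minimal sketch, assuming a hypothetical `score_fn` that sees a turn prefix and returns higher values for likelier WANDERING. The toy `repeat_rate` score and toy data are placeholders (the source reports that repeated-action signals did not beat chance early on the real data), and in practice the threshold should be chosen on held-out trajectories.

```python
import math

def threshold_at_fpr(neg_scores, max_fpr=0.05):
    """Threshold t such that at most max_fpr of negatives score strictly above t."""
    s = sorted(neg_scores, reverse=True)
    k = math.floor(max_fpr * len(s))  # number of negatives allowed above the threshold
    return s[k] if k < len(s) else float("-inf")

def early_detection(trajs_pos, trajs_neg, score_fn, turn=15, max_fpr=0.05):
    """Return (threshold, TPR, FPR) when only the first `turn` turns are visible."""
    pos = [score_fn(t[:turn]) for t in trajs_pos]
    neg = [score_fn(t[:turn]) for t in trajs_neg]
    thr = threshold_at_fpr(neg, max_fpr)
    tpr = sum(p > thr for p in pos) / len(pos)
    fpr = sum(q > thr for q in neg) / len(neg)
    return thr, tpr, fpr

def repeat_rate(prefix):
    # Toy score (hypothetical): fraction of turns that repeat the previous action.
    if len(prefix) < 2:
        return 0.0
    return sum(a == b for a, b in zip(prefix, prefix[1:])) / (len(prefix) - 1)

on_track = [["ls", "cat", "edit", "test"] * 5 for _ in range(10)]
wandering = [["grep"] * 20, ["cat", "cat", "grep"] * 7]
print(early_detection(wandering, on_track, repeat_rate))  # (0.0, 1.0, 0.0)
```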
Apache-2.0. Built by OpenInterpretability · @0xCVYH.