
openinterp-lab

An agent-operable mechanistic-interpretability lab on the new Google Colab CLI. Replicate real interpretability papers with one command. Teach your coding agent to do mech-interp.

```bash
uv tool install google-colab-cli && colab status   # auth once
pip install git+https://github.com/OpenInterpretability/openinterp-lab
oilab replicate tool-entropy        # FREE, CPU, ~1 min -> REPLICATION PASS
oilab replicate lever-is-late -y    # rents an A100, replicates a causal-steering paper, tears down
```

What this is

The Colab CLI (June 2026) lets a terminal, or an AI agent, provision GPUs and run code on them. openinterp-lab builds the research layer on top of it:

- `oilab replicate <paper>`: one-command, auto-verified replication of published interpretability experiments (fetch notebook → provision GPU → execute → pull results JSON → compare against published numbers → PASS / DIVERGENT verdict → tear down).
- `oilab run <notebook> --gpu A100`: run your own experiment with the hardened flow (proper timeouts, ephemeral-disk-safe result capture, token injection without echoing, auto-teardown).
- `skills/openinterp-lab/SKILL.md`: a skill file that teaches Claude Code, Codex, or any other agent to operate the whole stack (5 research loops, the verified asset registry, and every operational gotcha we hit, so your agent doesn't have to).
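The `replicate` flow above can be sketched roughly as follows. This is a minimal illustration, not oilab's implementation: the stage callables (`provision`, `execute`, `pull_results`, `teardown`), the flat `{metric: value}` results format, and the 5% relative tolerance are all assumptions. It shows two of the listed guarantees: results are pulled before teardown because the runtime's disk is ephemeral, and teardown sits in a `finally` block so a failed stage never leaves a GPU rented.

```python
import json
import math
from typing import Callable

def verdict(results: dict, published: dict, rel_tol: float = 0.05) -> str:
    """PASS if every published metric is reproduced within rel_tol (assumed), else DIVERGENT."""
    for key, target in published.items():
        got = results.get(key)
        if got is None or not math.isclose(got, target, rel_tol=rel_tol):
            return "DIVERGENT"
    return "PASS"

def replicate(provision: Callable[[], str],
              execute: Callable[[str], None],
              pull_results: Callable[[str], str],
              teardown: Callable[[str], None],
              published: dict) -> str:
    """Hypothetical pipeline: provision -> execute -> pull results JSON -> compare."""
    vm = provision()
    try:
        execute(vm)
        # Pull results while the runtime is still alive: its disk is ephemeral.
        results = json.loads(pull_results(vm))
    finally:
        teardown(vm)  # always release the GPU, even if a stage above raised
    return verdict(results, published)

if __name__ == "__main__":
    released = []
    print(replicate(
        provision=lambda: "vm-1",                    # stand-in for renting a GPU
        execute=lambda vm: None,                     # stand-in for running the notebook
        pull_results=lambda vm: '{"auroc": 0.874}',  # stand-in results JSON
        teardown=released.append,
        published={"auroc": 0.887},
    ), released)  # PASS ['vm-1']
```

Passing the stages in as functions keeps the control flow (and the teardown guarantee) testable without a real GPU.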

```bash
# give the skill to Claude Code:
cp -r skills/openinterp-lab ~/.claude/skills/
# then just ask: "replicate the lever-is-late paper and explain what it shows"
```

Replicable experiments

| key | claim under test | hardware |
|---|---|---|
| `tool-entropy` | tool-use entropy collapse separates WANDERING agents (AUROC 0.887); DOI 10.5281/zenodo.20368600 | CPU, free |
| `lever-is-late` | the termination decision of a 27B agent is causally writable only in a late action-commitment block; a task-matched donor flips 42% of real generations (p=0.031); DOI 10.5281/zenodo.20534219 | Colab A100 |
| `commitment-lever` | (pre-registered, in flight) does that late lever generalize to a second committal action? | Colab A100 |
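A rough sketch of the kind of signal behind `tool-entropy`: the Shannon entropy of the tool names an agent calls, scored with AUROC. The paper's exact feature (windowing, normalization, which turns are counted) is not described here, so both functions are hypothetical, and the sign convention (lower entropy means more likely WANDERING, reading "entropy collapse" literally) is an assumption.

```python
import math
from collections import Counter

def tool_entropy(tool_calls: list[str]) -> float:
    """Shannon entropy, in bits, of the tool-name distribution in one trajectory."""
    n = len(tool_calls)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(tool_calls).values())

def auroc(pos: list[float], neg: list[float]) -> float:
    """P(score_pos > score_neg), ties counting 1/2 (the Mann-Whitney form of AUROC)."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Toy trajectories (tool name per turn), not data from the paper.
    wandering = [["bash"] * 4, ["grep", "grep", "grep", "bash"]]
    healthy = [["grep", "edit", "bash", "test"], ["bash", "edit", "test", "edit"]]
    # Assumption: entropy *collapse* marks WANDERING, so the score is negated entropy.
    score = lambda traj: -tool_entropy(traj)
    print(auroc([score(t) for t in wandering], [score(t) for t in healthy]))  # 1.0
```

The pairwise AUROC is O(n·m), which is fine at the scale of 99 trajectories.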

Replication divergence is a finding, not a failure: open an issue with your results.json.
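Assuming the `lever-is-late` donor intervention is a form of activation patching, the idea of a decision being "causally writable only late" can be illustrated with a toy layer sweep: copy selected dimensions of a donor run's hidden state into a recipient run after one layer, and record whether the final decision flips. Everything here is hypothetical (the 4-layer toy model, the decision rule, patching dimension 0); it is not the decision-locator code and says nothing about the 27B model itself.

```python
from typing import Callable, Optional

Hidden = list[float]
Layer = Callable[[Hidden], Hidden]

def run(layers: list[Layer], x: Hidden, patch_at: Optional[int] = None,
        dims: tuple[int, ...] = (), src: Optional[Hidden] = None):
    """Forward pass; after layer `patch_at`, overwrite `dims` of the hidden state with `src`."""
    h, cache = list(x), []
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == patch_at:
            h = [src[d] if d in dims else v for d, v in enumerate(h)]
        cache.append(list(h))
    return h, cache

def layer_sweep(layers: list[Layer], decide: Callable[[Hidden], bool],
                recipient: Hidden, donor: Hidden, dims: tuple[int, ...]) -> list[bool]:
    """For each layer, patch the donor's `dims` into the recipient; True where the decision flips."""
    _, donor_cache = run(layers, donor)
    base = decide(run(layers, recipient)[0])
    return [decide(run(layers, recipient, i, dims, donor_cache[i])[0]) != base
            for i in range(len(layers))]

# Hypothetical toy model: dim 0 is recomputed by layers 0 and 1, then only carried forward.
TOY_LAYERS: list[Layer] = [
    lambda h: [h[1] + h[2], h[1], h[2]],
    lambda h: [h[1] - h[2], h[1], h[2]],
    lambda h: list(h),
    lambda h: [h[0], 0.0, 0.0],
]

def decide(h: Hidden) -> bool:
    return h[0] > 0  # toy binary decision, e.g. stop vs. continue

if __name__ == "__main__":
    print(layer_sweep(TOY_LAYERS, decide, recipient=[0, 1, 2], donor=[0, 2, 1], dims=(0,)))
    # [False, True, True, True]
```

In this toy, patching at layer 0 has no effect because layer 1 recomputes dimension 0 from the unpatched dimensions; from layer 1 on, the patched value survives to the output. Patching a subspace rather than the full hidden state matters: copying the whole state would trivially flip the decision at every layer.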

The 5 loops (see SKILL.md for full recipes)

1. Replicate a paper (see above).
2. Locate and steer a decision lever on any open model (decision-locator).
3. Probe with the causal step enforced: the report always answers "predicts?" and "controls?" separately, because the arc's core lesson is that an AUROC-0.91 feature can be causally inert.
4. Study SAE features on Qwen3.6-27B with the pretrained 11-layer full-stack SAE.
5. Run the honest-research pipeline: PREREG → run → adversarial EVAL → Zenodo DOI. Nulls included.
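The "predicts? AND controls?" rule from loop 3 can be made concrete as a report that keeps the two answers in separate fields. A minimal sketch: the `ProbeReport` name, the 0.75 AUROC threshold, the 0.05 significance level, and the one-sided Fisher exact test comparing flip rates under steering against a control intervention are all assumptions chosen for illustration, not the skill's actual report format.

```python
import math
from dataclasses import dataclass

def fisher_one_sided(a: int, n1: int, b: int, n2: int) -> float:
    """One-sided Fisher exact p-value P(X >= a): a of n1 steered runs and b of n2
    control runs flipped, under the null that flips are exchangeable between groups."""
    k, n = a + b, n1 + n2
    hits = sum(math.comb(n1, x) * math.comb(n2, k - x) for x in range(a, min(n1, k) + 1))
    return hits / math.comb(n, k)

@dataclass
class ProbeReport:
    auroc: float          # predicts? how well the feature separates the labels
    flipped_steered: int  # controls? runs whose outcome flipped when the feature was steered
    n_steered: int
    flipped_control: int  # flips under a matched control intervention (hypothetical design)
    n_control: int

    @property
    def p_value(self) -> float:
        return fisher_one_sided(self.flipped_steered, self.n_steered,
                                self.flipped_control, self.n_control)

    @property
    def predicts(self) -> bool:
        return self.auroc >= 0.75  # hypothetical threshold

    @property
    def controls(self) -> bool:
        steer_rate = self.flipped_steered / self.n_steered
        ctrl_rate = self.flipped_control / self.n_control
        return steer_rate > ctrl_rate and self.p_value < 0.05

if __name__ == "__main__":
    # A highly predictive feature whose steering does no better than the control.
    r = ProbeReport(auroc=0.91, flipped_steered=2, n_steered=20, flipped_control=2, n_control=20)
    print(r.predicts, r.controls)  # True False
```

Keeping the two verdicts as independent properties means a strong probe can never stand in for a causal result, and vice versa.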

Why trust this

It is extracted from a real research program: the WANDERING arc (6 papers plus a tool, all open access with permanent DOIs, including the honest nulls and the corrected claims). The data is public (99 labeled SWE-bench Pro agent trajectories), the notebooks are public, and the wrapper exists because we lost a GPU run to every gotcha it now guards against.

Open problem, free to a good home

Early external detection of agent WANDERING is unsolved. We tested 6 cheap methods (tool-entropy variants, repeated actions, information gain, reasoning-text signals, fused classifiers, an LLM judge), and none beat chance early in the trajectory. The labeled dataset is public. If you can detect WANDERING by turn 15 at a false-positive rate below 5%, that's a paper. Baselines to beat are in SKILL.md.
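One way to score an entry against the turn-15 target: pick the lowest threshold whose false-positive rate on non-WANDERING trajectories stays at or below 5%, then report the true-positive rate on WANDERING ones. A sketch assuming a detector that emits one score per trajectory using only turns up to 15; the `tpr_at_fpr` helper and the scores in the example are hypothetical toy values. In a real evaluation the threshold should be chosen on held-out data, not on the test set.

```python
import math

def tpr_at_fpr(pos: list[float], neg: list[float], max_fpr: float = 0.05) -> tuple[float, float]:
    """Return (threshold, TPR) for the lowest threshold with FPR <= max_fpr.

    A trajectory is flagged WANDERING when its score is strictly above the threshold.
    `pos` are scores of WANDERING trajectories, `neg` of the rest; both must be non-empty.
    """
    # Only negative scores change the FPR, so they are the only candidates worth scanning;
    # the lowest acceptable one flags the most positives.
    for t in [-math.inf] + sorted(set(neg)):
        fpr = sum(s > t for s in neg) / len(neg)
        if fpr <= max_fpr:
            return t, sum(s > t for s in pos) / len(pos)
    raise AssertionError("unreachable: t = max(neg) always gives FPR 0")

if __name__ == "__main__":
    # Toy turn-15 detector scores, not results from the dataset.
    wandering = [5, 18, 50, 90]
    other = list(range(20))
    print(tpr_at_fpr(wandering, other))  # (18, 0.5)
```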

License

Apache-2.0. Built by OpenInterpretability · @0xCVYH.
