oc-repl

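An interactive REPL that feels close to Codex / Claude Code, for running the terminal-agent models common in academia. It is built for hands-on trials and demo recording; it does not chase the polish of those two products, and covering the commonly used protocols is enough.

Four on-the-wire protocols are supported: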

`--protocol`	Source	What the model emits per turn
`camel-terminal-toolkit` (default)	camel-ai `ChatAgent` + `TerminalToolkit`, the actual training distribution of the `HansBug/OpenClaw-RL` checkpoints	OpenAI `tool_calls` to camel's four functions (`shell_exec(id, command, block, timeout)` / `shell_view(id)` / `shell_write_to_process(id, command)` / `shell_write_content_to_file(content, file_path)`); the system prompt is a verbatim copy of the training-time `get_developer_agent_prompt(...)` output, with `/no_think` appended
`openai-tools`	Hand-written simplified variant (one `shell(command)` plus one `write_file(path, content)`)	Structured OpenAI `tool_calls`; convenient for ad-hoc demos but does not match the training distribution
`terminus-json`	`terminal_bench/agents/prompt-templates/terminus-json-plain.txt`, used when evaluating with `tb run --agent terminus-2`	One JSON object per turn: `{analysis, plan, commands[].keystrokes, task_complete}`
`terminus-xml`	`terminal_bench/agents/prompt-templates/terminus-xml-plain.txt`, the XML variant of the above	`<response><analysis/><plan/><commands><keystrokes>...</keystrokes></commands><task_complete>...</task_complete></response>`

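Why this tool exists: neither camel-ai nor terminal-bench ships an official interactive REPL. camel offers a Python API plus a Gradio web UI, and terminal-bench offers tb run, a one-shot CLI. We wanted a Codex-style chat interface for hands-on trials and demo recording with qwen3-8b-rl-iter215, a checkpoint trained with HansBug/OpenClaw-RL, so we wrote oc-repl. Any model that speaks one of the four protocols above can be used.

Which protocol is this checkpoint's "native language"?

For every checkpoint trained with HansBug/OpenClaw-RL (qwen3-8b-rl-*), the answer is always camel-terminal-toolkit: it is the on-the-wire surface the RL rollouts actually used. Evidence: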

The launch command of run msp60ius (the run that produced iter215) has no --terminal-agent-type, so getattr(args, "terminal_agent_type", "camel_agent") at generate.py:142 falls back to "camel_agent".
rollout_agent.py:97-106 accepts only that one value and raises ValueError for anything else; the repository has no terminus path at all.
agent/camel_agent.py subclasses camel.agents.ChatAgent; remote/terminal_env.py:163-185 wraps the four camel.toolkits.TerminalToolkit functions as tool_schemas and feeds them to the rollout.
On the sglang side, tool_call_parser: qwen25 (see configs/rollout_qwen3.yaml) converts the model's <tool_call>{...}</tool_call> markup back into structured tool_calls.
In the training rollout log, <tool_call> appears 628 times, while terminus-2's JSON key "task_complete" appears 0 times.
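
To make the markup-to-`tool_calls` step concrete, here is a minimal sketch of what such a conversion could look like. It is not sglang's parser: it assumes each `<tool_call>` body is a single JSON object with `name` and `arguments` keys, and the helper name `extract_tool_calls` is hypothetical.

```python
import json
import re

# Matches one <tool_call>...</tool_call> span; DOTALL lets the JSON body span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of {"name", "arguments"} dicts found in raw model text.

    Assumption: each body is a JSON object with "name" and "arguments" keys.
    """
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


raw = (
    "ok\n<tool_call>\n"
    '{"name": "shell_exec", "arguments": {"id": "s1", "command": "ls /app"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(raw)
```

Text outside the tags is ignored, so any prose the model writes before a call does not interfere with extraction.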

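Terminus-2 is used only at evaluation time (tb run --agent terminus-2) and never enters the RL rollout. So --protocol terminus-json is "another protocol this model can be coaxed into speaking", not "the protocol it was trained on".

Key features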

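One-line launch: oc-repl --sandbox docker:openclaw-fixperm drops you into the REPL.
Four protocols, one UI: --protocol {camel-terminal-toolkit, openai-tools, terminus-json, terminus-xml}; rendering, sandbox, and engine are fully shared.
Thinking is collapsed by default into a spinner such as thinking... 3.2s · 184 chars. Toggle with --show-thinking or /think.
Tool calls render as panels, similar to Codex: the command in the title bar, indented stdout, and exit 0 on a green background or exit N on a red one. Stdout/stderr is automatically clipped to ≤10 lines × ≤180 characters, followed by a ... (N more lines clipped) hint, so noisy commands do not flood the screen.
oc-repl exec "<task>" is the non-interactive mode, the counterpart of codex exec. --json emits a machine-readable result.
Optional post-agent scoring hook: --verify "<bash cmd>" or --verify-file path/to/check.sh. After the agent reports done, the scoring script runs and the ✓/✗ result is shown in a green/red panel. It is designed for the case where the agent reporting task_complete does not mean the task was actually done correctly.
Minimal dependencies: rich plus the standard library. Chat completions go over raw SSE, with no openai SDK and no async runtime.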
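
The output clipping described in the feature list can be sketched as follows. This is an illustration under assumptions, not the code in ui.py: the function name `clip_output` and its parameters are hypothetical, and only the 10-line / 180-character limits and the hint format come from the text above.

```python
def clip_output(text, max_lines=10, max_cols=180):
    """Clip command output to at most max_lines lines of at most max_cols chars."""
    lines = text.splitlines()
    # Truncate over-long lines, and keep only the first max_lines of them.
    kept = [ln[:max_cols] for ln in lines[:max_lines]]
    hidden = len(lines) - len(kept)
    if hidden > 0:
        kept.append(f"... ({hidden} more lines clipped)")
    return "\n".join(kept)


noisy = "\n".join(str(i) for i in range(25))
clipped = clip_output(noisy)
```

Clipping by line count first and by width second keeps each panel's height bounded regardless of how wide individual lines are.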

Installation

git clone https://github.com/HansBug/oc-repl
cd oc-repl
pip install -e .

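Requires Python ≥ 3.9, plus either a reachable docker daemon (sandbox mode, recommended) or acceptance that --sandbox local runs commands in the host shell (dangerous, development only).

Quick start (against `qwen3-8b-rl-iter215`)

Assuming sglang is already up as described in HansBug/OpenClaw-RL issue #13, along with a long-lived container for the task:

# 1. (One-time) start the sandbox container, using the same image as terminal-bench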
docker run -d --name openclaw-fixperm -w /app tb__fix-permissions__client sleep infinity
# 2. Interactive REPL
oc-repl \
 --api-base http://127.0.0.1:30000/v1 \
 --model qwen3-8b-rl-iter215 \
 --sandbox docker:openclaw-fixperm
# The default --protocol camel-terminal-toolkit already matches the training distribution; no need to set it explicitly
# 3. Or a one-shot exec
oc-repl exec \
 --sandbox docker:openclaw-fixperm \
 "Fix /app/process_data.sh so it can run, then run it once."

The interactive screen looks roughly like this:

╭─ oc-repl ──────────────────────────────────────────────────────╮
│ model qwen3-8b-rl-iter215 │
│ endpoint http://127.0.0.1:30000/v1 │
│ sandbox docker:openclaw-fixperm │
│ protocol camel-terminal-toolkit │
│ thinking hidden (use /think to toggle) │
│ │
│ Type a task. /help for commands. Ctrl+D to exit. │
╰────────────────────────────────────────────────────────────────╯
 › Fix /app/process_data.sh so it can run, then run it.
running task ...
╭─ ▸ shell ────────────────────╮
│ chmod +x /app/process_data.sh │
│ exit 0 │
╰───────────────────────────────╯
╭─ ▸ shell ───────────────────╮
│ /app/process_data.sh │
│ Data processed successfully! │
│ exit 0 │
╰──────────────────────────────╯
 ✓ rounds=2 commands=2 task_complete=True

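Command reference

Slash commands (interactive mode)

Command	Effect
`/help`	List commands
`/think`	Toggle whether `<think>` is shown
`/reset`	Clear the conversation history (sandbox container state is unchanged)
`/quit` (or `Ctrl-D`)	Exit

Command-line flags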

--api-base URL OpenAI-compatible endpoint (default http://127.0.0.1:30000/v1)
--api-key KEY Default sk-dummy (sglang does not validate it)
--model NAME Served model name; default qwen3-8b-rl-iter215
--protocol P camel-terminal-toolkit (default; matches the OpenClaw-RL training distribution)
 | openai-tools | terminus-json | terminus-xml | auto
--sandbox SPEC local (default; dangerous, runs in the host shell)
 or docker:CONTAINER (exec into an already-running container)
--show-thinking Stream <think> output instead of collapsing it into a spinner
--temperature 0.2 Default 0.2
--max-tokens 4096 Default 4096
--cmd-timeout 30 Per-command sandbox timeout (seconds)
--verify "CMD" After the agent finishes, run a bash scoring command in the sandbox. Exit 0 → ✓ verified; non-zero → ✗ failed
--verify-file PATH Same, but the script is copied from a local file into the sandbox and then run

Omitted flags are read from environment variables (OPENAI_API_BASE, OPENAI_API_KEY, OC_REPL_MODEL, OC_REPL_PROTOCOL, OC_REPL_SANDBOX).
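
One common way to get this kind of fallback with argparse is to use the environment value as each flag's default. The sketch below assumes a precedence of explicit flag, then environment variable, then built-in default; that ordering and the helper name `build_parser` are assumptions, not taken from cli.py.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from env vars, then built-in values."""
    env = os.environ if env is None else env
    p = argparse.ArgumentParser(prog="oc-repl")
    p.add_argument("--api-base",
                   default=env.get("OPENAI_API_BASE", "http://127.0.0.1:30000/v1"))
    p.add_argument("--api-key", default=env.get("OPENAI_API_KEY", "sk-dummy"))
    p.add_argument("--model", default=env.get("OC_REPL_MODEL", "qwen3-8b-rl-iter215"))
    p.add_argument("--protocol",
                   default=env.get("OC_REPL_PROTOCOL", "camel-terminal-toolkit"))
    p.add_argument("--sandbox", default=env.get("OC_REPL_SANDBOX", "local"))
    return p


# An explicit flag wins over the environment; the environment wins over the default.
fake_env = {"OC_REPL_MODEL": "my-model"}
from_env = build_parser(fake_env).parse_args([])
from_flag = build_parser(fake_env).parse_args(["--model", "other-model"])
```

Passing the environment in as a plain dict keeps the function easy to exercise without touching the real process environment.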

Non-interactive mode (`exec`)

oc-repl exec [shared flags] [--json] "<task>"

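Runs a single task, prints a Codex-style trace, and exits. Exit codes:

Code	Meaning
`0`	`--verify[-file]` passed; or there is no verify hook and the agent reported `task_complete=true`
`2`	No verify hook, and the agent did not report `task_complete`
`3`	The verify hook ran but failed (the agent claimed completion, but the objective check disagreed)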
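
The table above can be read as a small decision function. The sketch below is one interpretation, with a hypothetical name `exec_exit_code`; it assumes `verify_passed` is `None` whenever the hook did not run, and it treats that case exactly like "no verify hook", which the table does not spell out.

```python
from typing import Optional


def exec_exit_code(task_complete: bool, verify_passed: Optional[bool]) -> int:
    """Map an exec run's outcome to a process exit code.

    verify_passed is None when no verify hook ran (an assumption of this sketch).
    """
    if verify_passed is not None:
        # The objective check overrides whatever the agent claimed.
        return 0 if verify_passed else 3
    return 0 if task_complete else 2
```

In a shell script this lets callers tell "the agent gave up" (2) apart from "the agent claimed success but the check failed" (3).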

--json replaces the trailing summary with a JSON dump:

{
 "instruction": "...",
 "rounds": 2,
 "commands": 4,
 "task_complete": true,
 "last_summary": "...",
 "verify": {
 "passed": true,
 "returncode": 0,
 "output_tail": "OK: /app/recovered/credentials.txt\nPASS — all checks ok"
 }
}

Verify hook example

The recover-obfuscated-files task requires restoring two files to /app/recovered/credentials.txt and /app/recovered/project_alpha.log, with the correct contents. A simple verify script:

#!/usr/bin/env bash
set -e
errors=0
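# check FILE NEEDLE: print OK if FILE exists and contains NEEDLE, else print BAD and count an error.
check() { if [ -f "$1" ] && grep -qF "$2" "$1"; then echo "OK: $1"; else echo "BAD: $1"; errors=$((errors+1)); fi; }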
check /app/recovered/credentials.txt 'P0$$wOrd123!'
check /app/recovered/project_alpha.log 'Log entry 1: System initialized.'
[ "$errors" -eq 0 ] && { echo PASS; exit 0; } || { echo "FAIL ($errors)"; exit 1; }

oc-repl exec --sandbox docker:openclaw-recover \
 --verify-file check_recover.sh \
 "Decode each *.b64_content in /app/sensitive_data/ — basename and content are both base64. Restore them into /app/recovered/."

The tail of the output is either:

╭─ ✓ verified ─────────────────────────────╮
│ bash check_recover.sh │
│ OK: /app/recovered/credentials.txt │
│ OK: /app/recovered/project_alpha.log │
│ PASS │
╰────────────────────────────────────────────╯
✓ rounds=2 commands=2 task_complete=True verify=✓

or (the agent cut corners midway):

╭─ ✗ verification failed (exit 1) ─────────╮
│ bash check_recover.sh │
│ BAD: /app/recovered/credentials.txt │
│ FAIL (1) │
╰────────────────────────────────────────────╯
✓ rounds=1 commands=2 task_complete=True verify=✗ (exit 1)

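The second case is the whole reason this hook exists: task_complete=true is only the agent's own claim, while verify tells you whether the world actually ended up the way the user wanted.

Protocol details

`camel-terminal-toolkit` (default)

Matches the on-the-wire protocol used when HansBug/OpenClaw-RL was trained: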

The schemas of the four tools were extracted from FunctionTool.get_openai_tool_schema() in camel-ai==0.2.90 and hardcoded, including JSON-schema details the model saw during training, such as the ["boolean", "null"] union type and "strict": true.
The system prompt is the full output of terminal-rl/agent/camel_agent.py::get_developer_agent_prompt(...) under system='Linux (in Docker)', machine='x86_64', is_workforce=False, non_think_mode=True, ending with /no_think.
/no_think is essential: non_think_mode=True was the training default (rollout_agent.py:62), so the model never generated a <think> block during RL. With thinking left on, tool-call adherence collapses (observed first-hand: on the recover task the model wrote 1500+ tokens of prose and then emitted 0 tool_calls).
Sandbox mapping: shell_exec is fully faithful; shell_write_content_to_file is 100% faithful via a here-doc; shell_view and shell_write_to_process degrade to stub output, because the oc-repl sandbox is a stateless docker exec (no per-id tmux session is maintained).
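
As an illustration of the here-doc mapping, the following sketch builds a shell command that writes arbitrary content to a file. The helper name `heredoc_write_command` and the delimiter `OC_REPL_EOF` are hypothetical, and this is not necessarily how sandbox.py does it. The quoted delimiter (`<<'TAG'`) stops the shell from expanding `$` or backticks inside the content.

```python
import shlex


def heredoc_write_command(file_path, content, tag="OC_REPL_EOF"):
    """Build a shell command that writes content to file_path via a quoted here-doc."""
    # The delimiter must not appear as a whole line inside the content.
    lines = content.split("\n")
    while tag in lines:
        tag += "_X"
    # A here-doc body must end with a newline before the closing delimiter.
    body = content if (not content or content.endswith("\n")) else content + "\n"
    return f"cat > {shlex.quote(file_path)} <<'{tag}'\n{body}{tag}\n"


cmd = heredoc_write_command("/app/a b.txt", "x=$HOME\n")
```

One caveat of this particular sketch: content that does not end in a newline gets one appended, so it is not byte-exact in that case.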

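`openai-tools` (simplified)

Exposes only two tools: shell(command) and write_file(path, content). Simpler than camel-terminal-toolkit, but inconsistent with the training distribution. Use it for ad-hoc demos that want a lighter protocol; for a serious adherence comparison on the qwen3-8b-rl-iter215 checkpoint, use camel-terminal-toolkit.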

`terminus-json` / `terminus-xml`

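The protocol used by tb run --agent terminus-2: the model emits one {analysis, plan, commands[], task_complete} JSON object per turn (or its XML variant). These two oc-repl protocols parse the model output and run the commands from commands[].keystrokes in the sandbox.

commands[].keystrokes is sent to the shell verbatim, but tmux special key sequences such as C-c and C-d are not actually forwarded as signals in oc-repl's docker-exec sandbox, because no live tmux session is maintained. If you need that fidelity, go back to tb run --agent terminus-2.

qwen3-8b-rl-iter215 was not RL-trained on the terminus-* protocols, so adherence under these two protocols reflects only the base model's ability to imitate JSON / XML and has nothing to do with the training distribution.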
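
A minimal sketch of parsing one terminus-json turn is shown below. The field names come from the protocol description above; the `TerminusTurn` dataclass, the function name, and the tolerance for prose around the JSON object are assumptions of this sketch rather than the behavior of terminus_json.py.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class TerminusTurn:
    analysis: str = ""
    plan: str = ""
    keystrokes: List[str] = field(default_factory=list)
    task_complete: bool = False


def parse_terminus_json(text: str) -> TerminusTurn:
    """Parse one model turn; tolerates prose before or after the JSON object."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model output")
    obj = json.loads(text[start:end + 1])
    cmds = obj.get("commands") or []
    return TerminusTurn(
        analysis=str(obj.get("analysis", "")),
        plan=str(obj.get("plan", "")),
        keystrokes=[c["keystrokes"] for c in cmds
                    if isinstance(c, dict) and "keystrokes" in c],
        # Only a literal JSON true counts as completion.
        task_complete=obj.get("task_complete") is True,
    )


reply = ('Sure.\n{"analysis": "a", "plan": "p", '
         '"commands": [{"keystrokes": "ls\\n"}], "task_complete": false}')
turn = parse_terminus_json(reply)
```

Treating anything other than a literal `true` as "not complete" errs on the side of continuing the loop when the model's output is ambiguous.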

Sandbox

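local Runs in the host shell. This is the default, and it is dangerous.
 The banner highlights the sandbox line in yellow as a reminder.
docker:NAME Execs into an **already-running** container. Recommended for demos.
 The user has to `docker run -d` the container; oc-repl only attaches and never spawns.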

docker: mode automatically detects whether the current shell is in the docker group; if not, it wraps every command as sg docker -c '...'.
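
The group check and the `sg` wrapping could look roughly like this. The function names, the use of `bash -c` inside the container, and the `-i` flag are assumptions made for the sketch; only the `sg docker -c '...'` shape comes from the text above.

```python
import grp
import os
import shlex


def in_docker_group():
    """Return True if the current process already has the docker group (Unix only)."""
    try:
        gid = grp.getgrnam("docker").gr_gid
    except KeyError:
        return False  # no docker group exists on this machine
    return gid in os.getgroups() or gid == os.getegid()


def docker_exec_argv(container, command, has_group):
    """Build the argv that runs command inside an already-running container."""
    argv = ["docker", "exec", "-i", container, "bash", "-c", command]
    if has_group:
        return argv
    # sg runs one command string under the given group, so quote argv into a string.
    return ["sg", "docker", "-c", shlex.join(argv)]


direct = docker_exec_argv("openclaw-fixperm", "echo hi", True)
wrapped = docker_exec_argv("openclaw-fixperm", "echo hi", False)
```

Building an argv list and quoting it once with `shlex.join` avoids hand-written nested quoting when the inner command itself contains spaces or quotes.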

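Why must the container be started beforehand?

It lets a multi-turn REPL accumulate sandbox state: the directory the agent cd'ed into, the background processes it started, and the temporary files it wrote in one turn are all still there in the next.
It skips the 1-3 seconds of container startup overhead per command, which gives recordings a smoother pace.
It lets users bring their own image: tb__hello-world__client, ubuntu:22.04, or the project's own dev env all work; oc-repl makes no assumptions about image contents.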

Repository layout

src/oc_repl/
├── cli.py argparse entry point; exposes the `oc-repl` command
├── client.py Streaming OpenAI-compatible chat client built on raw SSE (no SDK dependency)
├── engine.py Turn-execution loop shared by REPL and exec modes
├── oneshot.py `oc-repl exec` implementation
├── repl.py Interactive mode (banner + `›` prompt + slash commands)
├── sandbox.py The docker and local sandbox backends
├── ui.py rich-based banner / thinking folder / tool block / prompt
└── protocols/
 ├── base.py Protocol + ParsedTurn + Command dataclass
 ├── camel_terminal_toolkit.py Byte-level replica of the camel-ai training distribution (default)
 ├── openai_tools.py Simplified two-tool variant
 ├── terminus_json.py terminus-2 JSON
 └── terminus_xml.py terminus-2 XML
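
For readers unfamiliar with raw SSE, the core of a streaming chat client is a loop over `data:` lines. The sketch below assumes the usual OpenAI-compatible chunk layout (`choices[].delta.content`, terminated by `data: [DONE]`); the function name `iter_sse_deltas` is hypothetical and this is not the code in client.py.

```python
import json


def iter_sse_deltas(lines):
    """Yield text deltas from an iterable of already-decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                yield piece


stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"delta": {"content": "ignored"}}]}',
]
text = "".join(iter_sse_deltas(stream))
```

Because the function only needs an iterable of strings, it can be fed from any line-by-line HTTP response reader in the standard library after decoding each line.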

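Current limitations

No history search, autocompletion, or fuzzy matching of / commands. This is a tool for manual probing and demo recording, not a Codex / Claude Code replacement.
No real tmux backend: terminus keystrokes lose their interactive TUI semantics in oc-repl (vim / less are unusable).
No advanced orchestration across multiple tools beyond what each protocol itself defines.
In the terminus protocols, commands[].keystrokes is executed as bash commands in oc-repl and does not go through a tmux pipe.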

License

MIT; see LICENSE for details.
