feat(#256): autonomous Tor-vs-clearnet benchmark harness + finalized methodology #293

Merged
VijitSingh97 merged 17 commits into
develop from
feat/256-benchmark-harness
Jun 28, 2026

@VijitSingh97 (Collaborator) commented Jun 17, 2026

Supersedes the #268 draft. This PR is the gate for the final #165/#166 Tor-default decision.

What

  • docs/benchmarks/tor-vs-clearnet.md: finalized methodology. The full fleet (miner-0..7, ~269 kH/s) runs on mini, where we are ~1.8% of the sidechain (~155 of our shares/day); that gives enough statistical power while staying representative, whereas nano would make us ~11% of the sidechain and overstate the penalty, and main is too sparse. The design is an interleaved A/B (T/C/T/C blocks, settling period discarded) to control time-correlated variance, run to a target share count, with the clearnet arm as the noise floor. Also adds the autonomous-on-gouda architecture + runbook.
  • bench-orchestrate.sh: autonomous driver (interleave + egress-gate + collector + per-cycle health + status.json heartbeat + self-heal). Runs detached on gouda; the cron watchdog + @reboot entry restart it, and it resumes from state after a crash or reboot, so the run survives the operator being offline for days.
  • bench-healthcheck.sh: read-only checklist → OK/WARN/BROKEN.
  • bench-status.sh: one-screen summary for ssh gouda bench-status over Tailscale.
  • bench-collect.sh: per-interval /stats snapshotter (brought over from the #268 draft).
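
As a rough illustration of the "checklist → OK/WARN/BROKEN" idea, the sketch below reduces per-check results to a single verdict by taking the worst level. This is a Python stand-in, not the shell script itself; the `Level` enum, the `overall` function, and the "worst level wins" rule are assumptions made for illustration. The check names follow the ones listed in the commit message (stack, p2pool, hashrate, Tor-arm egress).

```python
from enum import IntEnum


class Level(IntEnum):
    """Severity levels, ordered so that max() picks the worst one."""
    OK = 0
    WARN = 1
    BROKEN = 2


def overall(results: dict[str, Level]) -> Level:
    """Return the worst level across all checks (assumed aggregation rule).

    An empty checklist is treated as OK.
    """
    return max(results.values(), default=Level.OK)


# Hypothetical outcome of one health-check cycle.
checks = {
    "stack": Level.OK,
    "p2pool": Level.OK,
    "hashrate": Level.WARN,
    "tor_egress": Level.OK,
}
print(overall(checks).name)
```

Ordering the levels numerically keeps the reduction to one line and makes it hard to accidentally report OK while any single check is BROKEN.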

The live run

Live on gouda since 2026-06-17: --pool mini --block-hours 20 --blocks 6 --settle-hours 3 (~5 days, 3 blocks/arm, expected done ≈2026-06-22). XvB is disabled so the full ~269 kH/s reaches p2pool in both arms; the auto optimizer otherwise routes most of the fleet into XvB (chasing the Whale tier) and confounds reward_share, the primary metric. Only the p2pool transport (and the #270 firewall) flips between arms; monerod + Tari stay on Tor throughout.
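
The run parameters above imply a fixed timetable, which the following sketch works out. It is a minimal Python model, not the orchestrator's code: the `Block` dataclass and `schedule` function are hypothetical names, and it assumes the first block is the Tor arm and that the settle window is discarded at the start of every block, including the first.

```python
from dataclasses import dataclass


@dataclass
class Block:
    index: int           # 1-based block number
    arm: str             # "tor" or "clearnet"
    start_h: float       # block start, hours since run start
    settle_end_h: float  # samples before this point are discarded
    end_h: float         # block end, hours since run start


def schedule(blocks: int, block_hours: float, settle_hours: float) -> list[Block]:
    """Build an interleaved T/C/T/C timetable (assumed to start on Tor)."""
    plan = []
    for i in range(blocks):
        start = i * block_hours
        arm = "tor" if i % 2 == 0 else "clearnet"
        plan.append(Block(i + 1, arm, start, start + settle_hours, start + block_hours))
    return plan


plan = schedule(blocks=6, block_hours=20, settle_hours=3)
total_days = plan[-1].end_h / 24
measured_h_per_arm = sum(b.end_h - b.settle_end_h for b in plan if b.arm == "tor")
print(total_days, measured_h_per_arm)
```

Under these assumptions the six 20-hour blocks span 5 days, and each arm contributes 3 × 17 = 51 measured hours once the 3-hour settle windows are dropped.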

Two benchmark-blocking bugs surfaced + fixed (both validated live on gouda)

  1. set_arm now toggles the #270 egress firewall with the arm: ON for tor (fail-closed, egress-gated), OFF for clearnet (otherwise the firewall DROPs p2pool's clearnet dials → 0 sidechain peers → silent garbage).
  2. Real pithead bug (#294): network.tor_egress_firewall=false never disabled the firewall. jq's // true coerces an explicit false back to true, so the documented #270 opt-out was dead since it shipped. Same latent bug on .xvb.tor. Extracted into a config_bool() helper + regression-tested (tests/stack: 386 passed). Validated: clearnet arm went 0 → 54 peers; after the switch back to tor, the rigorous egress check was all-clean.
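
The false-coercion in item 2 can be modelled in a few lines. The sketch below is a Python analogue, not the shell helper from this PR: `jq_alternative` mimics how jq's `//` operator treats both `false` and `null` as "no value", and `config_bool_py` is a hypothetical stand-in showing the explicit null-check that honours a configured `false`.

```python
def jq_alternative(value, default):
    """Model of jq's `value // default`: false and null both fall through."""
    return default if value is None or value is False else value


def config_bool_py(config: dict, path: str, default: bool) -> bool:
    """Look up a dotted path; fall back to `default` only when the key is
    absent, null, or not a JSON boolean. An explicit False is returned as-is.
    """
    node = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node if isinstance(node, bool) else default


cfg = {"network": {"tor_egress_firewall": False}}

# The buggy pattern: the explicit False is replaced by the default.
print(jq_alternative(cfg["network"]["tor_egress_firewall"], True))

# The null-checked pattern: the explicit False survives.
print(config_bool_py(cfg, "network.tor_egress_firewall", True))

# An absent key still defaults on (fail-closed).
print(config_bool_py({}, "network.tor_egress_firewall", True))
```

The design point is that "absent" and "false" are different states for a fail-closed default, so the lookup has to test for absence explicitly rather than for truthiness.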

Validated end-to-end on gouda

A fast 2-block dry run exercised the whole machine (arm switch via pithead apply, egress-gate=PASS/skip, per-arm collector, bench-status, RUN COMPLETE, clean self-restore). shellcheck clean.

Merge after the run completes (~2026-06-22): the per-arm results + the final #165/#166 Tor-default recommendation land in docs/privacy.md in this PR before merge, so closing #256 delivers the actual decision (not just the harness).

Closes #256
Closes #294

🤖 Generated with Claude Code

VijitSingh97 and others added 12 commits June 17, 2026 11:09
...rd the collector
Finalize the methodology against real numbers: full fleet miner-0..7 (~269 kH/s) on
the `mini` sidechain (~1.8% of it → ~155 of our shares/day — representative + enough
power; `nano` would make us ~11% and overstate the penalty, `main` is too sparse).
Switch the design to an INTERLEAVED A/B (T/C/T/C in ~1.5-2d blocks, discard ~6h post
switch) to control time-correlated variance, run to a target share count, and use the
clearnet arm's own variance as the noise floor.
Add the "autonomous on gouda" section + runbook: the run lives entirely on the box
(detached orchestrator + cron watchdog + @reboot) so it survives the operator being
offline for days; observe over Tailscale via `bench-status`. Scripts (orchestrate /
healthcheck / status) land next, validated on gouda before the real run.
Brings bench-collect.sh onto develop (was only on the #256 draft); supersedes #268.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...+ status)
Three gouda-resident scripts so the Tor-vs-clearnet run survives the operator being
offline for days:
- bench-orchestrate.sh — interleaves the arms on a fixed block schedule (T/C/T/C),
 egress-gates each switch (rigorous multi-poll), keeps the collector running,
 health-checks each cycle, writes a status.json heartbeat, and self-heals (re-up
 stack + re-gate on BROKEN). All state in ~/pithead-bench, so a crash/reboot
 resumes from state. Subcommands: run | --calibrate | --watchdog | --status |
 --stop | --install-cron (the watchdog cron + @reboot restart it if it dies).
- bench-healthcheck.sh — read-only checklist -> OK/WARN/BROKEN (stack, p2pool,
 hashrate, Tor-arm egress + firewall); reused by the loop + the cron.
- bench-status.sh — one-screen summary for `ssh gouda bench-status` over Tailscale.
shellcheck clean. To be validated on gouda (dry-run) before the real fleet run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...ecs via awk)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...-WARN during the ramp
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...status missing-jsonl guard
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...s) to avoid false BROKEN on brief Tari dials
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
... noisy share-based hashrate estimate
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...t read as failing (old launch events linger in the tail)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...ng the fleet to XvB (target Whale), starving p2pool and confounding reward_share
The XVB_DONATION_LEVEL=auto optimizer dynamically splits the fleet to climb
raffle tiers — observed ~96kH/s to XvB vs ~18kH/s to p2pool while chasing the
Whale tier. That starves p2pool (so reward_share, the primary metric, is
measured on a fraction of the fleet) and makes the split a function of Tor
latency — the exact thing we isolate. Disabling XvB sends the full ~269kH/s to
p2pool in both arms. Re-asserted on every apply so a switch/recovery can't let
it drift back on; --stop restores xvb.enabled=true.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...e p2pool gets 0 peers
set_arm flipped p2pool.clearnet but left the Tor-egress firewall (#270) ON, which
DROPs direct clearnet dials from the container subnet. The clearnet arm would have
collected zero-peer/zero-share garbage for 3 of 6 blocks — and since both runs so
far only exercised block-1 TOR, it would have first bitten at the first switch
(~20h in), unattended. Now the firewall tracks the arm: ON for tor (fail-closed,
egress-gated), OFF for clearnet (p2pool peers over clearnet — the baseline). --stop
restores firewall=true so a mid-clearnet-block stop never leaves gouda open.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
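A compact way to read the commit above is as a mapping from arm to the settings that get applied. The sketch below is an illustrative Python model, not `set_arm` itself: the function names are hypothetical, the dotted keys are the ones named in these commit messages, and the assumption that `p2pool.clearnet` is `true` on the clearnet arm is inferred rather than confirmed.

```python
def arm_settings(arm: str) -> dict[str, bool]:
    """Settings applied when switching to an arm (illustrative model)."""
    if arm not in ("tor", "clearnet"):
        raise ValueError(f"unknown arm: {arm!r}")
    tor = arm == "tor"
    return {
        "p2pool.clearnet": not tor,          # assumed polarity of the flag
        "network.tor_egress_firewall": tor,  # firewall tracks the arm
        "xvb.enabled": False,                # re-asserted on every apply
    }


def restore_settings() -> dict[str, bool]:
    """What --stop restores, per the commit messages."""
    return {"network.tor_egress_firewall": True, "xvb.enabled": True}


for arm in ("tor", "clearnet"):
    print(arm, arm_settings(arm))
```

Deriving every arm-dependent setting from the single `arm` value means the transport and the firewall cannot drift out of step, which is the failure this commit fixes.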
...all (jq // false-coercion)
`jq -r '.network.tor_egress_firewall // true'` returns true even when the key is
explicitly false, because jq's // alternative treats false (not just null) as
empty. So the documented #270 opt-out has never worked — the firewall was always
on regardless of config. Same latent bug on `.xvb.tor // true` (xvb.tor=false
silently stayed on Tor). Both now null-check explicitly so a configured false is
honoured; absent still defaults on (fail-closed / Tor-by-default preserved).
Surfaced by the #256 benchmark: the clearnet arm sets tor_egress_firewall=false
so p2pool can peer over clearnet, but the firewall stayed up → 0 sidechain peers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...jq false-coercion bug
Replaces the two inline null-checks (TOR_EGRESS_FIREWALL, xvb.tor) with a shared
config_bool() helper that honours an explicit false, and adds tests/stack unit
coverage (explicit false/true, absent→default). The prior firewall test only
exercised apply_tor_egress_firewall reading .env — it bypassed the config.json→.env
jq that actually had the bug. Behavior-identical to d4b5df3 (which is what gouda's
live benchmark runs); this is the cleaner, tested form for develop.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
VijitSingh97 and others added 5 commits June 23, 2026 15:44
... yield (latency, not rejects)
The 5-day interleaved mini run completed (6 blocks, 3/arm). Result: Tor is -11.4%
reward-share / -11.5% yield-efficiency vs clearnet, just above the ±8-10% block-to-
block noise floor — so 'roughly 10%', directionally robust, not a tight estimate.
Mechanism is pure propagation latency: 0 rejects in both arms, equal Tor-daemon CPU
(~19%), equal peer counts; effort higher on Tor (113% vs 90%) = uncles/late shares.
Recommendation: keep Tor the default (#165/#166) — ~10% is a modest, opt-out-able
price for not leaking the home IP. Adds:
- bench-analyze.py: committable analyzer (block = stat unit, settle excluded)
- the raw run data under docs/benchmarks/data/ (reproduces the table exactly)
- Results section in the methodology doc + the measured cost in privacy.md (#165/#166)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
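To make "block = stat unit" concrete, the sketch below compares per-arm means of block-level values and sets the gap against the clearnet arm's block-to-block spread. It is not bench-analyze.py: the function names are hypothetical, the noise floor is assumed here to be the clearnet coefficient of variation (the actual analyzer may define it differently), and the input numbers are made-up placeholders, not the run's data.

```python
from statistics import mean, stdev


def relative_diff(tor: list[float], clearnet: list[float]) -> float:
    """Tor mean relative to the clearnet mean (negative = Tor is worse)."""
    return (mean(tor) - mean(clearnet)) / mean(clearnet)


def noise_floor(clearnet: list[float]) -> float:
    """Block-to-block spread of the clearnet arm, as stdev / mean (assumed)."""
    return stdev(clearnet) / mean(clearnet)


# Placeholder per-block metric values, one entry per block (NOT real results).
tor_blocks = [90, 80, 100]
clearnet_blocks = [100, 90, 110]

diff = relative_diff(tor_blocks, clearnet_blocks)
floor = noise_floor(clearnet_blocks)
print(f"diff={diff:+.1%} noise_floor=±{floor:.1%}")
```

With only three blocks per arm, a gap that is close to the noise floor supports a direction and a rough magnitude, not a tight estimate, which matches how the commit message frames the result.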
...ult story across docs
- tor-vs-clearnet.md: flip the banner from 'results pending' to complete; reconcile
 the Method/runbook params with what was actually run (20h blocks / 3h settle / 6
 blocks / ~5 days, not the planned 1.5-2d / 6h / 10-12d), and drop the non-existent
 --target-shares flag. Frame methodology + decision rule as pre-registration.
- faq.md + architecture.md: #165/#166 were still described as 'planned for v1.1' /
 'slated to move' and listed under 'what still touches clearnet' — now Tor-by-default
 as of v1.1 with the measured ~10% opt-out cost, consistent with privacy.md/configuration.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
...ark files
After merging develop (which added the Wave 7 lint tooling), CI applied ruff/shfmt/
markdownlint to the new benchmark files:
- bench-analyze.py: drop unused median import, lambda->def, and use calendar.timegm
 instead of datetime.UTC so the documented 'python3 bench-analyze.py' reproduce
 command stays portable to pre-3.11 Python; ruff-format applied
- bench-{collect,healthcheck,orchestrate,status}.sh + tests/stack/run.sh: shfmt -i 4
- tor-vs-clearnet.md: blank line around the Findings list (MD032)
No behavior change. lint-sh/lint-py/lint-md + the pithead suite all pass; the analyzer
reproduces the committed run's table byte-for-byte.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
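The portability point about `calendar.timegm` can be shown in isolation. `datetime.UTC` only exists from Python 3.11, whereas `calendar.timegm` plus `time.strptime` is available on much older interpreters. The sketch below is a minimal example, not the analyzer's code; the `utc_epoch` name and the timestamp format are assumptions.

```python
import calendar
import time


def utc_epoch(stamp: str) -> int:
    """Parse 'YYYY-MM-DDTHH:MM:SSZ' as UTC and return epoch seconds.

    calendar.timegm interprets the struct_time as UTC, unlike time.mktime,
    which would apply the local timezone.
    """
    return calendar.timegm(time.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"))


print(utc_epoch("1970-01-02T00:00:00Z"))
```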
...test
develop's CI gained a 'make test-inventory-check' gate; the new tests/stack config_bool
case (#294 jq false-coercion guard) needs to be enumerated. No content change beyond
the regenerated inventory.
@VijitSingh97 VijitSingh97 merged commit 59d089f into develop Jun 28, 2026
16 checks passed
@VijitSingh97 VijitSingh97 deleted the feat/256-benchmark-harness branch June 28, 2026 10:49