[draft] benchmark: Tor vs clearnet while mining — methodology (#256)#268
Closed
VijitSingh97 wants to merge 3 commits into
...sion rule (#256)

Documents WHAT we measure and FROM WHERE before building the harness, so the run is reproducible and the conclusion is honest.

Key points, grounded in what p2pool actually exposes on gouda (/stats/* + /api/state + docker stats):

- Primary metric is PPLNS reward share (block_reward_share_percent): the direct revenue number that already captures any orphan/uncle effect, so we do NOT need a raw orphan counter to answer "does Tor cost yield". Yield efficiency (reward-share ÷ hashrate-share) + average_effort back it up.
- Raw uncle/orphan count is a secondary cross-check via p2pool console-log parse (not in /api/state), honestly flagged as noisier, not the decision metric.
- Secondary latency/connectivity metrics (peers, share cadence, rejects, Tor overhead) are all available today.
- Method: sequential A/B on gouda + miner-0 at fixed hashrate, several days per arm, control variable = the #165 p2pool.clearnet knob (dogfoods the toggle).
- Decision rule: clearnet arm sets the noise floor; keep Tor default if the delta is within it, else reconsider per-sidechain / clearnet-default-with-opt-in.

Harness + results land in follow-up commits on this branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Read-only collector for the Tor-vs-clearnet benchmark: snapshots p2pool's /stats/* (reward_share, effort, hashrate, shares, peers, sidechain diff) plus `docker stats tor` (CPU/mem) into one JSONL line per interval, per arm. No new container: it reads what p2pool already writes via --local-api/--data-api.

Validated against the live p2pool on gouda (--once produced a clean snapshot: reward_share, peers_out/list, pool_hr, tor_cpu_pct/mem). shellcheck clean.

Doc updated with the run recipe (nohup per arm) + a jq summary one-liner.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
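As a rough illustration of how one-JSONL-line-per-interval snapshots could be summarized per arm, here is a minimal Python sketch. It is not the jq one-liner from the doc; the `arm` field, the flat `reward_share` key, and the sample values are assumptions made for illustration, and the real collector's schema may differ.

```python
import json
from collections import defaultdict
from statistics import mean


def summarize(lines):
    """Return the mean reward_share per arm from JSONL snapshot lines."""
    by_arm = defaultdict(list)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the log
        snap = json.loads(raw)
        value = snap.get("reward_share")
        if value is None:
            continue  # skip snapshots taken before a value was available
        by_arm[snap["arm"]].append(float(value))
    return {arm: mean(values) for arm, values in by_arm.items()}


# Made-up sample snapshots (hypothetical schema, not real benchmark data).
sample = [
    '{"arm": "tor", "reward_share": 1.0}',
    '{"arm": "tor", "reward_share": 2.0}',
    '{"arm": "clearnet", "reward_share": 3.0}',
    '{"arm": "clearnet", "reward_share": null}',
]

print(summarize(sample))
```

Skipping null values rather than treating them as zero keeps a missing reading from dragging an arm's average down.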
...net leak check (#256)

Reads /proc/net/tcp from each app container and flags any ESTABLISHED connection to a PUBLIC IP (one that bypasses the Tor SOCKS at <bridge>.25:9050). For the "tor" arm every app container must show 0 public connections; only the tor container should reach the internet. For "clearnet" the mining-path containers should show direct public connections.

Immediately caught a real leak on a live deploy: p2pool dialing sidechain peers on :37889 directly + Tari direct P2P. Config-level checks had passed, but the connection-level check found the traffic was NOT actually behind Tor. This is the live egress assertion the gouda README flagged as a gap (#160/#2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
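The connection-level check described above can be sketched in Python as follows. This is an illustration of the idea, not the repo's shell script: it assumes a little-endian host (so the hex addresses in `/proc/net/tcp` are byte-reversed), covers IPv4 only, and uses a made-up `172.18.0.x` bridge subnet and sample rows.

```python
import ipaddress

TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp


def decode_endpoint(hex_endpoint):
    """Decode 'AABBCCDD:PPPP' as printed in /proc/net/tcp (little-endian host)."""
    hex_ip, hex_port = hex_endpoint.split(":")
    ip = ipaddress.IPv4Address(bytes.fromhex(hex_ip)[::-1])
    return ip, int(hex_port, 16)


def public_established(proc_net_tcp_text):
    """List (ip, port) for ESTABLISHED connections whose remote IP is public."""
    leaks = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_ESTABLISHED:
            continue
        ip, port = decode_endpoint(fields[2])  # fields[2] is rem_address
        if ip.is_global:
            leaks.append((str(ip), port))
    return leaks


# Hypothetical sample: one connection to a SOCKS proxy on a private bridge IP,
# one direct dial to a public peer, and one listening socket.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 020012AC:A1B2 190012AC:235A 01 00000000:00000000 00:00000000 00000000  1000 0 11111
   1: 020012AC:C3D4 04030201:9401 01 00000000:00000000 00:00000000 00000000  1000 0 22222
   2: 020012AC:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000 0 33333
"""

print(public_established(SAMPLE))
```

For a "tor" arm, the expectation stated above would translate to this function returning an empty list for every app container.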
This was referenced Jun 17, 2026
VijitSingh97 (Collaborator, Author) commented Jun 18, 2026
Superseded by #293 (feat/256-benchmark-harness), which carries the finalized methodology doc plus bench-collect.sh/bench-verify-egress.sh (brought over) and the full autonomous harness. All three files in this draft are in #293 in finalized form, and this branch is now conflicting. Closing as superseded — no unique content to salvage.
VijitSingh97 added a commit that referenced this pull request Jun 28, 2026
...methodology (#293)

* docs(#256): finalize the Tor-vs-clearnet benchmark plan + bring forward the collector

  Finalize the methodology against real numbers: full fleet miner-0..7 (~269 kH/s) on the `mini` sidechain (~1.8% of it → ~155 of our shares/day, representative + enough power; `nano` would make us ~11% and overstate the penalty, `main` is too sparse). Switch the design to an INTERLEAVED A/B (T/C/T/C in ~1.5-2d blocks, discard ~6h post switch) to control time-correlated variance, run to a target share count, and use the clearnet arm's own variance as the noise floor.

  Add the "autonomous on gouda" section + runbook: the run lives entirely on the box (detached orchestrator + cron watchdog + @reboot) so it survives the operator being offline for days; observe over Tailscale via `bench-status`. Scripts (orchestrate / healthcheck / status) land next, validated on gouda before the real run.

  Brings bench-collect.sh onto develop (was only on the #256 draft); supersedes #268.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(#256): autonomous benchmark harness (orchestrator + healthcheck + status)

  Three gouda-resident scripts so the Tor-vs-clearnet run survives the operator being offline for days:

  - bench-orchestrate.sh: interleaves the arms on a fixed block schedule (T/C/T/C), egress-gates each switch (rigorous multi-poll), keeps the collector running, health-checks each cycle, writes a status.json heartbeat, and self-heals (re-up stack + re-gate on BROKEN). All state in ~/pithead-bench, so a crash/reboot resumes from state. Subcommands: run | --calibrate | --watchdog | --status | --stop | --install-cron (the watchdog cron + @reboot restart it if it dies).
  - bench-healthcheck.sh: read-only checklist -> OK/WARN/BROKEN (stack, p2pool, hashrate, Tor-arm egress + firewall); reused by the loop + the cron.
  - bench-status.sh: one-screen summary for `ssh gouda bench-status` over Tailscale.

  shellcheck clean.
  To be validated on gouda (dry-run) before the real fleet run.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(#256): float-safe block/settle hours in the orchestrator (hours→secs via awk)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(#256): healthcheck uses hashrate_15m (not 1h) so it doesn't false-WARN during the ramp

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(#256): gate self-heals grandfathered Tari leak (restart) + bench-status missing-jsonl guard

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(#256): per-cycle healthcheck egress matches gate rigor (3-poll/30s) to avoid false BROKEN on brief Tari dials

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(#256): healthcheck keys on proxy_workers (reliable), not p2pool's noisy share-based hashrate estimate

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(#256): bench-status leads with 'stable for Xh' so quiet runs don't read as failing (old launch events linger in the tail)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(#256): disable XvB for the benchmark - 'auto' optimizer was routing the fleet to XvB (target Whale), starving p2pool and confounding reward_share

  The XVB_DONATION_LEVEL=auto optimizer dynamically splits the fleet to climb raffle tiers (observed ~96kH/s to XvB vs ~18kH/s to p2pool while chasing the Whale tier). That starves p2pool (so reward_share, the primary metric, is measured on a fraction of the fleet) and makes the split a function of Tor latency, the exact thing we isolate. Disabling XvB sends the full ~269kH/s to p2pool in both arms. Re-asserted on every apply so a switch/recovery can't let it drift back on; --stop restores xvb.enabled=true.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(#256): clearnet arm must toggle the #270 egress firewall OFF, else p2pool gets 0 peers

  set_arm flipped p2pool.clearnet but left the Tor-egress firewall (#270) ON, which DROPs direct clearnet dials from the container subnet. The clearnet arm would have collected zero-peer/zero-share garbage for 3 of 6 blocks, and since both runs so far only exercised block-1 TOR, it would have first bitten at the first switch (~20h in), unattended.

  Now the firewall tracks the arm: ON for tor (fail-closed, egress-gated), OFF for clearnet (p2pool peers over clearnet, the baseline). --stop restores firewall=true so a mid-clearnet-block stop never leaves gouda open.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(#270): network.tor_egress_firewall=false never disabled the firewall (jq // false-coercion)

  `jq -r '.network.tor_egress_firewall // true'` returns true even when the key is explicitly false, because jq's // alternative treats false (not just null) as empty. So the documented #270 opt-out has never worked: the firewall was always on regardless of config. Same latent bug on `.xvb.tor // true` (xvb.tor=false silently stayed on Tor). Both now null-check explicitly so a configured false is honoured; absent still defaults on (fail-closed / Tor-by-default preserved).

  Surfaced by the #256 benchmark: the clearnet arm sets tor_egress_firewall=false so p2pool can peer over clearnet, but the firewall stayed up → 0 sidechain peers.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(#294): extract config_bool helper + regression test for the jq false-coercion bug

  Replaces the two inline null-checks (TOR_EGRESS_FIREWALL, xvb.tor) with a shared config_bool() helper that honours an explicit false, and adds tests/stack unit coverage (explicit false/true, absent→default).
  The prior firewall test only exercised apply_tor_egress_firewall reading .env; it bypassed the config.json→.env jq that actually had the bug. Behavior-identical to d4b5df3 (which is what gouda's live benchmark runs); this is the cleaner, tested form for develop.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(#256): formalize Tor-vs-clearnet results - Tor costs ~10% p2pool yield (latency, not rejects)

  The 5-day interleaved mini run completed (6 blocks, 3/arm). Result: Tor is -11.4% reward-share / -11.5% yield-efficiency vs clearnet, just above the ±8-10% block-to-block noise floor, so 'roughly 10%', directionally robust, not a tight estimate. Mechanism is pure propagation latency: 0 rejects in both arms, equal Tor-daemon CPU (~19%), equal peer counts; effort higher on Tor (113% vs 90%) = uncles/late shares.

  Recommendation: keep Tor the default (#165/#166): ~10% is a modest, opt-out-able price for not leaking the home IP.

  Adds:

  - bench-analyze.py: committable analyzer (block = stat unit, settle excluded)
  - the raw run data under docs/benchmarks/data/ (reproduces the table exactly)
  - Results section in the methodology doc + the measured cost in privacy.md (#165/#166)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(#256): finalize Tor-vs-clearnet write-up + align the Tor-by-default story across docs

  - tor-vs-clearnet.md: flip the banner from 'results pending' to complete; reconcile the Method/runbook params with what was actually run (20h blocks / 3h settle / 6 blocks / ~5 days, not the planned 1.5-2d / 6h / 10-12d), and drop the non-existent --target-shares flag. Frame methodology + decision rule as pre-registration.
  - faq.md + architecture.md: #165/#166 were still described as 'planned for v1.1' / 'slated to move' and listed under 'what still touches clearnet'; now Tor-by-default as of v1.1 with the measured ~10% opt-out cost, consistent with privacy.md/configuration.md.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(#256): satisfy develop's ruff+shfmt+markdown lint on the benchmark files

  After merging develop (which added the Wave 7 lint tooling), CI applied ruff/shfmt/markdownlint to the new benchmark files:

  - bench-analyze.py: drop unused median import, lambda->def, and use calendar.timegm instead of datetime.UTC so the documented 'python3 bench-analyze.py' reproduce command stays portable to pre-3.11 Python; ruff-format applied
  - bench-{collect,healthcheck,orchestrate,status}.sh + tests/stack/run.sh: shfmt -i 4
  - tor-vs-clearnet.md: blank line around the Findings list (MD032)

  No behavior change. lint-sh/lint-py/lint-md + the pithead suite all pass; the analyzer reproduces the committed run's table byte-for-byte.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(#256): regenerate test-inventory for the config_bool regression test

  develop's CI gained a 'make test-inventory-check' gate; the new tests/stack config_bool case (#294 jq false-coercion guard) needs to be enumerated. No content change beyond the regenerated inventory.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
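The false-coercion pitfall behind the #270 fix has a close Python analogue, sketched below. jq's `//` yields the right-hand side when the left is `false` or `null`; Python's `or` behaves similarly for booleans, so the same "default on" pattern silently overrides an explicit `False`. The helper names here are hypothetical and this is not the repo's shell `config_bool()`; it only illustrates the explicit null-check idea.

```python
def config_bool_buggy(config, section, key, default):
    """Same shape as `.section.key // default`: an explicit False is lost."""
    return config.get(section, {}).get(key) or default


def config_bool_sketch(config, section, key, default):
    """Honour an explicit False; fall back to the default only when absent."""
    value = config.get(section, {}).get(key)
    if value is None:
        return default  # key missing or null: keep the fail-closed default
    return bool(value)


# Hypothetical config mirroring the opt-out described in the commit message.
opt_out = {"network": {"tor_egress_firewall": False}}
absent = {"network": {}}

print(config_bool_buggy(opt_out, "network", "tor_egress_firewall", True))
print(config_bool_sketch(opt_out, "network", "tor_egress_firewall", True))
print(config_bool_sketch(absent, "network", "tor_egress_firewall", True))
```

The regression cases named in the commit (explicit false, explicit true, absent→default) map directly onto three assertions against a helper of this shape.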
What & why
Wave 2, PR A — the gate (draft: methodology first; harness + multi-day results to follow on this branch).
`docs/benchmarks/tor-vs-clearnet.md` documents exactly which metrics the benchmark uses and from where, before any data is collected, so the run is reproducible and the conclusion is honest. This is the gate for the Tor-default decision in #165/#166.

The headline finding that shaped it
Raw orphan/uncle rate is not captured anywhere today (only `shares_found` / reject counts). But we don't need it: PPLNS reward share (`block_reward_share_percent`, in `/stats/local/stratum`) is the direct revenue number and already reflects any orphan/uncle effect, because an orphaned share simply fails to earn its weight. So the bottom-line yield is measurable today; the raw uncle count is kept as a secondary log-parse cross-check, not the decision metric.

Metrics & sources (all verified against the live p2pool on gouda)
- `block_reward_share_percent`, yield efficiency (reward-share ÷ hashrate-share), `average_effort`: from `/stats/local/stratum` + `/stats/pool/stats`.
- Peers (`/stats/local/p2p`), share cadence, `shares_failed`, hashrate windows, Tor overhead (`docker stats tor`).
- `proxy_summary.{accepted,rejected,invalid}` during donation windows.

Method & decision rule
Sequential A/B on gouda + miner-0 at fixed hashrate, several days per arm, control variable = the #165 `p2pool.clearnet` knob (the benchmark dogfoods the toggle it validates). The clearnet arm sets the noise floor; keep Tor default if the delta is within it, else reconsider (per-sidechain default / clearnet-default-with-opt-in). Conclusion lands in `docs/privacy.md` and sets the #165/#166 defaults before the v1.1 release.

Next on this branch
- Harness (reads `/stats/*` + `docker stats`, tails the p2pool console); no new container.

Related
#256 · gates #165 / #166 · part of the #160 privacy epic.
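
The yield-efficiency metric and the decision rule described in this PR can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analyzer: the function names are hypothetical, the noise floor is taken here as the clearnet arm's relative standard deviation across blocks (one plausible reading of "the clearnet arm sets the noise floor"), and any numbers fed to it below are made up.

```python
from statistics import mean, pstdev


def yield_efficiency(reward_share_pct, our_hashrate, pool_hashrate):
    """Reward share divided by hashrate share (1.0 means earning in proportion)."""
    hashrate_share_pct = 100.0 * our_hashrate / pool_hashrate
    return reward_share_pct / hashrate_share_pct


def decide(tor_blocks, clearnet_blocks):
    """Compare the Tor-vs-clearnet delta against the clearnet arm's own spread."""
    tor_mean = mean(tor_blocks)
    clear_mean = mean(clearnet_blocks)
    delta = (tor_mean - clear_mean) / clear_mean           # relative difference
    noise_floor = pstdev(clearnet_blocks) / clear_mean     # assumed definition
    verdict = "keep Tor default" if abs(delta) <= noise_floor else "reconsider"
    return delta, noise_floor, verdict


# Made-up per-block yield-efficiency values, for illustration only.
print(yield_efficiency(2.0, 10.0, 1000.0))
print(decide([0.95, 1.0, 1.05], [0.9, 1.0, 1.1]))
```

Using yield efficiency rather than raw reward share as the per-block value normalizes away changes in the pool's total hashrate between blocks.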
🤖 Generated with Claude Code