
[draft] benchmark: Tor vs clearnet while mining — methodology (#256) #268

Closed
VijitSingh97 wants to merge 3 commits into develop from feat/256-benchmark

VijitSingh97 (Collaborator) commented Jun 17, 2026

What & why

Wave 2, PR A — the gate (draft: methodology first; harness + multi-day results to follow on this branch).

docs/benchmarks/tor-vs-clearnet.md documents exactly which metrics the benchmark uses and from where, before any data is collected — so the run is reproducible and the conclusion is honest. This is the gate for the Tor-default decision in #165/#166.

The headline finding that shaped it

Raw orphan/uncle rate is not captured anywhere today (only shares_found and reject counts are). We don't need it, though: PPLNS reward share (block_reward_share_percent, in /stats/local/stratum) is the direct revenue number and already reflects any orphan/uncle effect, because an orphaned share simply fails to earn its weight. So the bottom-line yield is measurable today; the raw uncle count is kept as a secondary log-parse cross-check, not as the decision metric.

Metrics & sources (all verified against the live p2pool on gouda)

Method & decision rule

Sequential A/B on gouda + miner-0 at fixed hashrate, several days per arm. The control variable is the #165 p2pool.clearnet knob, so the benchmark dogfoods the toggle it validates. The clearnet arm sets the noise floor: keep Tor as the default if the Tor-vs-clearnet delta falls within it; otherwise reconsider (a per-sidechain default, or clearnet-by-default with Tor as opt-in). The conclusion lands in docs/privacy.md and sets the #165/#166 defaults before the v1.1 release.
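A minimal sketch of that decision rule, assuming each arm is summarized as one value per block (for example, the mean block_reward_share_percent over that block). Taking the noise floor as the clearnet arm's coefficient of variation (stdev ÷ mean) is an assumption of this sketch; the doc only says the clearnet arm sets it. Function names are illustrative, not taken from the harness.

```python
from statistics import mean, stdev


def yield_efficiency(reward_share_pct, hashrate_share_pct):
    """Reward share divided by hashrate share; 1.0 means we earn exactly our weight."""
    return reward_share_pct / hashrate_share_pct


def relative_delta(tor_blocks, clearnet_blocks):
    """Tor arm's mean relative to the clearnet arm's mean (-0.10 means 10% lower)."""
    return mean(tor_blocks) / mean(clearnet_blocks) - 1.0


def noise_floor(clearnet_blocks):
    """Assumed noise floor: clearnet block-to-block spread relative to its mean."""
    return stdev(clearnet_blocks) / mean(clearnet_blocks)


def keep_tor_default(tor_blocks, clearnet_blocks):
    """Keep Tor as default if the delta lies within the clearnet noise floor."""
    delta = relative_delta(tor_blocks, clearnet_blocks)
    return abs(delta) <= noise_floor(clearnet_blocks)
```

stdev needs at least two blocks per arm, which is one more reason to run several blocks rather than a single long one.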

Next on this branch

  • A small read-only collector (snapshots /stats/* + docker stats, tails the p2pool console) — no new container.
  • The multi-day run, then the filled-in results table + recommendation.

Related

#256 · gates #165 / #166 · part of the #160 privacy epic.

🤖 Generated with Claude Code

VijitSingh97 and others added 3 commits June 16, 2026 20:46
...sion rule (#256)
Documents WHAT we measure and FROM WHERE before building the harness, so the run
is reproducible and the conclusion is honest. Key points, grounded in what
p2pool actually exposes on gouda (/stats/* + /api/state + docker stats):
- Primary metric is PPLNS reward share (block_reward_share_percent) — the direct
 revenue number that already captures any orphan/uncle effect, so we do NOT
 need a raw orphan counter to answer "does Tor cost yield". Yield efficiency
 (reward-share ÷ hashrate-share) + average_effort back it up.
- Raw uncle/orphan count is a secondary cross-check via p2pool console-log parse
 (not in /api/state) — honestly flagged as noisier, not the decision metric.
- Secondary latency/connectivity metrics (peers, share cadence, rejects, Tor
 overhead) are all available today.
- Method: sequential A/B on gouda + miner-0 at fixed hashrate, several days per
 arm, control variable = the #165 p2pool.clearnet knob (dogfoods the toggle).
- Decision rule: clearnet arm sets the noise floor; keep Tor default if the
 delta is within it, else reconsider per-sidechain / clearnet-default-with-opt-in.
Harness + results land in follow-up commits on this branch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Read-only collector for the Tor-vs-clearnet benchmark: snapshots p2pool's
/stats/* (reward_share, effort, hashrate, shares, peers, sidechain diff) plus
`docker stats tor` (CPU/mem) into one JSONL line per interval, per arm. No new
container — it reads what p2pool already writes via --local-api/--data-api.
Validated against the live p2pool on gouda (--once produced a clean snapshot:
reward_share, peers_out/list, pool_hr, tor_cpu_pct/mem). shellcheck clean.
Doc updated with the run recipe (nohup per arm) + a jq summary one-liner.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
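For reference, a minimal Python sketch of the snapshot step (the real collector is a shell script). It assumes p2pool's --data-api directory holds JSON files at local/stratum, local/p2p and pool/stats; block_reward_share_percent, average_effort, hashrate_15m and shares_found are the fields named in this PR, while the file layout and the connections / hashRate / sidechainDifficulty keys are assumptions to check against the live files. The `docker stats tor` CPU/mem capture is left out.

```python
import json
import time
from pathlib import Path


def _read_json(path):
    """Parse a JSON file, returning {} if it is missing or caught mid-write."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def snapshot(data_dir, arm, now=None):
    """One benchmark sample from a p2pool --data-api directory (layout assumed)."""
    base = Path(data_dir)
    stratum = _read_json(base / "local" / "stratum")
    p2p = _read_json(base / "local" / "p2p")
    pool = _read_json(base / "pool" / "stats").get("pool_statistics", {})
    return {
        "ts": int(time.time() if now is None else now),
        "arm": arm,
        "reward_share": stratum.get("block_reward_share_percent"),
        "effort": stratum.get("average_effort"),
        "hashrate_15m": stratum.get("hashrate_15m"),
        "shares_found": stratum.get("shares_found"),
        "peers": p2p.get("connections"),  # assumed key
        "pool_hr": pool.get("hashRate"),  # assumed key
        "sidechain_diff": pool.get("sidechainDifficulty"),  # assumed key
    }


def append_jsonl(out_path, record):
    """Append one record as a single JSON line (one line per interval, per arm)."""
    with open(out_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Missing fields come back as null rather than aborting the run, so a half-written file costs one sample instead of the collector.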
...net leak check (#256)
Reads /proc/net/tcp from each app container and flags any ESTABLISHED connection
to a PUBLIC IP (one that bypasses the Tor SOCKS at <bridge>.25:9050). For the
"tor" arm every app container must show 0 public connections; only the tor
container should reach the internet. For "clearnet" the mining-path containers
should show direct public connections.
Immediately caught a real leak on a live deploy: p2pool dialing sidechain peers
on :37889 directly + Tari direct P2P — config-level checks had passed, but the
connection-level check found the traffic was NOT actually behind Tor. This is
the live egress assertion the gouda README flagged as a gap (#160/#2).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

VijitSingh97 (Collaborator, Author) commented

Superseded by #293 (feat/256-benchmark-harness), which carries the finalized methodology doc plus bench-collect.sh/bench-verify-egress.sh (brought over) and the full autonomous harness. All three files in this draft are in #293 in finalized form, and this branch is now conflicting. Closing as superseded — no unique content to salvage.

VijitSingh97 added a commit that referenced this pull request Jun 28, 2026
...methodology (#293)
* docs(#256): finalize the Tor-vs-clearnet benchmark plan + bring forward the collector
Finalize the methodology against real numbers: full fleet miner-0..7 (~269 kH/s) on
the `mini` sidechain (~1.8% of it → ~155 of our shares/day — representative + enough
power; `nano` would make us ~11% and overstate the penalty, `main` is too sparse).
Switch the design to an INTERLEAVED A/B (T/C/T/C in ~1.5-2d blocks, discard ~6h post-switch)
to control time-correlated variance, run to a target share count, and use the
clearnet arm's own variance as the noise floor.
Add the "autonomous on gouda" section + runbook: the run lives entirely on the box
(detached orchestrator + cron watchdog + @reboot) so it survives the operator being
offline for days; observe over Tailscale via `bench-status`. Scripts (orchestrate /
healthcheck / status) land next, validated on gouda before the real run.
Brings bench-collect.sh onto develop (was only on the #256 draft); supersedes #268.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
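The ~155 shares/day figure can be sanity-checked with a short calculation, assuming P2Pool mini's 10-second target share time (an assumption here, not something the run measured): 1.8% × 86,400 s ÷ 10 s ≈ 155.5 shares/day, and 269 kH/s at 1.8% implies a mini sidechain of roughly 14.9 MH/s.

```python
SECONDS_PER_DAY = 86_400
MINI_SHARE_TIME_S = 10  # assumption: P2Pool mini's target sidechain share time


def expected_shares_per_day(hashrate_fraction, share_time_s=MINI_SHARE_TIME_S):
    """Expected shares/day for a miner holding `hashrate_fraction` of the sidechain."""
    return hashrate_fraction * SECONDS_PER_DAY / share_time_s


def implied_sidechain_hashrate(our_hashrate, hashrate_fraction):
    """Total sidechain hashrate implied by our hashrate and our fraction of it."""
    return our_hashrate / hashrate_fraction
```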
* feat(#256): autonomous benchmark harness (orchestrator + healthcheck + status)
Three gouda-resident scripts so the Tor-vs-clearnet run survives the operator being
offline for days:
- bench-orchestrate.sh — interleaves the arms on a fixed block schedule (T/C/T/C),
 egress-gates each switch (rigorous multi-poll), keeps the collector running,
 health-checks each cycle, writes a status.json heartbeat, and self-heals (re-up
 stack + re-gate on BROKEN). All state in ~/pithead-bench, so a crash/reboot
 resumes from state. Subcommands: run | --calibrate | --watchdog | --status |
 --stop | --install-cron (the watchdog cron + @reboot restart it if it dies).
- bench-healthcheck.sh — read-only checklist -> OK/WARN/BROKEN (stack, p2pool,
 hashrate, Tor-arm egress + firewall); reused by the loop + the cron.
- bench-status.sh — one-screen summary for `ssh gouda bench-status` over Tailscale.
shellcheck clean. To be validated on gouda (dry-run) before the real fleet run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#256): float-safe block/settle hours in the orchestrator (hours→secs via awk)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#256): healthcheck uses hashrate_15m (not 1h) so it doesn't false-WARN during the ramp
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#256): gate self-heals grandfathered Tari leak (restart) + bench-status missing-jsonl guard
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#256): per-cycle healthcheck egress matches gate rigor (3-poll/30s) to avoid false BROKEN on brief Tari dials
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#256): healthcheck keys on proxy_workers (reliable), not p2pool's noisy share-based hashrate estimate
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#256): bench-status leads with 'stable for Xh' so quiet runs don't read as failing (old launch events linger in the tail)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#256): disable XvB for the benchmark — 'auto' optimizer was routing the fleet to XvB (target Whale), starving p2pool and confounding reward_share
The XVB_DONATION_LEVEL=auto optimizer dynamically splits the fleet to climb
raffle tiers — observed ~96kH/s to XvB vs ~18kH/s to p2pool while chasing the
Whale tier. That starves p2pool (so reward_share, the primary metric, is
measured on a fraction of the fleet) and makes the split a function of Tor
latency — the exact thing we isolate. Disabling XvB sends the full ~269kH/s to
p2pool in both arms. Re-asserted on every apply so a switch/recovery can't let
it drift back on; --stop restores xvb.enabled=true.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(#256): clearnet arm must toggle the #270 egress firewall OFF, else p2pool gets 0 peers
set_arm flipped p2pool.clearnet but left the Tor-egress firewall (#270) ON, which
DROPs direct clearnet dials from the container subnet. The clearnet arm would have
collected zero-peer/zero-share garbage for 3 of 6 blocks — and since both runs so
far only exercised block-1 TOR, it would have first bitten at the first switch
(~20h in), unattended. Now the firewall tracks the arm: ON for tor (fail-closed,
egress-gated), OFF for clearnet (p2pool peers over clearnet — the baseline). --stop
restores firewall=true so a mid-clearnet-block stop never leaves gouda open.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
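A minimal sketch of the per-arm settings expressed as data, using the config keys named in these commits (p2pool.clearnet, network.tor_egress_firewall, xvb.enabled). The helper names are hypothetical Python stand-ins for the orchestrator's shell set_arm and --stop logic; the point is that firewall and XvB state travel with the arm instead of being set once at launch.

```python
import copy

# Every arm asserts all three keys, so a switch or recovery can't leave one stale.
ARM_SETTINGS = {
    "tor": {"p2pool.clearnet": False, "network.tor_egress_firewall": True, "xvb.enabled": False},
    "clearnet": {"p2pool.clearnet": True, "network.tor_egress_firewall": False, "xvb.enabled": False},
}
# --stop restores the fail-closed firewall and re-enables XvB.
STOP_SETTINGS = {"network.tor_egress_firewall": True, "xvb.enabled": True}


def apply_settings(config, settings):
    """Return a copy of a nested config with dotted-path settings applied."""
    out = copy.deepcopy(config)
    for dotted, value in settings.items():
        *parents, leaf = dotted.split(".")
        node = out
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return out


def set_arm(config, arm):
    """Hypothetical analog of the orchestrator's set_arm."""
    return apply_settings(config, ARM_SETTINGS[arm])
```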
* fix(#270): network.tor_egress_firewall=false never disabled the firewall (jq // false-coercion)
`jq -r '.network.tor_egress_firewall // true'` returns true even when the key is
explicitly false, because jq's // alternative treats false (not just null) as
empty. So the documented #270 opt-out has never worked — the firewall was always
on regardless of config. Same latent bug on `.xvb.tor // true` (xvb.tor=false
silently stayed on Tor). Both now null-check explicitly so a configured false is
honoured; absent still defaults on (fail-closed / Tor-by-default preserved).
Surfaced by the #256 benchmark: the clearnet arm sets tor_egress_firewall=false
so p2pool can peer over clearnet, but the firewall stayed up → 0 sidechain peers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
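A small Python model of the semantics behind this bug. jq's `a // b` falls back to `b` whenever `a` is null or false, so `// true` can never return a configured false; a null-only check (in jq, for example, `if .x == null then true else .x end`) keeps it. The function names are illustrative.

```python
def jq_alternative(value, default):
    """Model of jq's `value // default` for one value: null AND false fall through."""
    return default if value is None or value is False else value


def null_checked(value, default):
    """The fix: only a missing/null key falls back; an explicit false is kept."""
    return default if value is None else value


config = {"network": {"tor_egress_firewall": False}}
raw = config.get("network", {}).get("tor_egress_firewall")  # None if absent, like jq's null

buggy = jq_alternative(raw, True)  # the #270 bug: the configured false is ignored
fixed = null_checked(raw, True)  # the configured false is honoured
```

Note that only false and null fall through in jq: `0 // 1` yields 0, which is why the model tests identity with `is` rather than Python truthiness.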
* refactor(#294): extract config_bool helper + regression test for the jq false-coercion bug
Replaces the two inline null-checks (TOR_EGRESS_FIREWALL, xvb.tor) with a shared
config_bool() helper that honours an explicit false, and adds tests/stack unit
coverage (explicit false/true, absent→default). The prior firewall test only
exercised apply_tor_egress_firewall reading .env — it bypassed the config.json→.env
jq that actually had the bug. Behavior-identical to d4b5df3 (which is what gouda's
live benchmark runs); this is the cleaner, tested form for develop.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
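A Python analog of the config_bool idea, for illustration only (the real helper lives in the shell stack and this signature is hypothetical). It walks a dotted path, returns the default only when the key is absent or null, and, as a choice of this sketch, rejects non-boolean values instead of coercing them. The assertions mirror the regression cases in the commit: explicit false, explicit true, absent → default.

```python
def config_bool(config, dotted_path, default):
    """Read a boolean at a dotted path; an explicit false wins over the default."""
    node = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default  # absent: fall back (fail-closed / Tor-by-default)
        node = node[key]
    if node is None:
        return default  # explicit null behaves like absent
    if not isinstance(node, bool):
        raise TypeError(f"{dotted_path} must be a boolean, got {node!r}")
    return node
```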
* docs(#256): formalize Tor-vs-clearnet results — Tor costs ~10% p2pool yield (latency, not rejects)
The 5-day interleaved mini run completed (6 blocks, 3/arm). Result: Tor is -11.4%
reward-share / -11.5% yield-efficiency vs clearnet, just above the ±8-10% block-to-
block noise floor — so 'roughly 10%', directionally robust, not a tight estimate.
Mechanism is pure propagation latency: 0 rejects in both arms, equal Tor-daemon CPU
(~19%), equal peer counts; effort higher on Tor (113% vs 90%) = uncles/late shares.
Recommendation: keep Tor the default (#165/#166) — ~10% is a modest, opt-out-able
price for not leaking the home IP. Adds:
- bench-analyze.py: committable analyzer (block = stat unit, settle excluded)
- the raw run data under docs/benchmarks/data/ (reproduces the table exactly)
- Results section in the methodology doc + the measured cost in privacy.md (#165/#166)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
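The analyzer's core aggregation can be sketched as follows, assuming collector records carry `ts` (epoch seconds) and `reward_share`, and that the block schedule is known as (start, end, arm) tuples. The block is the statistical unit: each block contributes one mean, taken after discarding the settle window at its start. This is a hypothetical reconstruction, not bench-analyze.py itself.

```python
from statistics import mean


def block_means(records, blocks, settle_s):
    """Return {arm: [one mean reward_share per block]}, skipping each settle window."""
    per_arm = {}
    for start, end, arm in blocks:
        values = [
            r["reward_share"]
            for r in records
            if start + settle_s <= r["ts"] < end and r.get("reward_share") is not None
        ]
        if values:  # a block with no post-settle samples contributes nothing
            per_arm.setdefault(arm, []).append(mean(values))
    return per_arm


def tor_vs_clearnet(per_arm):
    """Relative difference of the Tor arm's block mean vs the clearnet arm's."""
    return mean(per_arm["tor"]) / mean(per_arm["clearnet"]) - 1.0
```

With the run's actual parameters, `settle_s` would be 3 × 3600 and each block 20 h long.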
* docs(#256): finalize Tor-vs-clearnet write-up + align the Tor-by-default story across docs
- tor-vs-clearnet.md: flip the banner from 'results pending' to complete; reconcile
 the Method/runbook params with what was actually run (20h blocks / 3h settle / 6
 blocks / ~5 days, not the planned 1.5-2d / 6h / 10-12d), and drop the non-existent
 --target-shares flag. Frame methodology + decision rule as pre-registration.
- faq.md + architecture.md: #165/#166 were still described as 'planned for v1.1' /
 'slated to move' and listed under 'what still touches clearnet' — now Tor-by-default
 as of v1.1 with the measured ~10% opt-out cost, consistent with privacy.md/configuration.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore(#256): satisfy develop's ruff+shfmt+markdown lint on the benchmark files
After merging develop (which added the Wave 7 lint tooling), CI applied ruff/shfmt/
markdownlint to the new benchmark files:
- bench-analyze.py: drop unused median import, lambda->def, and use calendar.timegm
 instead of datetime.UTC so the documented 'python3 bench-analyze.py' reproduce
 command stays portable to pre-3.11 Python; ruff-format applied
- bench-{collect,healthcheck,orchestrate,status}.sh + tests/stack/run.sh: shfmt -i 4
- tor-vs-clearnet.md: blank line around the Findings list (MD032)
No behavior change. lint-sh/lint-py/lint-md + the pithead suite all pass; the analyzer
reproduces the committed run's table byte-for-byte.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
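On the portability point: datetime.UTC only exists from Python 3.11, while calendar.timegm has been in the standard library for much longer. A minimal sketch, assuming timestamps arrive as ISO-8601 strings with a literal Z suffix (the format in the committed data may differ):

```python
import calendar
import time


def iso_utc_to_epoch(stamp):
    """Parse 'YYYY-MM-DDTHH:MM:SSZ' as UTC without datetime.UTC (3.11+ only)."""
    return calendar.timegm(time.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"))
```

Unlike time.mktime, calendar.timegm interprets the struct as UTC, so the result does not depend on the host's local timezone.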
* docs(#256): regenerate test-inventory for the config_bool regression test
develop's CI gained a 'make test-inventory-check' gate; the new tests/stack config_bool
case (#294 jq false-coercion guard) needs to be enumerated. No content change beyond
the regenerated inventory.
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

