Releases: JustAResearcher/Latency-Based-GPU-Algorithm

v0.1.5

10 May 02:11
@github-actions github-actions

GPUx v0.1.5 — Community Testing Release

First release for community testing. ASIC-resistant, latency-bound proof-of-work for GPUs; proposed replacement for Cuckaroo29 (C29) in Tari (XTM).

What's in this release

  • Frozen algorithm spec (ALGORITHM_SPEC.md)
  • C reference implementation (spec/) — bit-exact authority
  • CUDA mining kernel + GPU Argon2id DAG generator (cuda/)
  • Cross-platform benchmark harness (bench/)
  • Pre-built Windows binary (multi-arch fat binary, sm_75 → sm_120)

Pre-built binaries

| Platform | Binary | Coverage |
|---|---|---|
| Windows x64, CUDA 13.2+ | gpux_miner-v0.1.5-windows-x64-cuda13-multiarch.exe | Turing (RTX 20-series, GTX 16-series, T4) → Blackwell (RTX 50-series), all CMP cards |

The Windows binary is a CUDA fat binary containing compiled code for sm_75, sm_80, sm_86, sm_89, sm_90, and sm_120, plus PTX for forward compatibility. One file runs on any supported GPU.
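
As a rough illustration of what "one file runs on any supported GPU" means, the sketch below classifies a device's compute capability against the architectures listed above. It relies on the general CUDA rule that machine code built for sm_XY runs on devices with the same major version and an equal or higher minor version. The helper name `coverage` is hypothetical and not part of gpux_miner. These notes do not say which virtual architecture the bundled PTX targets, so any device without a native match is only reported as depending on PTX JIT.

```python
# Architectures with native code in the Windows binary, as listed in these notes.
EMBEDDED_SM = [(7, 5), (8, 0), (8, 6), (8, 9), (9, 0), (12, 0)]


def coverage(major: int, minor: int) -> str:
    """Classify a device compute capability (hypothetical helper)."""
    # General CUDA rule: sm_XY machine code runs on devices X.Z with Z >= Y.
    native = [sm for sm in EMBEDDED_SM if sm[0] == major and sm[1] <= minor]
    if native:
        best = max(native)
        return f"native sm_{best[0]}{best[1]}"
    if (major, minor) < min(EMBEDDED_SM):
        return "unsupported (older than sm_75)"
    # Assumption: whether this works depends on the PTX target, which is not stated.
    return "needs PTX JIT (depends on the bundled PTX target)"


if __name__ == "__main__":
    for cc in [(7, 0), (7, 5), (8, 7), (8, 9), (10, 0), (12, 0)]:
        print(cc, "->", coverage(*cc))
```

For example, a compute capability 8.7 device would be served by the sm_86 code, while a 10.0 device has no native match in this list.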

Linux: build from source with cd cuda && make after installing CUDA Toolkit 12.6+ or 13+. A Linux x64 binary will ship in v0.1.6 once CI is set up.

Verified baseline

  • NVIDIA RTX 5090 (sm_120) @ stock: 1.46 MH/s at 410 W (~3.6 kH/s per watt)
  • DAG generation: 32.5 s for 2 GiB (one-time per epoch)
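
The derived figures can be reproduced from the quoted numbers. The following is a minimal sketch of that arithmetic so testers can compute the same metrics for their own cards. The function names are illustrative only, and the effective DAG throughput is a derived value that does not appear in the notes themselves.

```python
# Figures quoted in the release notes for the RTX 5090 baseline.
HASHRATE_HS = 1.46e6        # 1.46 MH/s
POWER_W = 410.0             # power draw quoted alongside the hashrate
DAG_BYTES = 2 * 1024**3     # 2 GiB
DAG_SECONDS = 32.5          # one-time generation per epoch


def efficiency_khs_per_watt(hashrate_hs: float, power_w: float) -> float:
    """Hashes per second per watt, expressed in kH/s per W."""
    return hashrate_hs / power_w / 1e3


def dag_throughput_mib_s(n_bytes: int, seconds: float) -> float:
    """Average DAG generation rate in MiB/s."""
    return n_bytes / 1024**2 / seconds


if __name__ == "__main__":
    print(f"{efficiency_khs_per_watt(HASHRATE_HS, POWER_W):.2f} kH/s per W")  # 3.56
    print(f"{dag_throughput_mib_s(DAG_BYTES, DAG_SECONDS):.1f} MiB/s")        # 63.0
```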

Known limitations

  • v0.1.5 is CUDA-only. AMD / Intel support via OpenCL planned for v0.2.
  • No light-verifier (Merkle DAG witness) yet — full nodes need the 2 GiB DAG. Out of scope for this round.
  • Tari multi-algo integration not wired — that comes after community testing settles the algorithm.

Tester quick-start

# Windows
git clone https://github.com/JustAResearcher/Latency-Based-GPU-Algorithm.git
cd Latency-Based-GPU-Algorithm
.\bench\run_bench.ps1
# Linux
git clone https://github.com/JustAResearcher/Latency-Based-GPU-Algorithm.git
cd Latency-Based-GPU-Algorithm
./bench/run_bench.sh

The harness:

  1. Builds gpux_miner (or uses the pre-built .exe if present)
  2. Runs verify — confirms your GPU produces bit-exact hashes vs the C reference
  3. Runs bench 60 — 60 seconds of steady-state hashing
  4. Writes a JSON to bench/results/
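
Before submitting, it can help to confirm that the files written in step 4 parse as valid JSON. The sketch below is a hypothetical helper, not part of the harness. Because the result schema is not described in these notes, it assumes nothing about field names and only lists the top-level keys of each file.

```python
import json
from pathlib import Path


def summarize_results(results_dir: str = "bench/results") -> dict:
    """Map each JSON file name to its sorted top-level keys (hypothetical helper)."""
    summary = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)  # raises json.JSONDecodeError on a malformed file
        # Schema is an unknown here: report keys only when the document is an object.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


if __name__ == "__main__":
    for name, keys in summarize_results().items():
        print(name, keys)
```

Run it from the repository root after the harness finishes. A file that fails to parse raises an error, which is worth reporting along with the bench output.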

Submit your JSON via PR or issue. We're collecting:

  • Hashrate vs SM count vs VRAM vs power
  • Cross-arch determinism evidence
  • Any verify failures (bit-exact mismatches between GPU and reference)

See COMMUNITY_TESTING.md for the full protocol.

What changed since v0.1

  • DAG generation moved from ChaCha20 to Argon2id (RFC 9106). This prevents the "compute-don't-store" ASIC attack: recomputing DAG entries is now many times more expensive than reading them from DRAM, forcing any competitive ASIC to ship with HBM/GDDR.
  • GPU Argon2id port: epoch transition went from ~9.5 minutes (CPU) to 32.5 seconds (GPU).
  • Verified primitives: BLAKE2b against RFC 7693, ChaCha20 against RFC 8439, AES round against FIPS-197 Appendix B, Argon2id against RFC 9106 / argon2 CLI vectors. All known-answer tests (KATs) pass.
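
For readers unfamiliar with known-answer tests, the sketch below shows the idea: hash a fixed input and compare the digest against a published vector. It uses Python's standard `hashlib` rather than the project's C implementation, so it illustrates the technique and is not the project's own test suite. The vector is the BLAKE2b-512("abc") example from RFC 7693, Appendix A.

```python
import hashlib

# BLAKE2b-512("abc") test vector from RFC 7693, Appendix A.
BLAKE2B_ABC = (
    "ba80a53f981c4d0d6a2797b69f12f6e9"
    "4c212f14685ac4b74b12bb6fdbffa2d1"
    "7d87c5392aab792dc252d5de4533cc95"
    "18d38aa8dbf1925ab92386edd4009923"
)


def blake2b_kat_passes() -> bool:
    """Return True if hashlib's BLAKE2b reproduces the published digest."""
    # hashlib.blake2b defaults to a 64-byte (512-bit) digest.
    return hashlib.blake2b(b"abc").hexdigest() == BLAKE2B_ABC


if __name__ == "__main__":
    print("BLAKE2b KAT:", "pass" if blake2b_kat_passes() else "FAIL")
```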

License

MIT. Bundled crypto primitives (BLAKE2b, ChaCha20, AES round, Argon2id) are public-domain or CC0+Apache-2.0. Fork it, audit it, break it.
