JustAResearcher/Latency-Based-GPU-Algorithm

GPUx — ASIC-resistant PoW for GPUs

Status: v0.1.5 community testing
Target: Replacement for Cuckaroo29 (C29) in Tari (XTM)
Goal: GPU-native, ASIC-resistant proof-of-work; low power; cheap verifier.

What this is

GPUx is a candidate proof-of-work algorithm designed to make GPU mining durable against ASIC takeover. It combines random per-epoch programs, a 2 GiB random-access DAG, and a per-thread scratchpad to force any would-be ASIC into looking like a GPU — at which point the ASIC has no cost advantage.

Three artifacts in this repo:

Algorithm spec (ALGORITHM_SPEC.md) — formal definition.
Reference C implementation (spec/) — the authoritative semantics.
CUDA implementation + bench harness (cuda/, bench/) — what community testers run on their GPUs.

If you are a community tester, jump to COMMUNITY_TESTING.md.

If you are reviewing the algorithm, start with ALGORITHM_SPEC.md and then docs/DESIGN_RATIONALE.md.

Quick numbers (RTX 5090, unoptimized v0.1 kernel)

Metric	Value
Hashrate	~1.25 MH/s
DAG generation	2 GiB in ~30 ms (~65 GB/s)
Per-share verify	~0.5 ms (warm DAG)
GPU vs reference	bit-identical (5/5 KAT nonces)

These are baseline numbers from a reference port. Optimized kernels (warp-cooperative DAG access, shared-memory scratchpad, instruction reordering) are expected to multiply throughput without changing consensus.
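As a sanity check on the table above, the quoted figures can be turned into rates with a few lines of C. This is a minimal sketch; the function names are hypothetical and appear nowhere in the repo. Filling 2 GiB in exactly 30 ms works out to roughly 67 GiB/s (about 72 GB/s in decimal units), so the quoted ~65 GB/s is consistent with a generation time slightly above 30 ms. A ~0.5 ms warm-DAG verify bounds a single verifier at about 2000 shares per second.

```c
#include <assert.h>

/* Hypothetical helper names; only the input figures come from the README table. */

/* Sustained rate in GiB/s when `gib` gibibytes are produced in `ms` milliseconds. */
double gib_per_second(double gib, double ms)
{
    assert(ms > 0.0);
    return gib / (ms / 1000.0);
}

/* Same rate in decimal GB/s (1 GiB = 1.073741824 GB). */
double gb_per_second(double gib, double ms)
{
    return gib_per_second(gib, ms) * 1.073741824;
}

/* Upper bound on shares one verifier can check per second at `ms_per_verify`. */
double verifies_per_second(double ms_per_verify)
{
    assert(ms_per_verify > 0.0);
    return 1000.0 / ms_per_verify;
}
```

Calling `gib_per_second(2.0, 30.0)` and `verifies_per_second(0.5)` reproduces the two derived figures discussed above.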

Why GPUx is hard for ASICs (one-screen summary)

ASICs win when the algorithm is small, homogeneous, and predictable. GPUx attacks each premise:

Property	GPUx mechanism
Predictable kernel	Random program regenerated every 1024 blocks
Small kernel	256 ops × 64 iters = 16 384 ops/nonce, 12 distinct opcodes, 32 64-bit lanes
Cheap memory	2 GiB DAG with random dependent access (forces GDDR/HBM)
No cache	16 KiB per-thread scratchpad with R-M-W (forces L1-equivalent)
One datapath	Mix of 64-bit int ALU, MULHI, AES round, IEEE-754 FP32 FMA
Throughput parallel	Latency-bound dependent chains limit pipelining
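
The arithmetic in the table, and the idea of a latency-bound dependent chain, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the constant and function names are hypothetical, epochs are assumed to start at block height 0, and `toy_dependent_walk` is a toy pointer-chase, not the GPUx program or its DAG access rule (see ALGORITHM_SPEC.md for the real definition). The point of the toy is that each load address is derived from the previously loaded value, so the next access cannot be issued until the current one returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures from the table; the identifiers themselves are hypothetical. */
enum {
    GPUX_OPS_PER_PROGRAM = 256,
    GPUX_ITERS           = 64,
    GPUX_EPOCH_BLOCKS    = 1024
};

/* 256 ops x 64 iterations = 16384 program ops per nonce. */
uint32_t ops_per_nonce(void)
{
    return (uint32_t)GPUX_OPS_PER_PROGRAM * (uint32_t)GPUX_ITERS;
}

/* Assumption: epoch 0 starts at height 0 and a new program is drawn every 1024 blocks. */
uint64_t epoch_of_height(uint64_t height)
{
    return height / GPUX_EPOCH_BLOCKS;
}

/* Aggregate program ops per second implied by a hashrate. */
double program_ops_per_second(double hashes_per_second)
{
    return hashes_per_second * (double)ops_per_nonce();
}

/* TOY dependent chain (not GPUx): every index depends on the value just read,
 * so the loads form a serial chain that cannot be prefetched or overlapped. */
uint64_t toy_dependent_walk(const uint64_t *table, size_t n_words,
                            uint64_t seed, uint32_t steps)
{
    assert(table != NULL && n_words > 0);
    uint64_t x = seed;
    for (uint32_t i = 0; i < steps; i++) {
        size_t idx = (size_t)(x % n_words);                 /* address from current state */
        x = (x ^ table[idx]) * 0x9E3779B97F4A7C15ULL + i;   /* mix in the loaded word */
    }
    return x;
}
```

At the baseline 1.25 MH/s, `program_ops_per_second(1.25e6)` gives about 2.05 × 10^10 program ops per second across the whole GPU.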

Long-form analysis with comparisons to Ethash, ProgPoW, RandomX, Cuckaroo, and X16R is in docs/DESIGN_RATIONALE.md.

Repo layout

gpux/
├── ALGORITHM_SPEC.md         formal algorithm spec
├── COMMUNITY_TESTING.md      how to run tests and submit results
├── README.md                 this file
├── Makefile                  builds reference + tests (Linux/WSL/macOS)
├── spec/                     reference C implementation
│   ├── gpux.h / gpux.c       algorithm reference (the source of truth)
│   ├── blake2b.c+h           embedded BLAKE2b reference
│   ├── chacha20.c+h          embedded ChaCha20 reference
│   ├── aes_round.c+h         embedded AES single-round reference
│   └── test_vectors.h        frozen KAT (regenerate with `make gen-kat`)
├── tests/
│   ├── smoke.c               primitive correctness (BLAKE2b, ChaCha20, AES, KAT generators)
│   ├── kat.c                 full hash KAT (allocates 2 GiB)
│   └── gen_kat.c             regenerate test_vectors.h
├── cuda/                     CUDA implementation
│   ├── gpux_kernel.cu        the mining kernel
│   ├── gpux_device.cuh       device-side BLAKE2b/ChaCha20/AES
│   ├── gpux_miner.cu         host driver: verify, bench, info
│   ├── Makefile              Linux/WSL build
│   └── build.bat             Windows build (vcvars + nvcc)
├── bench/                    community testing
│   ├── run_bench.ps1         Windows harness
│   ├── run_bench.sh          Linux harness
│   └── results/              per-GPU JSON results (created on first run)
└── docs/
    └── DESIGN_RATIONALE.md   why each design choice; ASIC-resistance argument

Building

Linux / WSL / macOS (reference + tests)

make smoke # primitive tests, no DAG
make kat # full KAT (allocates 2 GiB)

Linux / WSL / macOS (CUDA)

cd cuda && make
./gpux_miner verify
./gpux_miner bench 30

Windows (CUDA)

Requires Visual Studio 2022 BuildTools + CUDA 13.x.

cd cuda
.\build.bat
.\gpux_miner.exe verify
.\gpux_miner.exe bench 30

Or use the testing wrapper:

.\bench\run_bench.ps1 -Seconds 60

Tari integration (proposed)

Tari's existing block header is hashed with BLAKE2b-256 to produce a 32-byte digest. To use GPUx as a PoW algorithm:

header_digest = BLAKE2b-256(serialized_block_header_excluding_nonce)
block_hash = GPUx(header_digest, nonce)
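
The two-step flow above could be driven by a nonce search loop along the following lines. This is a sketch under stated assumptions, not the repo's API: `pow_fn`, `toy_pow`, `hash_meets_target`, and `search_nonce` are hypothetical names, `toy_pow` is a trivial placeholder standing in for GPUx so the loop can run, and comparing hash and target as big-endian byte strings (lower is better) is an assumption. The actual difficulty rule is the one in ALGORITHM_SPEC.md §11. The header digest is taken as an input; computing it with BLAKE2b-256 is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature for block_hash = GPUx(header_digest, nonce). */
typedef void (*pow_fn)(const uint8_t header_digest[32], uint64_t nonce,
                       uint8_t out[32]);

/* PLACEHOLDER, not GPUx: a cheap deterministic stand-in for demonstration only. */
void toy_pow(const uint8_t header_digest[32], uint64_t nonce, uint8_t out[32])
{
    memset(out, 0, 32);
    out[0] = (uint8_t)(nonce * 37u + header_digest[0]);
}

/* Assumption: hash and target are compared as big-endian 256-bit integers. */
bool hash_meets_target(const uint8_t hash[32], const uint8_t target[32])
{
    return memcmp(hash, target, 32) <= 0;
}

/* Try `count` consecutive nonces from `start`; report the first that meets the target. */
bool search_nonce(pow_fn pow, const uint8_t header_digest[32],
                  const uint8_t target[32], uint64_t start, uint64_t count,
                  uint64_t *found)
{
    assert(pow != NULL && found != NULL);
    uint8_t hash[32];
    for (uint64_t i = 0; i < count; i++) {
        uint64_t nonce = start + i;
        pow(header_digest, nonce, hash);
        if (hash_meets_target(hash, target)) {
            *found = nonce;
            return true;
        }
    }
    return false;
}
```

Passing the hash function as a pointer keeps the loop independent of the implementation, so the same driver could call a reference or a GPU-backed function once one is wired in.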

Difficulty target and Tari's multi-algo selection layer integrate at the consensus boundary. See ALGORITHM_SPEC.md §11.

v0.1 status

Spec frozen for testing
Reference C impl, deterministic
KAT (1 epoch_seed, 5 nonces) with bit-exact reference output
CUDA impl matches reference
Baseline RTX 5090 hashrate (1.25 MH/s)
Cross-vendor FP32 determinism audit (NVIDIA Ada/Hopper/Blackwell vs AMD RDNA3/RDNA4 vs Intel)
Light-verifier Merkle DAG witness
Tari multi-algo selection integration
Optimized CUDA kernel (warp-coop DAG, shmem scratchpad)
OpenCL implementation for AMD/Intel

License

MIT — see LICENSE. Bundled reference primitives (BLAKE2b, ChaCha20, AES round, Argon2id) are public-domain or CC0/Apache-2.0 and remain so under MIT. The intent is full open-source auditability — fork it, break it, propose changes via PR, run your own bench results and submit them as JSON files in bench/results/.
