Version: v0.6.56 (Archived Snapshot, 2019)
High-performance quasi-random number generator utilizing NVIDIA CUDA and cuRAND.
This is an archived legacy project from 2019. It was developed to accelerate quasi-random number generation for Monte Carlo simulations during my time at the Skobeltsyn Institute of Nuclear Physics, Moscow State University (SINP MSU).
The code includes hardware profiling tailored to the NVIDIA Tesla P100 cluster we were running at the time (PCIe 12GB, GP100 GPU, Pascal architecture, 3584 CUDA cores).
Fun Fact: The logic was heavily tested on a consumer GTX 1080 Ti at home. The Tesla P100 (GP100) and the GTX 1080 Ti (GP102) expose the same number of FP32 CUDA cores (3584), so the consumer card served as an accessible sandbox for debugging before deploying to the FP64-heavy scientific cluster.
Another Fun Fact: The copyright header in kernel.cu still reads v0.4.2. A classic reminder that updating version strings in every single file was the last priority before the 2019 deadline. The actual compiled release is v0.6.56.
Note: As a legacy v0.6.56 release, it contains known performance anti-patterns by modern standards (such as initializing curandState inside the kernel). It is provided "as-is" for historical and archival purposes.
- Generates pseudo-random doubles using `cuRAND` for scientific computations.
- CLI with short and long options.
- Built-in GPU architecture info tool (with hardcoded core counts for Fermi, Kepler, Maxwell, Pascal, and Volta).
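A hardcoded core-count table of the kind mentioned in the last item could look roughly like the sketch below. It is not the code from kernel.cu: the function names `coresPerSM` and `totalCores` are hypothetical, and the per-SM figures are the commonly published FP32 core counts for each compute capability. In a real tool the three inputs would presumably come from the `major`, `minor`, and `multiProcessorCount` fields of `cudaDeviceProp`.

```cpp
#include <cassert>

// Hypothetical helper (not from kernel.cu): FP32 CUDA cores per SM
// for a given compute capability, covering Fermi through Volta.
// Returns 0 for anything this table does not know about.
int coresPerSM(int major, int minor) {
    switch (major) {
        case 2: return (minor == 0) ? 32 : 48;   // Fermi: 2.0 vs 2.1
        case 3: return 192;                      // Kepler
        case 5: return 128;                      // Maxwell
        case 6: return (minor == 0) ? 64 : 128;  // Pascal: GP100 vs GP10x
        case 7: return (minor < 5) ? 64 : 0;     // Volta (7.5 is Turing, not covered)
        default: return 0;
    }
}

// Total FP32 cores = cores per SM times the number of SMs on the device.
int totalCores(int major, int minor, int smCount) {
    return coresPerSM(major, minor) * smCount;
}
```

With these figures, a Tesla P100 (compute capability 6.0, 56 SMs) and a GTX 1080 Ti (compute capability 6.1, 28 SMs) both come out at 3584 cores, even though their SMs are laid out differently.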
The generated sequences were empirically validated for Monte Carlo suitability. Test results confirmed:
- Mean: ≈ 0.50 (theoretical for $U(0,1)$: 0.5)
- Standard deviation: ≈ 0.288 (theoretical for $U(0,1)$: $\sqrt{1/12} \approx 0.288675$)
- Pearson correlation between parallel streams: ≈ 0.0 (consistent with statistical independence of concurrent GPU threads, which is critical for parallel Monte Carlo)
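For readers who want to repeat this kind of check on their own output, the three statistics can be computed on the host with a few lines of C++. This is a minimal sketch, not the original validation code: the function names are hypothetical, and it assumes the samples from each stream have already been copied into a `std::vector<double>`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Arithmetic mean of the samples (expected near 0.5 for U(0,1)).
double mean(const std::vector<double>& x) {
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / static_cast<double>(x.size());
}

// Population standard deviation (expected near sqrt(1/12) for U(0,1)).
double stddev(const std::vector<double>& x) {
    const double m = mean(x);
    double ss = 0.0;
    for (double v : x) ss += (v - m) * (v - m);
    return std::sqrt(ss / static_cast<double>(x.size()));
}

// Pearson correlation coefficient between two equally long streams.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const double mx = mean(x);
    const double my = mean(y);
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}
```

A coefficient near zero only rules out linear dependence between two streams, so it is a necessary check rather than a full independence test.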
The application was developed to accelerate the generation of quasi-random numbers. It uses the GPU's unified shader units to generate the numbers, and the generation itself is handled by the cuRAND library.
To generate numbers, you must specify at least two arguments: the number of SM blocks (CUDA thread blocks) to use for the operation and the number of threads per block. Note that every GPU has a hardware limit for each of these two arguments, so consult the technical documentation for your hardware before you start.
| Argument | Description |
|---|---|
| `-b [number]` / `--blocks [number]` | Number of SM blocks |
| `-t [number]` / `--threads [number]` | Number of threads |
| `-h` / `--help` | Help on using the application |
| `-v` / `--version` | Application version |
| `-i` / `--info` | Device configuration |
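The options in the table map naturally onto POSIX `getopt_long`. The sketch below shows one way such parsing could be written; it is not taken from kernel.cu, and the `Options` struct, `parseArgs`, and `launchConfigLooksValid` are hypothetical names. The default limit of 1024 threads per block is an assumption that holds for compute capability 2.0 and later; a real tool would more likely read `maxThreadsPerBlock` from `cudaDeviceProp`.

```cpp
#include <cassert>
#include <cstdlib>
#include <getopt.h>

// Hypothetical container for the parsed command line.
struct Options {
    long blocks = 0;    // -b / --blocks
    long threads = 0;   // -t / --threads
    bool help = false;  // -h / --help
    bool version = false;  // -v / --version
    bool info = false;  // -i / --info
    bool error = false; // set when an unknown option is seen
};

// Parse short and long options with getopt_long.
Options parseArgs(int argc, char* argv[]) {
    static const option longOpts[] = {
        {"blocks",  required_argument, nullptr, 'b'},
        {"threads", required_argument, nullptr, 't'},
        {"help",    no_argument,       nullptr, 'h'},
        {"version", no_argument,       nullptr, 'v'},
        {"info",    no_argument,       nullptr, 'i'},
        {nullptr,   0,                 nullptr, 0}
    };
    Options opts;
    optind = 1;  // start scanning from the first argument
    int c;
    while ((c = getopt_long(argc, argv, "b:t:hvi", longOpts, nullptr)) != -1) {
        switch (c) {
            case 'b': opts.blocks = std::strtol(optarg, nullptr, 10); break;
            case 't': opts.threads = std::strtol(optarg, nullptr, 10); break;
            case 'h': opts.help = true; break;
            case 'v': opts.version = true; break;
            case 'i': opts.info = true; break;
            default:  opts.error = true; break;
        }
    }
    return opts;
}

// Basic sanity check before launching a kernel. The 1024 default is an
// assumed per-block thread limit, not a value read from the device.
bool launchConfigLooksValid(const Options& o, long maxThreadsPerBlock = 1024) {
    return !o.error && o.blocks > 0 &&
           o.threads > 0 && o.threads <= maxThreadsPerBlock;
}
```

Under these assumptions, `-b 120 -t 256` passes the check, while `-b 120 -t 2048` is rejected before any kernel launch is attempted.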
    # Extract the archive
    tar -zxvf neorand_v0.6.56.tar.gz
    # Compile (requires NVIDIA CUDA Toolkit)
    nvcc kernel.cu -o neorand
    # Check GPU info
    ./neorand -i
    # Run generation (e.g., 120 blocks, 256 threads per block)
    ./neorand -b 120 -t 256
This project is licensed under the MIT License - see the copyright header in kernel.cu for details.