Shared CUDA harness for the SuperInstance fleet GPU agents. Internal crate, not published to crates.io.
| Module | Types | Purpose |
|---|---|---|
context |
GpuContext |
CUDA device init, properties, availability |
buffer |
GpuBuffer<T> |
Typed device memory alloc/upload/download |
kernel |
PtxKernel, KernelArg |
PTX loading + grid/block launch |
pool |
BufferPool, PoolStats |
Reuse device allocations across kernel launches |
bench |
KernelBenchmark, BenchResult |
Warmup/timed kernel benchmarking with percentiles |
fallback |
helpers | CPU fallback when GPU is unavailable |
use superinstance_gpu_compute::*; let ctx = GpuContext::new()?; if ctx.available { let mut buf = GpuBuffer::<f32>::alloc(&ctx, 1024)?; buf.upload(&vec![1.0; 1024])?; let result = buf.download()?; println!("Downloaded {} elements", result.len()); } else { println!("No GPU available, using CPU fallback"); }
- NVIDIA GPU with compute capability 5.0+ (sm_50 through sm_90)
- CUDA Toolkit 11.x or 12.x with
nvccin$PATH - Rust 1.75+
- Linux (WSL2 works; macOS and Windows not tested)
# Check your GPU nvidia-smi # Install CUDA 12.x (recommended) wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb sudo dpkg -i cuda-keyring_1.1-1_all.deb sudo apt-get update sudo apt-get install -y cuda-toolkit-12-6 # Add to ~/.bashrc export PATH=/usr/local/cuda/bin:$PATH export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
nvcc --version nvidia-smi
export CUDA_HOME=/usr/local/cuda# Build without GPU support (all types exist, GPU ops return NotAvailable) cargo build # Build with GPU support cargo build --features gpu # Run tests without GPU cargo test # Run tests with GPU (requires CUDA) cargo test --features gpu # Run benchmarks with GPU cargo bench --features gpu
| Feature | Description |
|---|---|
gpu |
Enable real CUDA operations via cudarc. Disabled by default. |
When the gpu feature is off, all types still compile but GPU operations return Err(GpuError::NotAvailable). This allows consumer crates to compile on machines without CUDA.
Add to your Cargo.toml:
[dependencies] superinstance-gpu-compute = { git = "https://github.com/SuperInstance/superinstance-gpu-compute", optional = true } # Or with GPU: # superinstance-gpu-compute = { git = "https://github.com/SuperInstance/superinstance-gpu-compute", features = ["gpu"] }
GpuContext (CUDA device handle)
├── GpuBuffer<T> (typed device memory)
├── PtxKernel (loaded PTX module + function)
├── BufferPool (cache of GpuBuffers by type/size)
└── KernelBenchmark (measure kernel execution time)
This crate wraps cudarc the same way tower wraps hyper — higher-level patterns for fleet use.
MIT