gpu-benchmarking

Here are 4 public repositories matching this topic...

Cre4T3Tiv3 / jetson-orin-matmul-analysis

Scientific CUDA benchmarking framework: 4 implementations x 3 power modes x 5 matrix sizes on Jetson Orin Nano. 1,282 GFLOPS peak, 90% performance @ 88% power (25W mode), 99.5% accuracy validation, edge AI deployment guide.

machine-learning robotics cuda cublas scientific-computing matrix-multiplication high-performance-computing gpu-computing performance-optimization autonomous-systems edge-computing nvidia-jetson edge-ai embeded-systems tensor-cores ml-deployment jetson-orin-nano gpu-benchmarking power-efficiency-benchmark cuda-optimization

Updated Oct 14, 2025
Python
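
As a rough illustration of the figures in the description above, the sketch below shows how a GFLOPS number is conventionally derived for a square matrix multiply (counting 2·n³ floating-point operations) and how a 4 x 3 x 5 benchmark grid can be enumerated. It is a minimal sketch, not the repository's code: the implementation names, power-mode labels, and matrix sizes are hypothetical placeholders, and `matmul_gflops` is a helper name invented for this example.

```python
from itertools import product


def matmul_gflops(n: int, seconds: float) -> float:
    """Return GFLOPS for an (n x n) @ (n x n) multiply.

    Uses the common convention of 2 * n**3 floating-point operations
    (one multiply and one add per inner-product term).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * n**3 / seconds / 1e9


# Hypothetical labels: the listing only states the counts (4, 3, 5),
# not the actual names or sizes used by the repository.
IMPLEMENTATIONS = ["impl_a", "impl_b", "impl_c", "impl_d"]
POWER_MODES = ["mode_1", "mode_2", "mode_3"]
MATRIX_SIZES = [64, 128, 256, 512, 1024]

# Every (implementation, power mode, size) combination to benchmark.
GRID = list(product(IMPLEMENTATIONS, POWER_MODES, MATRIX_SIZES))
```

Calling `matmul_gflops(n, elapsed)` with a measured kernel time for each entry in `GRID` would yield one throughput figure per configuration, 60 in total under these assumed counts.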

ZrobMiloudaa / jetson-orin-matmul-analysis

Star 1

🔍 Analyze CUDA matrix multiplication performance and power consumption on NVIDIA Jetson Orin Nano across multiple implementations and settings.

machine-learning robotics cuda cublas matrix-multiplication high-performance-computing gpu-computing performance-optimization autonomous-systems edge-computing nvidia-jetson embeded-systems tensor-cores ml-deployment jetson-orin-nano gpu-benchmarking power-efficiency-benchmark cuda-optimization

Updated Feb 9, 2026
Python

Tennisee-data / benchHUB

Star 0

benchHUB is a Python-based project to parse, aggregate, and visualize system and performance benchmarks. It includes a Streamlit dashboard to display and compare results.

mac benchmarking data-science machine-learning hardware leaderboard gpu-computing leaderboards performance-testing gpu-benchmark fastapi streamlit benchmarking-utility apple-silicon cpu-benchmarks gpu-benchmarking

Updated Jan 27, 2026
Python

kadamrahul18 / GPT2-Optimization

Star 0

GPT-2 (124M) fixed-work distributed training benchmark on NYU BigPurple (Slurm), scaling across V100 GPUs on 2 nodes using DeepSpeed ZeRO-1 + FP16/AMP. Built a reproducible harness that writes training_metrics.json + RUN_COMPLETE.txt + launcher metadata per run, plus NCCL topology/log artifacts and Nsight Systems traces/summaries (NVTX + NCCL ranges).

performance hpc amp slurm pytorch reproducibility distributed-training mixed-precision gpt2 deepspeed zero-1 gpu-benchmarking

Updated Feb 4, 2026
Python
