Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

🚀 High-performance implementations and benchmarks of SSSP and APSP algorithms (Bellman–Ford, Dijkstra, Floyd–Warshall, Johnson) in Serial, OpenMP, CUDA, and Hybrid CPU+GPU. Includes profiling, speedup plots, and HPC notebooks

Notifications You must be signed in to change notification settings

UchihaIthachi/sssp-apsp-hpc-openmp-cuda

Folders and files

NameName
Last commit message
Last commit date

Latest commit

History

45 Commits

Repository files navigation

License Top Language PRs Welcome

🚀 High-Performance SSSP & APSP Algorithms with OpenMP + CUDA

A research-grade performance study of classic shortest-path algorithms (SSSP & APSP) using Serial, OpenMP (CPU), and CUDA (GPU). This repo includes well-optimized C++ kernels, reproducible Jupyter benchmarks, and HPC profiling tools—all tailored for both teaching and advanced performance engineering.


🔍 Algorithms Implemented

Algorithm Type Description
Bellman-Ford SSSP Supports negative weights, detects cycles. Frontier variant discussed.
Dijkstra SSSP Classic non-negative paths. Includes OpenMP variant and Δ-stepping analysis.
Floyd-Warshall APSP Dynamic programming; efficient on dense graphs; includes cache-aware version.
Johnson's APSP Combines BF & Dijkstra. Efficient for sparse graphs with negative weights.

⚙️ Parallel Variants

Each algorithm is implemented in the following forms to explore parallelism trade-offs:

  • serial: Clean single-threaded baseline.
  • openmp: Loop-parallelized CPU kernels with scheduling and locking optimizations.
  • cuda: Optimized GPU kernels using thread-level parallelism and memory coalescing.
  • hybrid: (optional) Proof-of-concept split execution between CPU and GPU.

🧠 What You'll Learn

This repo is not just code—it's an HPC lab in a box. Each notebook dives deep into a key HPC concept:

Topic Techniques & Insights
Work vs Parallelism When to prefer Bellman-Ford over Dijkstra? What trade-offs do Δ-stepping and frontier methods offer?
Memory Optimization Blocked Floyd-Warshall for cache locality; pinned host memory & CUDA streams for async transfers.
Thread Scheduling OpenMP static, dynamic, and guided schedules benchmarked and visualized.
Profiling & Analysis Nsight Systems CLI traces of GPU kernels, memory copies, and overlap timing.
GPU vs CPU Tradeoffs Real-world scenarios where GPU underperforms (e.g., small graph, low compute intensity).
cuGraph Comparison Benchmarks against RAPIDS cuGraph to set baselines for CUDA kernel quality.
Real vs Synthetic Data Benchmarks with synthetic ER graphs and real-world road/social/web networks from SNAP & DIMACS.

📁 Project Layout

notebooks/
00_setup_build.ipynb # 📦 Setup: clone, build, and detect environment
01_bellman_ford_sssp.ipynb # 🔁 BF variants benchmarked on SSSP with negative weights
02_dijkstra_sssp.ipynb # ⏩ Dijkstra variants on non-negative graphs, incl. cuGraph
03_floyd_warshall_apsp.ipynb # 🧮 Dense graph APSP + Blocked FW
04_johnson_apsp.ipynb # 🔀 Sparse graph APSP + parallel Dijkstra passes
05_compare_rollup.ipynb # 📊 Rollup notebook for plotting speedup, crossover points
src/
*.cpp, *.cu, *.h # 🧩 All algorithm implementations
bin/
executables after `make` # 🧵 CPU/GPU targets (e.g., `BF_openmp`, `dijkstra_cuda`)
Makefile # 🔧 Customizable build logic (auto-detects CUDA)

⚡ Quickstart on Google Colab

Notebook Link
Setup (build, env detect) Open In Colab
Bellman–Ford SSSP Open In Colab
Dijkstra SSSP Open In Colab
Floyd–Warshall APSP Open In Colab
Johnson's APSP Open In Colab
Compare & Rollup Open In Colab

🛠 Manual Usage

🔨 Build

Requires:

  • GCC with OpenMP support
  • CUDA Toolkit (for CUDA/hybrid variants)
# Clone the repository
git clone https://github.com/UchihaIthachi/sssp-apsp-hpc-openmp-cuda.git
cd sssp-apsp-hpc-openmp-cuda
# Build all available executables
make all
# Clean build artifacts
make clean

Environment variables:

# Override GPU architecture if needed
export GPU_ARCH=sm_75
# Skip CUDA build
export DISABLE_CUDA=1

▶️ Run

Each executable takes the following CLI format:

./bin/BF_openmp <V> <min_w> <max_w> [density=0.01] [threads=4]
./bin/dijkstra_cuda <V> <min_w> <max_w> [density=0.01]

Outputs:

  • Saves .csv or .txt of shortest-path results per variant.
  • Prints time in s to stdout.

📈 Example Results

Vertices Dijkstra CUDA BF OpenMP FW OpenMP Johnson CUDA
500 ×ばつ ×ばつ ×ばつ ×ばつ
1000 ×ばつ ×ばつ ×ばつ ×ばつ
2000 ×ばつ ×ばつ ×ばつ ×ばつ
5000 ×ばつ ×ばつ ×ばつ ×ばつ

(CUDA outperforms CPU significantly on large graphs; OpenMP gives modest gains depending on algorithm and system.)


💡 Credits & References


🙌 Contributions Welcome

Have an idea to extend the kernels, add distributed memory (MPI), or optimize memory coalescing? Feel free to open an issue or pull request.

If you're using this for coursework or research, feel free to cite or fork with attribution.


📄 License

This project is licensed under the MIT License. See LICENSE for details.

About

🚀 High-performance implementations and benchmarks of SSSP and APSP algorithms (Bellman–Ford, Dijkstra, Floyd–Warshall, Johnson) in Serial, OpenMP, CUDA, and Hybrid CPU+GPU. Includes profiling, speedup plots, and HPC notebooks

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

AltStyle によって変換されたページ (->オリジナル) /