A research-grade performance study of classic shortest-path algorithms (SSSP & APSP) using Serial, OpenMP (CPU), and CUDA (GPU). This repo includes well-optimized C++ kernels, reproducible Jupyter benchmarks, and HPC profiling tools—all tailored for both teaching and advanced performance engineering.
| Algorithm | Type | Description |
|---|---|---|
| Bellman-Ford | SSSP | Supports negative weights and detects negative cycles; a frontier variant is discussed (core relaxation sketched below). |
| Dijkstra | SSSP | Classic non-negative paths. Includes OpenMP variant and Δ-stepping analysis. |
| Floyd-Warshall | APSP | Dynamic programming; efficient on dense graphs; includes cache-aware version. |
| Johnson's | APSP | Combines BF & Dijkstra. Efficient for sparse graphs with negative weights. |
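
To make the table concrete: every SSSP variant here builds on edge relaxation. Below is a minimal serial Bellman–Ford sketch; the `Edge` struct and function signature are illustrative, not the repo's actual interface.

```cpp
#include <limits>
#include <vector>

struct Edge { int u, v; double w; };  // illustrative edge-list layout

// Returns false if a negative cycle is reachable from `src`.
bool bellman_ford(int V, const std::vector<Edge>& edges,
                  int src, std::vector<double>& dist) {
    const double INF = std::numeric_limits<double>::infinity();
    dist.assign(V, INF);
    dist[src] = 0.0;

    // V-1 rounds of relaxing every edge settle all shortest paths.
    for (int round = 0; round < V - 1; ++round)
        for (const Edge& e : edges)
            if (dist[e.u] + e.w < dist[e.v])
                dist[e.v] = dist[e.u] + e.w;

    // One extra pass: any further improvement implies a negative cycle.
    for (const Edge& e : edges)
        if (dist[e.u] + e.w < dist[e.v])
            return false;
    return true;
}
```

Dijkstra replaces the blind V−1 rounds with a priority queue over settled vertices, which is why it wins on non-negative graphs and why Δ-stepping sits between the two.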
Each algorithm is implemented in the following forms to explore parallelism trade-offs:
- `serial`: Clean single-threaded baseline.
- `openmp`: Loop-parallelized CPU kernels with scheduling and locking optimizations (see the sketch below).
- `cuda`: Optimized GPU kernels using thread-level parallelism and memory coalescing.
- `hybrid` (optional): Proof-of-concept split execution between CPU and GPU.
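
For flavor, here is what one `openmp` relaxation round might look like. This is a simplified sketch, not the repo's exact kernel: the cheap unsynchronized pre-check plus a double-checked `critical` section keeps it short, where real kernels would use atomics or per-thread buffers. The `Edge` struct is the same illustrative layout as above.

```cpp
#include <omp.h>
#include <vector>

struct Edge { int u, v; double w; };  // illustrative edge-list layout

// One parallel relaxation round; returns true if any distance improved,
// so the caller can stop early once distances stabilize.
bool relax_round_omp(const std::vector<Edge>& edges,
                     std::vector<double>& dist) {
    bool changed = false;
    #pragma omp parallel for schedule(dynamic, 4096) reduction(||: changed)
    for (long i = 0; i < (long)edges.size(); ++i) {
        const Edge& e = edges[i];
        double cand = dist[e.u] + e.w;
        if (cand < dist[e.v]) {        // cheap, unsynchronized pre-check
            #pragma omp critical       // serialize only the rare updates
            if (cand < dist[e.v]) {
                dist[e.v] = cand;
                changed = true;
            }
        }
    }
    return changed;
}
```

The `schedule` clause here is exactly the knob the thread-scheduling notebook benchmarks: static chunks suit uniform edge lists, while dynamic/guided pay off when per-edge cost is skewed.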
This repo is not just code—it's an HPC lab in a box. Each notebook dives deep into a key HPC concept:
| Topic | Techniques & Insights |
|---|---|
| Work vs Parallelism | When to prefer Bellman-Ford over Dijkstra? What trade-offs do Δ-stepping and frontier methods offer? |
| Memory Optimization | Blocked Floyd-Warshall for cache locality; pinned host memory & CUDA streams for async transfers (see the sketch after this table). |
| Thread Scheduling | OpenMP static, dynamic, and guided schedules benchmarked and visualized. |
| Profiling & Analysis | Nsight Systems CLI traces of GPU kernels, memory copies, and overlap timing. |
| GPU vs CPU Tradeoffs | Real-world scenarios where GPU underperforms (e.g., small graph, low compute intensity). |
| cuGraph Comparison | Benchmarks against RAPIDS cuGraph to set baselines for CUDA kernel quality. |
| Real vs Synthetic Data | Benchmarks with synthetic ER graphs and real-world road/social/web networks from SNAP & DIMACS. |
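
To make the pinned-memory/streams row concrete, the pattern under test looks roughly like this. A minimal sketch only: `relax_kernel`, the chunk count, and buffer sizes are placeholders, not the repo's actual kernels.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for a real relaxation pass.
__global__ void relax_kernel(const double* in, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main() {
    const int N = 1 << 20, CHUNKS = 4, CHUNK = N / CHUNKS;
    double *h_in, *h_out, *d_in, *d_out;

    // Pinned (page-locked) host buffers: required for truly async copies.
    cudaMallocHost((void**)&h_in,  N * sizeof(double));
    cudaMallocHost((void**)&h_out, N * sizeof(double));
    cudaMalloc((void**)&d_in,  N * sizeof(double));
    cudaMalloc((void**)&d_out, N * sizeof(double));
    // ... fill h_in with graph data here ...

    cudaStream_t streams[CHUNKS];
    for (int c = 0; c < CHUNKS; ++c) cudaStreamCreate(&streams[c]);

    // Each chunk gets its own stream, so one chunk's transfers can
    // overlap another chunk's kernel execution.
    for (int c = 0; c < CHUNKS; ++c) {
        size_t off = (size_t)c * CHUNK;
        cudaMemcpyAsync(d_in + off, h_in + off, CHUNK * sizeof(double),
                        cudaMemcpyHostToDevice, streams[c]);
        relax_kernel<<<(CHUNK + 255) / 256, 256, 0, streams[c]>>>(
            d_in + off, d_out + off, CHUNK);
        cudaMemcpyAsync(h_out + off, d_out + off, CHUNK * sizeof(double),
                        cudaMemcpyDeviceToHost, streams[c]);
    }
    for (int c = 0; c < CHUNKS; ++c) cudaStreamSynchronize(streams[c]);
    for (int c = 0; c < CHUNKS; ++c) cudaStreamDestroy(streams[c]);

    cudaFree(d_in);      cudaFree(d_out);
    cudaFreeHost(h_in);  cudaFreeHost(h_out);
    return 0;
}
```

An Nsight Systems trace (`nsys profile ./app`) of exactly this loop is what the profiling row refers to: the timeline shows whether copies and kernels actually overlap.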
```
notebooks/
  00_setup_build.ipynb          # 📦 Setup: clone, build, and detect environment
  01_bellman_ford_sssp.ipynb    # 🔁 BF variants benchmarked on SSSP with negative weights
  02_dijkstra_sssp.ipynb        # ⏩ Dijkstra variants on non-negative graphs, incl. cuGraph
  03_floyd_warshall_apsp.ipynb  # 🧮 Dense graph APSP + Blocked FW (sketched below)
  04_johnson_apsp.ipynb         # 🔀 Sparse graph APSP + parallel Dijkstra passes
  05_compare_rollup.ipynb       # 📊 Rollup notebook for plotting speedup, crossover points
src/
  *.cpp, *.cu, *.h              # 🧩 All algorithm implementations
bin/
  <executables after `make`>    # 🧵 CPU/GPU targets (e.g., `BF_openmp`, `dijkstra_cuda`)
Makefile                        # 🔧 Customizable build logic (auto-detects CUDA)
```
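
Notebook 03's blocked Floyd–Warshall uses the classic three-phase tiling so each B×B tile of the distance matrix stays cache-resident. A minimal serial sketch of the idea follows; the tile size, row-major layout, and the assumption that `B` divides `n` are simplifications, not the repo's exact kernel.

```cpp
#include <algorithm>
#include <vector>

// Run the FW update on tile (bi,bj) using the k-range of tile bk.
static void fw_tile(double* d, int n, int B, int bi, int bj, int bk) {
    for (int k = bk * B; k < (bk + 1) * B; ++k)
        for (int i = bi * B; i < (bi + 1) * B; ++i) {
            double dik = d[(size_t)i * n + k];
            for (int j = bj * B; j < (bj + 1) * B; ++j)
                d[(size_t)i * n + j] = std::min(
                    d[(size_t)i * n + j], dik + d[(size_t)k * n + j]);
        }
}

// In-place APSP on an n×n row-major distance matrix; assumes B divides n.
void blocked_floyd_warshall(std::vector<double>& d, int n, int B) {
    int nb = n / B;
    for (int bk = 0; bk < nb; ++bk) {
        fw_tile(d.data(), n, B, bk, bk, bk);        // phase 1: diagonal tile
        for (int b = 0; b < nb; ++b) {
            if (b == bk) continue;
            fw_tile(d.data(), n, B, bk, b, bk);     // phase 2: tile row
            fw_tile(d.data(), n, B, b, bk, bk);     //          tile column
        }
        for (int bi = 0; bi < nb; ++bi) {           // phase 3: the rest
            if (bi == bk) continue;
            for (int bj = 0; bj < nb; ++bj)
                if (bj != bk) fw_tile(d.data(), n, B, bi, bj, bk);
        }
    }
}
```

The phase order encodes the data dependencies (diagonal tile first, then its row/column panels, then everything else), which is also what makes each phase a clean parallel region for the OpenMP and CUDA versions.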
| Notebook | Link |
|---|---|
| Setup (build, env detect) | Open In Colab |
| Bellman–Ford SSSP | Open In Colab |
| Dijkstra SSSP | Open In Colab |
| Floyd–Warshall APSP | Open In Colab |
| Johnson's APSP | Open In Colab |
| Compare & Rollup | Open In Colab |
Requires:
- GCC with OpenMP support
- CUDA Toolkit (for CUDA/hybrid variants)
```bash
# Clone the repository
git clone https://github.com/UchihaIthachi/sssp-apsp-hpc-openmp-cuda.git
cd sssp-apsp-hpc-openmp-cuda

# Build all available executables
make all

# Clean build artifacts
make clean
```
Environment variables:
```bash
# Override GPU architecture if needed
export GPU_ARCH=sm_75

# Skip CUDA build
export DISABLE_CUDA=1
```
Each executable takes its arguments in the following format:
```bash
./bin/BF_openmp <V> <min_w> <max_w> [density=0.01] [threads=4]
./bin/dijkstra_cuda <V> <min_w> <max_w> [density=0.01]
```
Outputs:
- Saves `.csv` or `.txt` shortest-path results per variant.
- Prints elapsed time in seconds to stdout.
| Vertices | Dijkstra CUDA | BF OpenMP | FW OpenMP | Johnson CUDA |
|---|---|---|---|---|
| 500 | × | × | × | × |
| 1000 | × | × | × | × |
| 2000 | × | × | × | × |
| 5000 | × | × | × | × |
(CUDA outperforms CPU significantly on large graphs; OpenMP gives modest gains depending on algorithm and system.)
- CUDA streams and pinned memory: mups.etf.rs
- cuGraph benchmarking: RAPIDS.ai
- Floyd–Warshall blocked cache-aware layout: arXiv 2009.11873
- Johnson algorithm review: Moore & Kalapos, 2010
- Δ-Stepping SSSP: Meyer & Sanders, JEA 2003
Have an idea to extend the kernels, add distributed memory (MPI), or optimize memory coalescing? Feel free to open an issue or pull request.
If you're using this for coursework or research, feel free to cite or fork with attribution.
This project is licensed under the MIT License. See LICENSE for details.