Gama1903/cuda_programming

Folders and files

NameName
Last commit message
Last commit date

Latest commit

History

9 Commits

Repository files navigation

CUDA Programming

Introduction

CUDA programming practice.

Reference

  1. Book: Programming Massively Parallel Processors, 4th edition (PMPP)
  2. Lectures: CUDA MODE
  3. https://github.com/heyuhhh/Programming-Massively-Parallel-Processors-4th

Requirements

  1. python==3.8
  2. torch==2.0.0+cu118
  3. torchaudio==2.0.1+cu118
  4. torchvision==0.15.1+cu118
  5. ninja==1.11.1.3
  6. setuptools==60.2.0
  7. ipykernel==6.29.5
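To confirm that an environment matches the pinned versions above, a small standard-library script like the following could be run inside it. This is a minimal sketch, not part of the repository: the names `REQUIREMENTS` and `check_requirements` are hypothetical, and it only compares installed distribution versions via `importlib.metadata` (available since Python 3.8).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Hypothetical table: pinned versions taken from the requirements list.
REQUIREMENTS = {
    "torch": "2.0.0+cu118",
    "torchaudio": "2.0.1+cu118",
    "torchvision": "0.15.1+cu118",
    "ninja": "1.11.1.3",
    "setuptools": "60.2.0",
    "ipykernel": "6.29.5",
}


def check_requirements(reqs=REQUIREMENTS):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in reqs.items():
        try:
            installed = version(name)  # version string from package metadata
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # The requirements pin Python 3.8; report the running interpreter too.
    if sys.version_info[:2] != (3, 8):
        print(f"python: expected 3.8, found {sys.version.split()[0]}")
    for name, (expected, installed) in check_requirements().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty report means every pinned package is installed at exactly the listed version.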

Index

  1. L001_How_to_profile_cuda_kernels_in_pytorch
  2. Ch02_Heterogeneous_data_parallel_computing
  3. L002_Ch1-3_PMPP_book
  4. Ch03_Multidimensional_grids_and_data
  5. L003_Get_started_with_cuda_for_python_programmer
  6. L004_Compute_and_memory_basics
  7. L005_Going_futher_with_cuda_for_python_programmer
  8. Ch04_Compute_architecture_and_scheduling
  9. Ch05_Memory_architecture_and_data_locality
  10. L006_Optimizing_opitimizers
  11. L007_Advanced_quantization
  12. L008_Cuda_performance_checklist
  13. Ch06_Performance_considerations
  14. Ch07_Convolution
  15. Ch08_Stencil
  16. L009_Reductions
  17. Ch10_Reduction
  18. L011_Sparsity
  19. Ch14_Sparse_matrix_computation
  20. L012_Flash_attenion
  21. L013_Ring_attention
  22. L014_Practitioners_guide_to_triton
  23. L015_CUTLASS
  24. L016_On_hands_profiling
  25. L018_Fusing_kernels
  26. L020_Scan_algorithm
  27. L021_Scan_algorithm_part2
  28. Ch11_Scan
