Practice of CUDA programming.
- Book: Programming Massively Parallel Processors, 4th edition (PMPP)
- Lecture: CUDA MODE
- https://github.com/heyuhhh/Programming-Massively-Parallel-Processors-4th
- python==3.8
- torch==2.0.0+cu118
- torchaudio==2.0.1+cu118
- torchvision==0.15.1+cu118
- ninja==1.11.1.3
- setuptools==60.2.0
- ipykernel==6.29.5
- L001_How_to_profile_cuda_kernels_in_pytorch
- Ch02_Heterogeneous_data_parallel_computing
- L002_Ch1-3_PMPP_book
- Ch03_Multidimensional_grids_and_data
- L003_Get_started_with_cuda_for_python_programmer
- L004_Compute_and_memory_basics
- L005_Going_futher_with_cuda_for_python_programmer
- Ch04_Compute_architecture_and_scheduling
- Ch05_Memory_architecture_and_data_locality
- L006_Optimizing_opitimizers
- L007_Advanced_quantization
- L008_Cuda_performance_checklist
- Ch06_Performance_considerations
- Ch07_Convolution
- Ch08_Stencil
- L009_Reductions
- Ch10_Reduction
- L011_Sparsity
- Ch14_Sparse_matrix_computation
- L012_Flash_attenion
- L013_Ring_attention
- L014_Practitioners_guide_to_triton
- L015_CUTLASS
- L016_On_hands_profiling
- L018_Fusing_kernels
- L020_Scan_algorithm
- L021_Scan_algorithm_part2
- Ch11_Scan
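
The pinned package versions listed above can be compared against the active environment with a short script. This is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the `PINS` dictionary and the `check_pins` helper are hypothetical names introduced here for illustration and are not part of the repository. It assumes the packages were installed under the distribution names shown in the list.

```python
from importlib import metadata

# Pins copied from the list above; local version tags such as "+cu118" are compared verbatim.
PINS = {
    "torch": "2.0.0+cu118",
    "torchaudio": "2.0.1+cu118",
    "torchvision": "0.15.1+cu118",
    "ninja": "1.11.1.3",
    "setuptools": "60.2.0",
    "ipykernel": "6.29.5",
}


def _installed(name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, installed_version=_installed):
    """Return {package: (expected, found)} for every package that does not match its pin."""
    mismatches = {}
    for name, expected in pins.items():
        found = installed_version(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

The version lookup is passed in as a parameter so the comparison can be exercised without installing anything. The script compares strings exactly, so a CPU-only build of `torch` (without the `+cu118` tag) is reported as a mismatch.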