Commit c739c83 (parent 2ee8c5e): Chapter 4: Intermediate Algorithms

File tree: 4 files changed (+116, -1 lines)

Chapter4-IntermediateAlgo/Readme.md (50 additions, 0 deletions)
# Chapter 4: Intermediate Algorithms - CUDA Merge Sort

## Overview

This chapter explores parallelizable algorithms, focusing on sorting techniques. We implement merge sort in CUDA as an example of an intermediate-level algorithm that can benefit from parallel processing.

## Merge Sort in CUDA

Merge sort is an efficient, stable sorting algorithm that follows the divide-and-conquer paradigm. It is well suited to parallel implementation because its sub-problems are independent: the two halves of the array can be sorted without reference to each other.

### Key Components

1. `merge` function (`__device__`):
   - Merges two adjacent sorted subarrays into a single sorted range.
   - Uses a temporary buffer so the merge does not overwrite elements that have not yet been read.

2. `mergeSortRecursive` function (`__device__`):
   - Implements the recursive part of the merge sort algorithm.
   - Splits the range at its midpoint, recursively sorts each half, and merges the results.

3. `mergeSort` kernel (`__global__`):
   - Entry point for GPU execution.
   - Calls the recursive merge sort function.

4. `main` function:
   - Sets up the problem on the host.
   - Allocates memory on the device.
   - Launches the kernel and copies the sorted result back.
## Implementation Details

- The CUDA function qualifiers `__device__` and `__global__` mark the functions that run on the GPU.
- Memory is allocated on both the host and the device, with `cudaMemcpy` moving data between them.
- The sorting runs entirely on the GPU; only the final sorted array is transferred back to the host.
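The sample code omits error handling for brevity. In practice every CUDA API call returns a `cudaError_t` worth checking; one common pattern (not part of the original code, and the macro name is our own) is a small wrapper:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-checking wrapper (hypothetical name, not in the chapter's code).
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err = (call);                                      \
        if (err != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",           \
                         cudaGetErrorString(err), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Usage: CUDA_CHECK(cudaMalloc(&d_arr, N * sizeof(int)));
```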
## Optimization Techniques

This implementation demonstrates the basic structure of a CUDA merge sort, but note that the kernel is launched with a single thread (`<<<1, 1>>>`), so the sort itself still runs sequentially on the GPU. Several techniques could make it genuinely parallel and faster:

1. Shared Memory: stage frequently used data in shared memory for faster access within a thread block.
2. Coalesced Memory Access: arrange global memory accesses so that neighboring threads touch neighboring addresses.
3. Dynamic Parallelism: launch child kernels from within device code, so the two recursive halves can be sorted concurrently.
4. Hybrid Approach: combine GPU parallelism for large-scale divisions with CPU processing for smaller subarrays.
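As a sketch of item 3, the recursion could be restructured so each half is handed to a concurrent child kernel. This is illustrative only: it assumes a device of compute capability 3.5 or later, compilation with `nvcc -rdc=true`, and a toolkit that still supports device-side `cudaDeviceSynchronize()` (deprecated in CUDA 11.6 and removed in 12.0); the kernel name and cutoff value are our own, and `merge`/`mergeSortRecursive` are the functions from the code below.

```cuda
// Illustrative dynamic-parallelism variant (not part of the original code).
// Sorts arr[left..right]; ranges larger than CUTOFF spawn child kernels,
// which the parent waits on before merging.
#define CUTOFF 1024

__global__ void mergeSortDP(int* arr, int* temp, int left, int right) {
    if (left >= right) return;
    int mid = left + (right - left) / 2;
    if (right - left > CUTOFF) {
        mergeSortDP<<<1, 1>>>(arr, temp, left, mid);       // child kernel
        mergeSortDP<<<1, 1>>>(arr, temp, mid + 1, right);  // child kernel
        cudaDeviceSynchronize();  // wait for both children (pre-CUDA-12)
    } else {
        // Small ranges: fall back to the sequential device recursion.
        mergeSortRecursive(arr, temp, left, mid);
        mergeSortRecursive(arr, temp, mid + 1, right);
    }
    merge(arr, temp, left, mid, right);
}
```

The two children operate on disjoint ranges of both `arr` and `temp`, so no synchronization between them is needed beyond the parent's wait.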
## Execution

Compile with `nvcc` and run the resulting binary (e.g. `nvcc merge_sort.cu -o merge_sort && ./merge_sort`).

![Merge](https://github.com/Shikha-code36/CUDA-Programming-Beginner-Guide/blob/main/Chapter4-IntermediateAlgo/merge.png)

## Conclusion

This CUDA implementation of merge sort demonstrates how a classic sorting algorithm can be adapted to run on a GPU. It serves as a foundation for understanding more complex parallel algorithms and optimization techniques in CUDA programming.

Chapter4-IntermediateAlgo/merge.png (22.7 KB)

Chapter4-IntermediateAlgo/merge_sort.cu (65 additions, 0 deletions)
#include <iostream>
#include <cstdlib>
#include <cuda_runtime.h>

// Merge two adjacent sorted ranges, arr[left..mid] and arr[mid+1..right],
// into one sorted range, using temp as scratch space.
__device__ void merge(int* arr, int* temp, int left, int mid, int right) {
    int i = left;
    int j = mid + 1;
    int k = left;

    while (i <= mid && j <= right) {
        if (arr[i] <= arr[j])
            temp[k++] = arr[i++];
        else
            temp[k++] = arr[j++];
    }

    // Copy any remaining elements from either half.
    while (i <= mid)
        temp[k++] = arr[i++];
    while (j <= right)
        temp[k++] = arr[j++];

    // Copy the merged result back into arr.
    for (int idx = left; idx <= right; ++idx)
        arr[idx] = temp[idx];
}

// Recursive merge sort (device-side recursion, executed by a single thread here).
__device__ void mergeSortRecursive(int* arr, int* temp, int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;  // avoids overflow of (left + right) / 2
        mergeSortRecursive(arr, temp, left, mid);
        mergeSortRecursive(arr, temp, mid + 1, right);
        merge(arr, temp, left, mid, right);
    }
}

// Kernel entry point: starts the merge sort on the device.
__global__ void mergeSort(int* arr, int* temp, int N) {
    mergeSortRecursive(arr, temp, 0, N - 1);
}

int main() {
    const int N = 10; // Number of elements
    int arr[N] = {9, 3, 7, 1, 5, 8, 2, 4, 6, 0};
    int* d_arr;
    int* d_temp;

    cudaMalloc(&d_arr, N * sizeof(int));
    cudaMalloc(&d_temp, N * sizeof(int));
    cudaMemcpy(d_arr, arr, N * sizeof(int), cudaMemcpyHostToDevice);

    // Single-thread launch: the sort runs sequentially on the GPU.
    mergeSort<<<1, 1>>>(d_arr, d_temp, N);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(arr, d_arr, N * sizeof(int), cudaMemcpyDeviceToHost);

    std::cout << "Sorted array: ";
    for (int i = 0; i < N; ++i)
        std::cout << arr[i] << " ";
    std::cout << std::endl;

    cudaFree(d_arr);
    cudaFree(d_temp);

    return 0;
}

README.md (1 addition, 1 deletion): the chapter checklist gains the new entry

- [x] [Chapter 1: Basics of CUDA Programming](Chapter1-Basics)
- [x] [Chapter 2: Basic Syntax](Chapter2-BasicSyntax)
- [x] [Chapter 3: Simple CUDA Vector Addition](Chapter3-EasyCudaProject)
- [x] [Chapter 4: Intermediate Algorithms](Chapter4-IntermediateAlgo)
