Commit c739c83 (parent 2ee8c5e): Chapter 4: Intermediate Algorithms

File tree: 4 files changed (+116, -1 lines)

Chapter4-IntermediateAlgo/Readme.md (50 additions, 0 deletions)
# Chapter 4: Intermediate Algorithms - CUDA Merge Sort

## Overview

This chapter explores parallelizable algorithms, focusing on sorting techniques. We implement merge sort in CUDA as an example of an intermediate-level algorithm that can benefit from parallel processing.

## Merge Sort in CUDA

Merge sort is an efficient, stable sorting algorithm that follows the divide-and-conquer paradigm. It is well suited to parallel implementation because its sub-problems are independent: the two halves of the array can be sorted without reference to each other.

### Key Components

1. `merge` function (`__device__`):
   - Merges two adjacent sorted subarrays into a single sorted range.
   - Uses a temporary buffer so the merge does not overwrite elements that have not yet been read.

2. `mergeSortRecursive` function (`__device__`):
   - Implements the recursive part of the merge sort algorithm.
   - Splits the range at its midpoint, recursively sorts each half, and merges the results.

3. `mergeSort` kernel (`__global__`):
   - Entry point for GPU execution.
   - Calls the recursive merge sort function.

4. `main` function:
   - Sets up the problem on the host.
   - Allocates memory on the device.
   - Launches the kernel and copies the sorted result back.
## Implementation Details

- The CUDA function qualifiers `__device__` and `__global__` mark the functions that run on the GPU.
- Memory is allocated on both the host and the device, with `cudaMemcpy` moving data between them.
- The sorting runs entirely on the GPU; only the final sorted array is transferred back to the host.
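The sample code omits error handling for brevity. In practice every CUDA API call returns a `cudaError_t` worth checking; one common pattern (not part of the original code, and the macro name is our own) is a small wrapper:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-checking wrapper (hypothetical name, not in the chapter's code).
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err = (call);                                      \
        if (err != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",           \
                         cudaGetErrorString(err), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Usage: CUDA_CHECK(cudaMalloc(&d_arr, N * sizeof(int)));
```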
## Optimization Techniques

This implementation demonstrates the basic structure of a CUDA merge sort, but note that the kernel is launched with a single thread (`<<<1, 1>>>`), so the sort itself still runs sequentially on the GPU. Several techniques could make it genuinely parallel and faster:

1. Shared Memory: stage frequently used data in shared memory for faster access within a thread block.
2. Coalesced Memory Access: arrange global memory accesses so that neighboring threads touch neighboring addresses.
3. Dynamic Parallelism: launch child kernels from within device code, so the two recursive halves can be sorted concurrently.
4. Hybrid Approach: combine GPU parallelism for large-scale divisions with CPU processing for smaller subarrays.
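As a sketch of item 3, the recursion could be restructured so each half is handed to a concurrent child kernel. This is illustrative only: it assumes a device of compute capability 3.5 or later, compilation with `nvcc -rdc=true`, and a toolkit that still supports device-side `cudaDeviceSynchronize()` (deprecated in CUDA 11.6 and removed in 12.0); the kernel name and cutoff value are our own, and `merge`/`mergeSortRecursive` are the functions from the code below.

```cuda
// Illustrative dynamic-parallelism variant (not part of the original code).
// Sorts arr[left..right]; ranges larger than CUTOFF spawn child kernels,
// which the parent waits on before merging.
#define CUTOFF 1024

__global__ void mergeSortDP(int* arr, int* temp, int left, int right) {
    if (left >= right) return;
    int mid = left + (right - left) / 2;
    if (right - left > CUTOFF) {
        mergeSortDP<<<1, 1>>>(arr, temp, left, mid);       // child kernel
        mergeSortDP<<<1, 1>>>(arr, temp, mid + 1, right);  // child kernel
        cudaDeviceSynchronize();  // wait for both children (pre-CUDA-12)
    } else {
        // Small ranges: fall back to the sequential device recursion.
        mergeSortRecursive(arr, temp, left, mid);
        mergeSortRecursive(arr, temp, mid + 1, right);
    }
    merge(arr, temp, left, mid, right);
}
```

The two children operate on disjoint ranges of both `arr` and `temp`, so no synchronization between them is needed beyond the parent's wait.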
## Execution

Compile with `nvcc` and run the resulting binary (e.g. `nvcc merge_sort.cu -o merge_sort && ./merge_sort`).

![Merge](https://github.com/Shikha-code36/CUDA-Programming-Beginner-Guide/blob/main/Chapter4-IntermediateAlgo/merge.png)

## Conclusion

This CUDA implementation of merge sort demonstrates how a classic sorting algorithm can be adapted to run on a GPU. It serves as a foundation for understanding more complex parallel algorithms and optimization techniques in CUDA programming.

Chapter4-IntermediateAlgo/merge.png (22.7 KB)

Chapter4-IntermediateAlgo/merge_sort.cu (65 additions, 0 deletions)
#include <iostream>
#include <cstdlib>
#include <cuda_runtime.h>

// Merge two adjacent sorted ranges, arr[left..mid] and arr[mid+1..right],
// into one sorted range, using temp as scratch space.
__device__ void merge(int* arr, int* temp, int left, int mid, int right) {
    int i = left;
    int j = mid + 1;
    int k = left;

    while (i <= mid && j <= right) {
        if (arr[i] <= arr[j])
            temp[k++] = arr[i++];
        else
            temp[k++] = arr[j++];
    }

    // Copy any remaining elements from either half.
    while (i <= mid)
        temp[k++] = arr[i++];
    while (j <= right)
        temp[k++] = arr[j++];

    // Copy the merged result back into arr.
    for (int idx = left; idx <= right; ++idx)
        arr[idx] = temp[idx];
}

// Recursive merge sort (device-side recursion, executed by a single thread here).
__device__ void mergeSortRecursive(int* arr, int* temp, int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;  // avoids overflow of (left + right) / 2
        mergeSortRecursive(arr, temp, left, mid);
        mergeSortRecursive(arr, temp, mid + 1, right);
        merge(arr, temp, left, mid, right);
    }
}

// Kernel entry point: starts the merge sort on the device.
__global__ void mergeSort(int* arr, int* temp, int N) {
    mergeSortRecursive(arr, temp, 0, N - 1);
}

int main() {
    const int N = 10; // Number of elements
    int arr[N] = {9, 3, 7, 1, 5, 8, 2, 4, 6, 0};
    int* d_arr;
    int* d_temp;

    cudaMalloc(&d_arr, N * sizeof(int));
    cudaMalloc(&d_temp, N * sizeof(int));
    cudaMemcpy(d_arr, arr, N * sizeof(int), cudaMemcpyHostToDevice);

    // Single-thread launch: the sort runs sequentially on the GPU.
    mergeSort<<<1, 1>>>(d_arr, d_temp, N);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(arr, d_arr, N * sizeof(int), cudaMemcpyDeviceToHost);

    std::cout << "Sorted array: ";
    for (int i = 0; i < N; ++i)
        std::cout << arr[i] << " ";
    std::cout << std::endl;

    cudaFree(d_arr);
    cudaFree(d_temp);

    return 0;
}

README.md (1 addition, 1 deletion): the chapter checklist gains the new entry

- [x] [Chapter 1: Basics of CUDA Programming](Chapter1-Basics)
- [x] [Chapter 2: Basic Syntax](Chapter2-BasicSyntax)
- [x] [Chapter 3: Simple CUDA Vector Addition](Chapter3-EasyCudaProject)
- [x] [Chapter 4: Intermediate Algorithms](Chapter4-IntermediateAlgo)
