What is Kernel in CUDA Programming

Posted by Unknown at 07:13 | 4 comments

Basics of CUDA Programming: Part 5


Kernels

CUDA C extends C by allowing the programmer to define C functions, called kernels, that, when called, are executed N times in parallel by N different CUDA threads, as opposed to only once like regular C functions.

A kernel is defined using the __global__ declaration specifier, and the number of CUDA threads that execute that kernel for a given kernel call is specified using a new <<<…>>> execution configuration syntax. Each thread that executes the kernel is given a unique thread ID that is accessible within the kernel through the built-in threadIdx variable.


Syntax:

Kernel_Name<<< GridSize, BlockSize, SMEMSize, Stream >>>(arg,..);


Where:
GridSize : the number of thread blocks in the grid.
BlockSize : the number of threads per block.
SMEMSize : the number of bytes of shared memory allocated dynamically per block at runtime (optional, defaults to 0).
Stream : the stream on which the kernel will execute (optional, defaults to the default stream).
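
For illustration, a launch that fills in all four parameters might look like the sketch below; the kernel name scaleKernel, the device pointer d_data, and the chosen sizes are our own assumptions, not part of the original post.

// Kernel that stages its data through dynamically sized shared memory
__global__ void scaleKernel(float* data, float factor)
{
    extern __shared__ float tile[];   // size comes from SMEMSize at launch
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = data[i] * factor;
    __syncthreads();
    data[i] = tile[threadIdx.x];
}

// Launch: 4 blocks of 256 threads, 256 floats of dynamic shared memory per
// block, on a user-created stream; d_data is assumed to hold 4 * 256 floats
// already resident on the device.
cudaStream_t stream;
cudaStreamCreate(&stream);
scaleKernel<<<4, 256, 256 * sizeof(float), stream>>>(d_data, 2.0f);

Leaving the last two arguments out, i.e. <<<GridSize, BlockSize>>>, launches the kernel with no dynamic shared memory on the default stream.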

Sample Example:


// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main()
{
    ...
    // Kernel invocation with N threads
    VecAdd<<<1, N>>>(A, B, C);
    ...
}
Here, each of the N threads that execute VecAdd() performs one pair-wise addition.
You may be wondering how a grid is organized in terms of blocks and threads; Read this Post
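
For readers who want a complete, compilable version of the example above, one possible host-side sketch is shown below; the vector length N, the sample data, and the printed check are our own assumptions and not part of the guide's example.

// Hypothetical full program built around the VecAdd kernel above
#include <cuda_runtime.h>
#include <cstdio>

__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main()
{
    const int N = 256;                     // assumed vector length (one block)
    const size_t bytes = N * sizeof(float);

    float h_A[N], h_B[N], h_C[N];
    for (int i = 0; i < N; ++i) { h_A[i] = (float)i; h_B[i] = 2.0f * i; }

    // Allocate device buffers and copy the inputs over
    float *d_A, *d_B, *d_C;
    cudaMalloc((void**)&d_A, bytes);
    cudaMalloc((void**)&d_B, bytes);
    cudaMalloc((void**)&d_C, bytes);
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    // Kernel invocation with N threads in a single block
    VecAdd<<<1, N>>>(d_A, d_B, d_C);

    // Copy the result back and check one element (10 + 20 = 30)
    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);
    printf("C[10] = %f\n", h_C[10]);

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    return 0;
}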


Feel free to comment...

References
CUDA C Programming Guide, NVIDIA
CUDA, NVIDIA

4 comments:

  1. int* dev_dynamic_ptr;
    cudaMalloc((void**)&dev_dynamic_ptr, dynamic_size);

  2. cuda program

  3. #include <iostream>
    #include <vector>
    #include <omp.h>

    int main() {
        const int SIZE = 1001; // Array size is 1001 to include 0 to 1000
        const int NUM_THREADS = 10;
        int sum = 0; // For the total sum of the array
        std::vector<float> averages(NUM_THREADS, 0.0f); // To store averages computed by each thread

        // Initialize the array with values 0 to 1000
        std::vector<int> array(SIZE);
        for (int i = 0; i < SIZE; ++i) {
            array[i] = i;
        }

    #pragma omp parallel num_threads(NUM_THREADS)
        {
            int id = omp_get_thread_num(); // Get the thread ID
            int chunk = SIZE / NUM_THREADS;
            int start = id * chunk;
            // The last thread also takes the remainder so index 1000 is not skipped
            int end = (id == NUM_THREADS - 1) ? SIZE : start + chunk;
            int thread_sum = 0; // Sum for each thread

            for (int i = start; i < end; ++i) {
                thread_sum += array[i];
            }

            float thread_average = static_cast<float>(thread_sum) / (end - start);
            averages[id] = thread_average; // Store the average for the thread

    #pragma omp atomic
            sum += thread_sum; // Update the global sum atomically
        }

        // Output the result array (averages) to console
        std::cout << "Averages: ";
        for (float avg : averages) {
            std::cout << avg << " ";
        }
        std::cout << std::endl;

        // Output the sum of the whole array
        std::cout << "Sum of array: " << sum << std::endl;

        return 0;
    }

  4. ؤتواؤلنالو

