Does cudaMalloc allocate contiguous chunks of memory (i.e., physical bytes next to each other)?
I have a piece of CUDA code that simply copies 128 bytes from global device memory to shared memory, using 32 threads. I am trying to find a way to guarantee that this transfer completes in a single 128-byte memory transaction. If cudaMalloc allocates contiguous memory blocks, then this can be done easily.
Following is the code:
#include <iostream>
using namespace std;

#define SIZE 32        // number of uints to stage in shared memory
#define NUMTHREADS 32

__global__ void copy(uint* memPointer){
    extern __shared__ uint bits[];
    int tid = threadIdx.x;
    bits[tid] = memPointer[tid];   // one 4-byte load per thread, 128 bytes per warp
}
int main(){
    uint inputData[SIZE];
    uint* storedData;
    for(int i = 0; i < SIZE; i++){
        inputData[i] = i;
    }
    cudaError_t e1 = cudaMalloc((void**) &storedData, sizeof(uint)*SIZE);
    if(e1 == cudaSuccess){
        cudaError_t e3 = cudaMemcpy(storedData, inputData, sizeof(uint)*SIZE,
                                    cudaMemcpyHostToDevice);
        if(e3 == cudaSuccess){
            // third launch parameter is the dynamic shared memory size in bytes
            copy<<<1, NUMTHREADS, sizeof(uint)*SIZE>>>(storedData);
            cudaError_t e4 = cudaDeviceSynchronize();  // surface launch/runtime errors
            if(e4 != cudaSuccess){
                cout << "Kernel error" << " " << e4 << endl;
            }
            cudaError_t e6 = cudaFree(storedData);
            if(e6 != cudaSuccess){
                cout << "Error freeing memory storedData" << " " << e6 << endl;
            }
        }
        else{
            cout << "Failed to copy" << " " << e3 << endl;
        }
    }
    else{
        cout << "Failed to allocate memory" << " " << e1 << endl;
    }
    return 0;
}
- What purpose is that kernel supposed to serve? – talonmies, Jul 2, 2012 at 16:25
- It's part of a larger program in which I perform some operations on the data; I am trying to optimize individual parts of the code. – gmemon, Jul 2, 2012 at 16:35
- If the 128-byte block is 128-byte aligned, then this will be done in one transaction. NVIDIA GPUs have an MMU separate from the CPU's MMU, and all GPU memory operations go through the GPU's virtual address space. There is no guarantee that blocks larger than a cache line will be physically contiguous. – Greg Smith, Jul 3, 2012 at 1:13
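A quick host-side check of the alignment Greg Smith mentions (a minimal sketch; cudaMalloc allocations are documented to be aligned to at least 256 bytes, so 128-byte alignment should always hold):

#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

int main(){
    unsigned int* p = nullptr;
    if (cudaMalloc((void**)&p, 128 * sizeof(unsigned int)) == cudaSuccess){
        uintptr_t addr = (uintptr_t)p;
        // If addr % 128 == 0, a warp reading 128 consecutive bytes from p
        // can be serviced by a single 128-byte transaction.
        printf("address %p, 128-byte aligned: %s\n",
               (void*)p, (addr % 128 == 0) ? "yes" : "no");
        cudaFree(p);
    }
    return 0;
}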
1 Answer
Yes, cudaMalloc allocates contiguous chunks of memory: the allocation is contiguous in the GPU's virtual address space, which is what matters for coalescing. (As Greg Smith notes in the comments, physical contiguity beyond a cache line is not guaranteed, but it is not required either.) The "Matrix Transpose" example in the SDK (http://developer.nvidia.com/cuda-cc-sdk-code-samples) has a kernel called "copySharedMem" that does almost exactly what you're describing.
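For reference, here is a minimal sketch in the spirit of that kernel; the tile size and names are illustrative, not the SDK's exact code. It stages data through shared memory with fully coalesced loads and stores:

__global__ void copySharedMem(float* odata, const float* idata, int n){
    __shared__ float tile[256];                 // illustrative tile size
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n){
        tile[threadIdx.x] = idata[i];           // coalesced global load
    }
    __syncthreads();
    if (i < n){
        odata[i] = tile[threadIdx.x];           // coalesced global store
    }
}

// example launch, one thread per element:
// copySharedMem<<<(n + 255) / 256, 256>>>(d_out, d_in, n);

Because consecutive threads in a warp access consecutive 4-byte words from a 128-byte-aligned base, each warp's load is serviced in a single 128-byte transaction, which is exactly the behavior you are trying to guarantee.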