This is a brief post on how to allocate device memory statically using the __device__ keyword. We explain how to statically allocate memory on the device and initialize it as in standard C code. This memory is then available to the device but cannot be directly referenced from the host (unless cudaGetSymbolAddress is used, as discussed in a previous post).

Device-side static allocation
To get straight to the point, let us go through the code and make some remarks (a complete sketch of the program is given at the end of the post). First, device memory for the variable dev_static is allocated statically: no cudaMemcpy is used to transfer the data and no cudaMalloc is used to allocate space on the device. This of course assumes that we know beforehand the dimension of the array we need to allocate.
__device__ float dev_static[NS] = {10.0, -20.0, 235.0};
Now let's look at the kernel:
__global__ void kernel(float *dx) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    dx[tid] += dev_static[tid];
}
The __device__ variable dev_static is accessible from inside the kernel (but not from the host). It would not be possible to pass the address of dev_static to the kernel directly from host code, although the host can obtain that address if necessary using cudaGetSymbolAddress.
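For example, a minimal sketch of how cudaGetSymbolAddress could be used on the host to retrieve the device address of dev_static (error checking omitted; the pointer name dev_static_ptr is just an illustration):

float *dev_static_ptr = NULL;
// Ask the runtime for the device address of the statically allocated symbol.
cudaGetSymbolAddress((void **)&dev_static_ptr, dev_static);
// dev_static_ptr can now be passed to a kernel or used with cudaMemcpy.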
Then, we have the kernel invocation:
kernel<<<1, NS>>>(dev_p);
where we pass the address of a variable that has been allocated in device linear memory using cudaMalloc and initialized using cudaMemcpy (the very standard way). As you may have guessed, the program prints:
host_p[0] = 11
host_p[1] = -18
host_p[2] = 238
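As promised, here is a minimal sketch of a complete program along these lines. The host-side initial values {1.0, 2.0, 3.0} are inferred from the printed output, and error checking is omitted; NS, dev_static, kernel, dev_p and host_p follow the snippets above.

#include <cstdio>
#include <cuda_runtime.h>

#define NS 3

// Statically allocated and initialized device array.
__device__ float dev_static[NS] = {10.0, -20.0, 235.0};

__global__ void kernel(float *dx) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    dx[tid] += dev_static[tid];
}

int main(void) {
    // Host data (initial values inferred from the output above).
    float host_p[NS] = {1.0, 2.0, 3.0};
    float *dev_p = NULL;

    // Allocate linear device memory and copy the host data to it.
    cudaMalloc((void **)&dev_p, NS * sizeof(float));
    cudaMemcpy(dev_p, host_p, NS * sizeof(float), cudaMemcpyHostToDevice);

    // One block of NS threads; dev_static is referenced inside the kernel.
    kernel<<<1, NS>>>(dev_p);

    // Copy the result back to the host and print it.
    cudaMemcpy(host_p, dev_p, NS * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < NS; i++) {
        printf("host_p[%d] = %g\n", i, host_p[i]);
    }

    cudaFree(dev_p);
    return 0;
}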

