Device-side static allocation
Let us get straight to the point and go through the code, making some remarks along the way. First, device memory for the variable dev_static is allocated statically: no cudaMemcpy was used to transfer the data, and no cudaMalloc was used to allocate space on the device. This of course assumes that we know beforehand the dimension of the array we need to allocate.
__device__ float dev_static[NS] = {10.0, -20.0, 235.0};
Now let's look at the kernel:
__global__ void kernel(float *dx)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    dx[tid] += dev_static[tid];
}
The __device__ variable dev_static is accessible from inside the kernel (but not from the host). It wouldn't be possible to pass the address of dev_static to the kernel directly from host code (although there are ways to obtain that address if necessary, using cudaGetSymbolAddress).
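As a minimal sketch of that approach (the pointer name p_static is introduced here purely for illustration and is not part of the original program):

float *p_static = NULL;
// Ask the CUDA runtime for the device address behind the dev_static symbol.
cudaGetSymbolAddress((void **)&p_static, dev_static);
// p_static now holds a device pointer that can be passed to a kernel
// like any other float *, but it must not be dereferenced on the host.

This is mostly useful when an existing kernel or library function expects an ordinary pointer argument rather than the symbol itself.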
Then, we have the kernel invocation:
kernel<<<1, NS>>>(dev_p);
where we pass the address of a variable that has been allocated in linear device memory using cudaMalloc and that has been initialized using cudaMemcpy (the very standard way). As you may have guessed, the program prints:
host_p[0] = 11
host_p[1] = -18
host_p[2] = 238
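For completeness, a minimal self-contained program along these lines might look as follows; it is a sketch rather than the original listing, and the host input values {1.0f, 2.0f, 3.0f} are an assumption chosen so that the arithmetic matches the output above.

#include <cstdio>
#include <cuda_runtime.h>

#define NS 3

// Statically allocated device array, initialized when the module is loaded.
__device__ float dev_static[NS] = {10.0f, -20.0f, 235.0f};

__global__ void kernel(float *dx)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    dx[tid] += dev_static[tid];
}

int main()
{
    float host_p[NS] = {1.0f, 2.0f, 3.0f};   // assumed input values
    float *dev_p = NULL;

    // Standard linear-memory allocation and host-to-device copy for dev_p.
    cudaMalloc((void **)&dev_p, NS * sizeof(float));
    cudaMemcpy(dev_p, host_p, NS * sizeof(float), cudaMemcpyHostToDevice);

    kernel<<<1, NS>>>(dev_p);

    // Copy the result back and print it.
    cudaMemcpy(host_p, dev_p, NS * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < NS; ++i)
        printf("host_p[%d] = %g\n", i, host_p[i]);

    cudaFree(dev_p);
    return 0;
}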