Tuesday 21 October 2014

CUDA and pointers to pointers

Say you have a CUDA kernel that operates on several arrays d_A1, d_A2, ..., d_An that are not stored consecutively in device memory, i.e., there is no single array d_A that holds the elements of d_A1, d_A2, ..., d_An in some particular order. To pass all of these arrays to the kernel, you can use pointers to pointers. In this post we'll see how to use pointers to pointers in CUDA C.
In CUDA C, device variables are, in most cases, referenceable from the host, i.e., the host knows their address on the device. Even if a variable has been declared as __constant__ or __device__, the host can still obtain a device pointer to it, pass that pointer to a kernel function and ask the GPU to do things with it. The host can therefore create arrays of such addresses and store them host side. If such an array of addresses needs to be passed to a kernel, it can be allocated in pinned (page-locked) memory, which the device can access directly. Read more about pinned memory in this post.
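As a small illustration of the point about __device__ variables, the sketch below asks the runtime for the device address of a statically declared array and passes it to a kernel like any other device pointer. The names dev_data and scale are made up for this example; cudaGetSymbolAddress is the runtime call assumed here for obtaining the address.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A statically allocated device array (hypothetical example data).
__device__ float dev_data[4] = {1.0f, 2.0f, 3.0f, 4.0f};

// Multiplies n floats starting at p by s (hypothetical kernel).
__global__ void scale(float *p, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] *= s;
}

int main(void) {
    float *d_ptr = NULL;

    // Ask the runtime for the device address of the __device__ symbol.
    cudaError_t err = cudaGetSymbolAddress((void **)&d_ptr, dev_data);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetSymbolAddress: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The address is an ordinary device pointer from here on.
    scale<<<1, 4>>>(d_ptr, 4, 2.0f);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the result back to inspect it on the host.
    float h[4];
    cudaMemcpy(h, d_ptr, sizeof(h), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 4; i++) printf("dev_data[%d] = %f\n", i, h[i]);
    return 0;
}
```

A pointer obtained this way can be stored in a host-side array of addresses next to pointers returned by cudaMalloc.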


Pointers to pointers of device variables. Read more about pointers to pointers here.
In the following example we allocate space on the host using malloc for h_A and h_B. We then allocate equal space in device memory and get the pointers d_A and d_B that point to these memory locations (also shown in the figure above). The variable hst_ptr resides in host memory and is page-locked, so it is accessible from the device directly (without the need to issue an extra cudaMemcpy).

Let us now take a look at the code:
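A minimal sketch of such a program follows. It uses the names h_A, h_B, d_A, d_B and hst_ptr from the description; everything else is an assumption made for illustration: the array length N, the initial values, the CHECK macro, the kernel name print_arrays, and the use of cudaHostAlloc with cudaHostAllocMapped plus cudaHostGetDevicePointer to obtain the device-side alias d_X of hst_ptr.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 10  // elements per array (assumed)

// Abort with a message if a CUDA runtime call fails (hypothetical helper).
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e_ = (call);                                  \
        if (e_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error at line %d: %s\n",        \
                    __LINE__, cudaGetErrorString(e_));            \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// X is an array of two device pointers: X[0] -> A and X[1] -> B.
__global__ void print_arrays(float **X, int n) {
    printf("\n Values...\n");
    for (int i = 0; i < n; i++) printf(" A[%d] = %f\n", i, X[0][i]);
    for (int i = 0; i < n; i++) printf(" B[%d] = %f\n", i, X[1][i]);
}

int main(void) {
    // Allow page-locked host memory to be mapped into the device address space.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Host arrays, filled with example values.
    float *h_A = (float *)malloc(N * sizeof(float));
    float *h_B = (float *)malloc(N * sizeof(float));
    if (h_A == NULL || h_B == NULL) return EXIT_FAILURE;
    for (int i = 0; i < N; i++) { h_A[i] = 1.0f + i; h_B[i] = 20.0f + i; }

    // Two separate device arrays; they need not be contiguous.
    float *d_A = NULL, *d_B = NULL;
    CHECK(cudaMalloc((void **)&d_A, N * sizeof(float)));
    CHECK(cudaMalloc((void **)&d_B, N * sizeof(float)));
    CHECK(cudaMemcpy(d_A, h_A, N * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_B, h_B, N * sizeof(float), cudaMemcpyHostToDevice));

    // Page-locked, mapped host array that stores the two device addresses.
    float **hst_ptr = NULL;
    CHECK(cudaHostAlloc((void **)&hst_ptr, 2 * sizeof(float *),
                        cudaHostAllocMapped));
    hst_ptr[0] = d_A;
    hst_ptr[1] = d_B;

    // Device-side alias of hst_ptr; no cudaMemcpy of the pointer array needed.
    float **d_X = NULL;
    CHECK(cudaHostGetDevicePointer((void **)&d_X, hst_ptr, 0));

    printf(" Addresses...\n");
    printf(" dX    = %p\n", (void *)d_X);
    printf(" dA    = %p\n", (void *)d_A);
    printf(" dB    = %p\n", (void *)d_B);
    printf(" dX[0] = %p\n", (void *)hst_ptr[0]);
    printf(" dX[1] = %p\n", (void *)hst_ptr[1]);
    fflush(stdout);

    // A single thread prints both arrays through the pointer-to-pointer.
    print_arrays<<<1, 1>>>(d_X, N);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(d_A));
    CHECK(cudaFree(d_B));
    CHECK(cudaFreeHost(hst_ptr));
    free(h_A);
    free(h_B);
    return 0;
}
```

The addresses printed depend on the device, the driver and the run; only the relations between them (dX[0] equals dA, dX[1] equals dB) are meaningful.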


The above program will output:
 Addresses...  
 dX    = 0xac720000  
 dA    = 0xac620000  
 dB    = 0xac620200  
 dX[0] = 0xac620000  
 dX[1] = 0xac620200  

 Values...  
 A[0] = 1.000000  
 A[1] = 2.000000  
 A[2] = 3.000000  
 A[3] = 4.000000  
 A[4] = 5.000000  
 A[5] = 6.000000  
 A[6] = 7.000000  
 A[7] = 8.000000  
 A[8] = 9.000000  
 A[9] = 10.000000  
 B[0] = 20.000000  
 B[1] = 21.000000  
 B[2] = 22.000000  
 B[3] = 23.000000  
 B[4] = 24.000000  
 B[5] = 25.000000  
 B[6] = 26.000000  
 B[7] = 27.000000  
 B[8] = 28.000000  
 B[9] = 29.000000  



