Code of Honour

Wednesday, 10 June 2015

JSON Patches in REST

A JSON patch is exactly what its name suggests: a patch for a JSON document, i.e., another JSON document describing a set of changes to be applied to the original. The format is specified in RFC 6902. JSON Patch combines readily with a REST interface, so that clients can update part of a resource without having to PUT the whole JSON document again. In this post we'll go through an application case: a RESTful web service for bibliographic entries.

JAQPOT: A predictive toxicology platform

JAQPOT Quattro is a predictive toxicology platform on the web which combines machine learning, cheminformatics, linked data and web technologies with the OpenTox and eNanomapper APIs to deliver a generic and interoperable solution for drug discovery. JAQPOT Quattro acts as a computational backend which can be used by user interfaces to provide access to QSAR and other functionality. At the same time, it can be used as a backend by developers and data scientists to perform computations and experiment with data that are available on the web.

How to interrupt a message-driven bean

So, say you have started an asynchronous task running in a Java Enterprise (JEE) container. This is often done using the MessageListener interface and its method onMessage(Message), along with Topics. In brief, jobs are announced on a Topic (that is, a list of jobs) to which one or more listeners subscribe; every time a new job arrives, they do some work. A Topic is like a TODO list, while a MessageListener offers a thread which will undertake the announced job. However, while a thread is executing such a job, we may decide that we want to stop it. In this article we will explain how such a thread can be interrupted.

Templated C++ classes with friend functions and implementation-header separation

In the last two posts we saw how to call a constructor from another constructor using friend functions and how to separate the implementation from the header when C++ templates are used. In this post, I'm explaining how to do both, i.e., to use friend functions and templates with the implementation in a separate file from the header. One would expect this to be a pretty straightforward extension but, unfortunately, there are a few details that need to be carefully considered.
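
To give a flavour of the details involved, here is a minimal sketch, not necessarily the approach the post takes. The names Box and peek are hypothetical, and the three "files" are shown in one listing so that it compiles as a single unit. It assumes the separation is achieved with explicit instantiation.

```cpp
#include <cassert>

// ---- Box.h (hypothetical header) ----
// Forward declarations: the friend declaration below must be able to
// name the function template before the class body is seen.
template <typename T> class Box;
template <typename T> T peek(const Box<T>& b);

template <typename T>
class Box {
public:
    explicit Box(T v);
private:
    T value_;
    // The <> says: befriend the specialisation peek<T> of the function
    // template declared above, rather than a new non-template function.
    friend T peek<>(const Box<T>& b);
};

// ---- Box.cpp (hypothetical implementation file) ----
template <typename T>
Box<T>::Box(T v) : value_(v) {}

template <typename T>
T peek(const Box<T>& b) { return b.value_; }  // allowed: peek<T> is a friend

// Explicit instantiations: the class and the friend function are
// instantiated separately, so both need a line here.
template class Box<int>;
template int peek<int>(const Box<int>&);

// ---- client code (hypothetical) ----
int demo_peek() {
    Box<int> b(7);
    return peek(b);
}
```

Two things are easy to miss in a layout like this: the forward declarations before the class, and the fact that explicitly instantiating Box<int> does not instantiate its friend function.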

C++ Templates: Separate definition and implementation

Templates in C++ often force the developer to either merge the function prototypes and their implementation in a single header file (which is awkward), or to #include the implementation file in their application (i.e., do something like #include "MyClass.cpp", which is even more awkward!). C++11 introduced extern template declarations (and dropped the rarely implemented export keyword), but here I'm going to present a neat way to deal with this peculiarity through an easy example.
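
One common way to keep template code out of the header is explicit instantiation. The sketch below illustrates that technique under my own assumptions; it is not necessarily the method this post presents. The class name Accumulator is hypothetical, and the header and implementation file are shown in one listing so that it compiles as a single unit.

```cpp
#include <cassert>

// ---- Accumulator.h (hypothetical header): declarations only ----
template <typename T>
class Accumulator {
public:
    explicit Accumulator(T initial);
    void add(T value);
    T total() const;
private:
    T total_;
};

// ---- Accumulator.cpp (hypothetical implementation file) ----
// In a real project this file would start with #include "Accumulator.h".
template <typename T>
Accumulator<T>::Accumulator(T initial) : total_(initial) {}

template <typename T>
void Accumulator<T>::add(T value) { total_ += value; }

template <typename T>
T Accumulator<T>::total() const { return total_; }

// Explicit instantiation definitions: the compiler emits the code for
// these types in this translation unit, so clients only need the header.
template class Accumulator<int>;
template class Accumulator<double>;

// ---- client code (hypothetical) ----
int demo_accumulator() {
    Accumulator<int> a(1);
    a.add(2);
    return a.total();
}
```

The price of this design is that only the listed types (here int and double) can be used by clients; any other type fails at link time.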

C++: calling a constructor from within a constructor

Constructors within constructors? This is business as usual for the Java programmer, but what about C++? Unfortunately, it turns out that this is not so straightforward a task... In this post we are going to work around this with a simple example: a class which can be used to read data from text files. The key to this approach answers to the name of friend function! A friend function is a function which (i) is not a member function of a class, but (ii) has access to the private fields of objects that are passed to it as arguments.
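
As a rough sketch of the idea: several constructors can share their initialisation logic through a single friend function. The class Reader, the function init and the default values below are hypothetical, chosen only for illustration, and no file is actually opened.

```cpp
#include <cassert>
#include <string>

class Reader;
// Declared up front so the constructors below can call it.
void init(Reader& r, const std::string& path, char sep);

class Reader {
public:
    // Every constructor forwards to the same friend function.
    Reader() { init(*this, "default.txt", ','); }
    explicit Reader(const std::string& path) { init(*this, path, ','); }
    Reader(const std::string& path, char sep) { init(*this, path, sep); }

    const std::string& path() const { return path_; }
    char separator() const { return sep_; }

private:
    std::string path_;
    char sep_ = ',';

    // Not a member, but allowed to touch path_ and sep_.
    friend void init(Reader& r, const std::string& path, char sep);
};

void init(Reader& r, const std::string& path, char sep) {
    r.path_ = path;
    r.sep_ = sep;
}

// Small usage example.
char demo_reader() {
    Reader r("data.txt", ';');
    return r.separator();
}
```

Since C++11 the same effect can also be obtained with delegating constructors, e.g. Reader() : Reader("default.txt", ',') {}.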

CUDA and pointers to pointers

Say you have a CUDA kernel that operates on a number of different arrays d_A1, d_A2, ..., d_An which are not stored consecutively in device memory, i.e., there is no single array d_A which stores the elements of d_A1, d_A2, ..., d_An in some particular order. If one needs to pass all these arrays to the kernel, pointers to pointers do the job. In this post we'll see how to use pointers to pointers in CUDA C.

High-performance matrix-vector multiplication in CUDA C

Here I present a custom kernel for matrix-vector multiplication written in CUDA C, along with some benchmarking results on a Tegra K1 (on a Jetson TK1 development board) and a comparison with cuBLAS's function cublasSgemv. This is an open-source project which is hosted on GitHub. This post comes, as I promised, as a sequel to an older post about matrix-vector multiplication in CUDA using shared memory. Further optimisation of the kernel is also possible (there are a few ideas), but for the time being I'm presenting some encouraging results...

Timing CUDA kernels

Are you up to benchmarking your CUDA application? Are you looking for the easiest possible way on earth to time your kernels? Then you're in the right place, because in this post we're going through some code snippets to help you out. The implementation I present today is inspired by MATLAB's simple tic-toc interface. You start the timer with a tic() and get the elapsed time with a toc(). This saves a lot of lines of code and keeps your code simple and tidy.
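
To show the shape of such an interface, here is a host-only sketch built on std::chrono. It is an assumption-laden stand-in, not the implementation from the post: kernel launches are asynchronous, so a CUDA version would rather record cudaEvent_t events with cudaEventRecord, wait with cudaEventSynchronize and read the result with cudaEventElapsedTime.

```cpp
#include <cassert>
#include <chrono>

// Time point stored by the most recent tic().
static std::chrono::steady_clock::time_point tic_start;

// Starts (or restarts) the timer.
void tic() {
    tic_start = std::chrono::steady_clock::now();
}

// Returns the milliseconds elapsed since the last tic().
float toc() {
    const auto now = std::chrono::steady_clock::now();
    return std::chrono::duration<float, std::milli>(now - tic_start).count();
}

// Small usage example: times an empty region of host code.
float demo_tic_toc() {
    tic();
    // ... host code to be timed goes here ...
    return toc();
}
```

A single global start time keeps the interface as small as MATLAB's, at the cost of not supporting nested or concurrent timers.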

CUDA C - Tutorials and other resources

This is a collection of tutorials, blogs, articles and other resources for CUDA C that I hope you'll find useful. Feel free to leave a comment with anything you think can help people learn CUDA and optimise their code. I am planning to keep this post continuously updated, so stay tuned. Most of the links hosted here point to free resources.

Matrix-vector multiplication using shared memory

Matrix-vector multiplications, as well as matrix-matrix ones, are essential for any sort of numeric computation. GPGPUs allow massive parallelisation of such operations. This post is about doing matrix-vector multiplications using CUDA with shared memory, a type of on-chip memory that is much faster than the global memory of the device. In fact, shared memory can be up to 100 times faster than global memory, provided that there are no bank conflicts between the threads. Algebraic operations such as matrix-vector multiplications can benefit a lot from the use of shared memory, as we will show in this post.
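
For reference, the operation itself is simple. The following host-side sketch computes y = A x sequentially and could serve as a baseline against which a GPU kernel is checked. The function name matvec and the row-major storage are my assumptions; the post's kernel may lay out the matrix differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y = A * x on the host.
// A is stored row-major with `rows` rows and `cols` columns (assumed layout).
std::vector<float> matvec(const std::vector<float>& A,
                          const std::vector<float>& x,
                          std::size_t rows, std::size_t cols) {
    assert(A.size() == rows * cols);
    assert(x.size() == cols);
    std::vector<float> y(rows, 0.0f);
    for (std::size_t i = 0; i < rows; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < cols; ++j) {
            // Every row reads the whole of x: this reuse is why x is a
            // natural candidate for staging in fast on-chip memory.
            acc += A[i * cols + j] * x[j];
        }
        y[i] = acc;
    }
    return y;
}
```

Each element of x is read once per row, so on a GPU it pays to load x once into shared memory and let all threads of a block reuse it.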

Static allocation of device vars

This is a brief post on how to allocate device memory statically using the __device__ keyword. We explain how one can statically allocate memory space on the device and initialize it as in standard C code. This memory is then available to the device but cannot be directly referenced from the host (unless cudaGetSymbolAddress is used, as discussed in a previous post).

Memories from CUDA - Pinned memory (III)

The main motivation for using pinned memory is to perform asynchronous transfers of data from the host to the device. This is accomplished using cudaMemcpyAsync and related functions. Additionally, pinned (or page-locked) memory brings certain performance benefits, and in some cases write-combined memory can improve performance further. In this post we give a few examples of how to allocate pinned memory and we investigate its features.

Memories from CUDA - Symbol Addresses (II)

In a previous post we gave a simple example of accessing constant memory in CUDA from inside a kernel function. What if we need to access it from the host (i.e., pass it as an argument to a kernel)? In this post we focus on how to use cudaGetSymbolAddress to get the address of a device variable (which can be a __constant__ or a __device__ one). Modern architectures support the keyword __managed__, but we're going to do things the old way...

Memories from CUDA - Constant memory (I)

This is a post for all of us who feel we can't memorize all these different types of CUDA memories. It is the first in a series of upcoming posts on memory management in CUDA. Here, we present constant memory and explain how it can be accessed from the device through a comprehensive step-by-step example.

Wednesday, 10 June 2015

JSON Patches in REST

Saturday, 6 June 2015

JAQPOT: A predictive toxicology platform

Monday, 30 March 2015

How to interrupt a message-driven bean

Sunday, 2 November 2014

Templated C++ classes with friend functions and implementation-header separation

C++ Templates: Separate definition and implementation

Saturday, 1 November 2014

C++: calling a constructor from within a constructor

Tuesday, 21 October 2014

CUDA and pointers to pointers

Monday, 20 October 2014

High-performance matrix-vector multiplication in CUDA C

Sunday, 19 October 2014

Timing CUDA kernels

Friday, 17 October 2014

CUDA C - Tutorials and other resources

Tuesday, 14 October 2014

Matrix-vector multiplication using shared memory

Monday, 13 October 2014

Static allocation of device vars

Saturday, 11 October 2014

Memories from CUDA - Pinned memory (III)

Friday, 10 October 2014

Memories from CUDA - Symbol Addresses (II)

Memories from CUDA - Constant memory (I)