Monday, 21 May 2012

Cassandra and Hector #1

Hector and Andromache
by Giorgio de Chirico, 1917
Galleria Nazionale d'Arte Moderna
Rome, Italy, Oil on canvas
Following the previous post on Apache Cassandra, this one serves as a step-by-step beginner's guide to Hector, a Java API for accessing the Cassandra key-value store. (It is assumed that you have already installed Cassandra, version 1.1.0.) We will learn how to create a Keyspace and a Column Family, how to store data, and how to retrieve values given their key.

Thrift

Thrift is the low-level client API to the Cassandra database. It is not intended to be used directly by developers unless they need to build their own library for accessing Cassandra (for example, for an unsupported programming language). In this tutorial we'll be using Hector, a high-level API for Java, but before that we need to install Thrift. To do so, download version 0.8.0 from http://thrift.apache.org/download/ or from the terminal:
$ wget "http://www.apache.org/dyn/closer.cgi?path=/thrift/0.8.0/thrift-0.8.0.tar.gz"

Then extract the archive:

$ tar -xvf thrift-0.8.0.tar.gz

Then we cd into the created directory and configure, compile and install Thrift:

$ cd thrift-0.8.0/
$ ./configure
$ make
$ sudo make install

It has been reported that in some cases Thrift does not compile for Erlang. If this is the case for you, configure with --without-erlang.

Getting Started with Hector

Hector is a Java API for Cassandra that uses Thrift as a mediator. A must-read document for beginners is this short manual. Ant users who skip the "Getting Started" paragraph of that manual tend to run into a lot of confusion. Check which particular jar files you need to include in your classpath, otherwise you are in for surprises...

Here is a list of the jars I have in my classpath:

cassandra-thrift-1.1.0
commons-codec-1.4
commons-lang-2.4
commons-logging-1.1.1
libthrift-0.8.0
slf4j-api-1.6.1
slf4j-log4j12-1.5.8
google-collections-1.0
hector-core-1.0-2
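
Before going further, it can save time to verify that the jars above are really visible at runtime. The following is a minimal sketch using only the standard library: it tries to load one well-known class per jar and reports the ones it cannot find. The class name ClasspathCheck and the choice of one representative class per jar are my own assumptions for illustration; adjust the list to match your setup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hector): reports which expected classes
// cannot be loaded from the current classpath.
final class ClasspathCheck {

    // One representative class per jar in the list above (an assumption;
    // any class shipped in the jar would do).
    static final String[] EXPECTED = {
        "org.apache.cassandra.thrift.Cassandra",       // cassandra-thrift
        "org.apache.commons.codec.binary.Hex",         // commons-codec
        "org.apache.commons.lang.StringUtils",         // commons-lang
        "org.apache.commons.logging.Log",              // commons-logging
        "org.apache.thrift.TException",                // libthrift
        "org.slf4j.Logger",                            // slf4j-api
        "com.google.common.collect.ImmutableList",     // google-collections
        "me.prettyprint.hector.api.factory.HFactory"   // hector-core
    };

    // Returns true if the class can be located, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the subset of the given class names that cannot be loaded.
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<String>();
        for (String name : classNames) {
            if (!isOnClasspath(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(EXPECTED);
        if (missing.isEmpty()) {
            System.out.println("All expected classes were found.");
        } else {
            for (String name : missing) {
                System.out.println("Missing from classpath: " + name);
            }
        }
    }
}
```

Run it with the same classpath you use for your Hector code; any line starting with "Missing from classpath" points at a jar that still needs to be added.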

Creating a Keyspace

The first thing to do when starting with a fresh and clean Cassandra database is to create a Keyspace. We already explained how to create a keyspace using the Cassandra client in the previous post. From within Java we will use Hector to create a new Keyspace and a new Column Family with a few columns.

First we need to connect to our cluster using:

Cluster cluster = HFactory.getOrCreateCluster("test-cluster", "localhost:9160");
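
If the later steps fail with connection errors, a quick pre-flight check can tell you whether anything is listening on the address given above. This is a minimal sketch with plain java.net sockets, not a Hector feature; the class name PortCheck and the two-second timeout are assumptions of mine, and "localhost:9160" is simply the address used in this post.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of Hector): checks whether a TCP port
// accepts connections.
final class PortCheck {

    // Extracts the host part of a "host:port" string.
    static String hostOf(String hostAndPort) {
        return hostAndPort.substring(0, hostAndPort.lastIndexOf(':'));
    }

    // Extracts the port part of a "host:port" string.
    static int portOf(String hostAndPort) {
        return Integer.parseInt(hostAndPort.substring(hostAndPort.lastIndexOf(':') + 1));
    }

    // Returns true if a TCP connection can be opened within the timeout.
    static boolean isListening(String hostAndPort, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(
                new InetSocketAddress(hostOf(hostAndPort), portOf(hostAndPort)),
                timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String address = "localhost:9160";
        System.out.println(address
            + (isListening(address, 2000) ? " is accepting connections" : " is not reachable"));
    }
}
```

A successful connection only shows that some process accepts TCP connections on that port; it does not prove that the process is Cassandra.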

Then, we may need to drop a keyspace with the same name in case it already exists using:

String KEYSPACE_NAME = "CodeOfHonour";
if (cluster.describeKeyspace(KEYSPACE_NAME) != null) {
   cluster.dropKeyspace(KEYSPACE_NAME);
}

Apart from that, if the keyspace already exists in the database and we want to delete it, it is a good idea to also remove the corresponding files under /var/lib/cassandra/ as a spring clean. Otherwise, various things can go wrong and lead to unexpected exceptions. However, be very careful with dropKeyspace because it may lead to irrecoverable loss of data. It is good to learn from day one how to back up and restore our keyspaces.

Define a Column Family

Let's now venture an anatomy of a Column Family definition. We need to create a CF inside our keyspace called User.

// Definition of a Column Family
BasicColumnFamilyDefinition cfDfn =
   new BasicColumnFamilyDefinition();
cfDfn.setKeyspaceName(KEYSPACE_NAME);
cfDfn.setName("User");
cfDfn.setKeyAlias(
   new StringSerializer().toByteBuffer("ID"));

This is a very basic definition of a CF in which we have only indicated that its name is User and that it belongs to the given keyspace. We have also indicated that the Key of this CF is identified by the String ID. It is good to customize our CF a bit more. First off, we need to specify the validator class (counterpart of the type in SQL terms) of our keys and the default validation class for the columns. Let us assume that this will be UTF8-encoded text.

cfDfn.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
cfDfn.setComparatorType(ComparatorType.UTF8TYPE);
cfDfn.setDefaultValidationClass(ComparatorType.UTF8TYPE.getClassName());
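
Both the key alias passed as a ByteBuffer and the UTF8 validators come down to the same thing: names and values travel as UTF-8 encoded bytes. To make that concrete, here is a small standard-library sketch of the round trip between a String and a ByteBuffer. It is an approximation for illustration only, not Hector's StringSerializer, and the class name Utf8Bytes is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration (not Hector code): String <-> UTF-8 ByteBuffer.
final class Utf8Bytes {

    // Encodes text as UTF-8 and wraps the bytes in a ByteBuffer.
    static ByteBuffer encode(String text) {
        return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a UTF-8 ByteBuffer back to a String; duplicate() leaves the
    // caller's buffer position untouched.
    static String decode(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer keyAlias = encode("ID");
        System.out.println("bytes: " + keyAlias.remaining());
        System.out.println("text:  " + decode(keyAlias));
    }
}
```

For plain ASCII names such as ID or firstName, each character occupies exactly one byte; other characters may take two or more.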

Finally, we tweak some more parameters to achieve the desired result. For example, we set the GC grace period, that is, the time Cassandra keeps deletion markers (tombstones) before they may be garbage-collected, to 864000 seconds (10 days). A longer period gives replicas more time to learn about deletions, at the cost of keeping tombstones around for longer. We also set the read repair chance to 0.1. Setting it to 1 will force Cassandra to double-check every row we read. If the consistency value is set to 1 or higher then read repair chance is fixed to 1.

cfDfn.setGcGraceSeconds(864000);
cfDfn.setReadRepairChance(0.1);
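
The two numbers above are easier to reason about in familiar units. The sketch below, which uses only the standard library, converts a grace period given in days to the seconds expected by setGcGraceSeconds, and estimates how many reads out of a given total would trigger a read repair if the chance is interpreted as a per-read probability. The class name GcGrace, its method names, and the figure of 1000 reads are illustrative assumptions of mine.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Hector): unit arithmetic for the
// column family parameters used above.
final class GcGrace {

    // Converts a grace period in days to whole seconds.
    static int daysToSeconds(int days) {
        return Math.toIntExact(TimeUnit.DAYS.toSeconds(days));
    }

    // Expected number of reads that trigger a read repair, treating the
    // read repair chance as an independent per-read probability.
    static long expectedRepairs(long reads, double chance) {
        return Math.round(reads * chance);
    }

    public static void main(String[] args) {
        System.out.println("10 days = " + daysToSeconds(10) + " seconds");
        System.out.println("Out of 1000 reads, about "
            + expectedRepairs(1000, 0.1) + " would trigger a read repair");
    }
}
```

So 864000 is simply ten days expressed in seconds, and a chance of 0.1 means that roughly one read in ten is checked.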

We then "cast" (quoted because it's not really type-casting) our CF definition as an instance of ThriftCfDef:

ColumnFamilyDefinition thriftCfDef =
   new ThriftCfDef(cfDfn);

Finally, we need to add this ColumnFamilyDefinition object to the keyspace definition.

int replicationFactor = 1;
String strategy = "org.apache.cassandra.locator.SimpleStrategy";
KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(
    KEYSPACE_NAME, strategy, replicationFactor, Arrays.asList(thriftCfDef));

Adding Columns

We now need to create a set of columns and add them to the column family we have already created:

String name = "firstName";
BasicColumnDefinition colDef = new BasicColumnDefinition();
StringSerializer ser = new StringSerializer();
colDef.setName(ser.toByteBuffer(name));
colDef.setValidationClass(ComparatorType.UTF8TYPE.getClassName());

and we now add this column to the above BasicColumnFamilyDefinition object as follows:

cfDfn.addColumnDefinition(colDef);

In the same way we can create more columns and add them to this column family. So far nothing has been sent to the database. The whole schema will be created by the following command:

cluster.addKeyspace(keyspaceDefinition);

Coming Soon

There's a lot to say about Cassandra and Hector. In this post I showed how to create a DB schema in Cassandra using Hector. In my previous post I gave a short introduction to Cassandra and provided a step-by-step guide on downloading and installing it on your Linux machine. There will soon be another post on how to write to and read from a Cassandra DB, followed by a series of success stories with Cassandra. Stay tuned!
