Hector and Andromache by Giorgio de Chirico, 1917. Oil on canvas, Galleria Nazionale d'Arte Moderna, Rome, Italy.
Thrift
Thrift is the RPC framework through which Cassandra exposes its low-level client API. This API is not intended to be used directly by developers unless they need to build their own library for accessing Cassandra (for example, for a programming language that is not yet supported). In this tutorial we'll be using Hector - a high-level API for Java; but before that we need to install Thrift. To do so, download version 0.8.0 from http://thrift.apache.org/download/ or from the terminal:
$ wget "http://www.apache.org/dyn/closer.cgi?path=/thrift/0.8.0/thrift-0.8.0.tar.gz"
Then extract the archive:
$ tar -xvf thrift-0.8.0.tar.gz
Then change into the created directory and configure, compile and install Thrift:
$ cd thrift-0.8.0/
$ ./configure
$ make
$ sudo make install
It has been reported that in some cases Thrift does not compile for Erlang. If this is the case for you, configure with --without-erlang.
Getting Started with Hector
Hector is a Java API for Cassandra that uses Thrift as a mediator. A must-read document for beginners is this short manual. Ant users who skip its "Getting Started" paragraph tend to run into a lot of confusion. Check which particular jar files you need to include in your classpath, otherwise you are in for surprises. Here is a list of the jars I have in my classpath:
cassandra-thrift-1.1.0
commons-codec-1.4
commons-lang-2.4
commons-logging-1.1.1
libthrift-0.8.0
slf4j-api-1.6.1
slf4j-log4j12-1.5.8
google-collections-1.0
hector-core-1.0-2
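To catch a missing jar before Hector fails at runtime, a small check along the following lines may help. It is only a sketch: the class names are representative classes that I picked, one per jar in the list above, and ClasspathCheck is a hypothetical helper, not part of Hector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ClasspathCheck {

    // One representative class per jar from the list above (my own choice of classes).
    static final Map<String, String> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("cassandra-thrift", "org.apache.cassandra.thrift.Cassandra");
        REQUIRED.put("commons-codec", "org.apache.commons.codec.binary.Hex");
        REQUIRED.put("commons-lang", "org.apache.commons.lang.StringUtils");
        REQUIRED.put("commons-logging", "org.apache.commons.logging.Log");
        REQUIRED.put("libthrift", "org.apache.thrift.TException");
        REQUIRED.put("slf4j-api", "org.slf4j.Logger");
        REQUIRED.put("slf4j-log4j12", "org.slf4j.impl.StaticLoggerBinder");
        REQUIRED.put("google-collections", "com.google.common.collect.ImmutableList");
        REQUIRED.put("hector-core", "me.prettyprint.hector.api.factory.HFactory");
    }

    // Returns true if the class can be located, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Names of the jars whose representative class could not be found.
    static List<String> missingJars() {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : REQUIRED.entrySet()) {
            if (!isOnClasspath(entry.getValue())) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingJars();
        System.out.println(missing.isEmpty() ? "All jars found" : "Missing jars: " + missing);
    }
}
```

Run it with the same classpath as your application; any jar it reports as missing is a likely source of a NoClassDefFoundError later on.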
Creating a Keyspace
The first thing to do when starting with a fresh and clean Cassandra database is to create a Keyspace. We already explained how to create a keyspace using the Cassandra client in the previous post. From within Java we will be using Hector to create a new Keyspace and a new Column Family with a few columns. First we need to connect to our cluster and obtain the cluster object used in the rest of this post; in Hector this is done with HFactory.getOrCreateCluster, which takes the cluster name and the host address.
Then, we may need to drop a keyspace with the same name in case it already exists using:
String KEYSPACE_NAME = "CodeOfHonour";
if (cluster.describeKeyspace(KEYSPACE_NAME) != null) {
    cluster.dropKeyspace(KEYSPACE_NAME);
}
Apart from that, if the keyspace is already in the database and we want to delete it, it is a good idea to also remove the respective files under /var/lib/cassandra/ as a spring clean. Otherwise, various things can go wrong and lead to unexpected exceptions. However, be very careful with dropKeyspace because it may lead to irrecoverable loss of data. It is good to learn from the first day how to back up and restore our keyspaces.
Define a Column Family
Let's now look at the anatomy of a Column Family definition. We need to create a CF called User inside our keyspace.
// Definition of a Column Family
BasicColumnFamilyDefinition cfDfn = new BasicColumnFamilyDefinition();
cfDfn.setKeyspaceName(KEYSPACE_NAME);
cfDfn.setName("User");
cfDfn.setKeyAlias(new StringSerializer().toByteBuffer("ID"));
This is a very basic definition of a CF in which we have only indicated that its name is User and that it belongs to the given keyspace. We have also indicated that the key of this CF is referred to by the alias ID. It is good to customize our CF a bit more. First, we specify the validation class of our row keys (the counterpart of a type in SQL terms), the comparator used to order column names, and the default validation class for column values. Let us assume that all of these are UTF-8-encoded text.
cfDfn.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
cfDfn.setComparatorType(ComparatorType.UTF8TYPE);
cfDfn.setDefaultValidationClass(ComparatorType.UTF8TYPE.getClassName());
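The key alias and column names are handed to Hector as ByteBuffers, and UTF8Type means these bytes are interpreted as UTF-8 text. The sketch below uses only the JDK to show that round trip; Utf8Bytes is a hypothetical helper, and I am assuming it mirrors what StringSerializer does for plain strings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class Utf8Bytes {

    // Encode a string as UTF-8 bytes wrapped in a ByteBuffer.
    static ByteBuffer toByteBuffer(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Decode the remaining bytes as UTF-8; duplicate() leaves the caller's position untouched.
    static String fromByteBuffer(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer key = toByteBuffer("ID");
        System.out.println(key.remaining());      // number of bytes in the encoded key
        System.out.println(fromByteBuffer(key));  // decoded text
    }
}
```

Keeping names and values in one encoding matters because Cassandra validates the stored bytes against the declared type.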
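Finally, we tweak some more parameters to achieve the desired result. For example, we set the GC grace period, that is, the time Cassandra keeps deletion markers (tombstones) before they may be garbage-collected during compaction. A longer period gives replicas more time to learn about deletions, at the expense of more storage. The value used here, 864000 seconds (10 days), is also Cassandra's default. We also set the read repair chance to 0.1, so that roughly one read in ten triggers a consistency check across replicas. Setting it to 1 will force Cassandra to double-check every row we read. For reads at a consistency level above ONE, the replicas contacted for the read are compared and repaired in any case, whatever the value of this setting.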
cfDfn.setGcGraceSeconds(864000);
cfDfn.setReadRepairChance(0.1);
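To get a feeling for these two numbers, here is a small JDK-only sketch. CfTuning and its method names are hypothetical, and the read repair figure is only an expected value, since the check is triggered at random.

```java
import java.util.concurrent.TimeUnit;

class CfTuning {

    // Converts a GC grace period given in seconds to whole days.
    static long gcGraceDays(long gcGraceSeconds) {
        return TimeUnit.SECONDS.toDays(gcGraceSeconds);
    }

    // Expected number of reads that trigger a read repair, for a given chance in [0, 1].
    static long expectedReadRepairs(long reads, double readRepairChance) {
        return Math.round(reads * readRepairChance);
    }

    public static void main(String[] args) {
        System.out.println(gcGraceDays(864000) + " days of GC grace");
        System.out.println(expectedReadRepairs(1000, 0.1) + " of 1000 reads expected to trigger read repair");
    }
}
```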
We then "cast" (quoted because it's not really type-casting) our CF definition as an instance of ThriftCfDef:
ColumnFamilyDefinition thriftCfDef = new ThriftCfDef(cfDfn);
Finally, we need to add this ColumnFamilyDefinition object to the keyspace definition.
int replicationFactor = 1;
String strategy = "org.apache.cassandra.locator.SimpleStrategy";
KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(
    KEYSPACE_NAME, strategy, replicationFactor, Arrays.asList(thriftCfDef));
Adding Columns
We now need to create a set of columns and add them to the column family we have already created:
String name = "firstName";
BasicColumnDefinition colDef = new BasicColumnDefinition();
StringSerializer ser = new StringSerializer();
colDef.setName(ser.toByteBuffer(name));
colDef.setValidationClass(ComparatorType.UTF8TYPE.getClassName());
and we now add this column to the above BasicColumnFamilyDefinition object as follows:
cfDfn.addColumnDefinition(colDef);
In the same way we can create more columns and add them to this column family. So far nothing has been sent to the database. The whole schema is created by the following command:
cluster.addKeyspace(keyspaceDefinition);