Apache Cassandra Quick tour

Cassandra is a distributed database system. Facebook open-sourced it in 2008, and it was then donated to the Apache Software Foundation. Cassandra is based on Google's Bigtable data model and Amazon's Dynamo distributed architecture. It does not use SQL and is optimized for handling very large volumes of data and transactions. Although Cassandra is implemented in Java, client programs can be written in other languages such as Ruby, Perl, Python, Scala, and PHP.

It is used by large-scale social networking services (SNS) such as Facebook, Digg, and Twitter. It does not support complex relationships such as foreign keys; it provides only key-value access, much like a Java HashMap. It is very easy to install and use.

Let’s look at Cassandra’s data model.

Data Model

Cassandra’s data model is based on Google’s Bigtable. It is often called a “column DB”, and it is quite different from a traditional RDBMS.

Column

A column is a data structure that consists of a column name and a column value.

{name: "emailAddress", value:"cassandra@apache.org"}
{name:"age" , value:"20"}
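In Java terms, a column can be pictured as an immutable name/value pair. Here is a minimal sketch; the Column record is hypothetical (it is not a class from Cassandra itself), it models only the name and value shown above, and it needs Java 16 or later because it uses a record.

```java
// Hypothetical model of a Cassandra column: just a name and a value,
// mirroring {name: "...", value: "..."} above. Not Cassandra's own class.
record Column(String name, String value) {
    Column {
        // A column must always have a name; reject null to keep the model simple.
        if (name == null) {
            throw new IllegalArgumentException("column name must not be null");
        }
    }
}

class ColumnDemo {
    public static void main(String[] args) {
        Column email = new Column("emailAddress", "cassandra@apache.org");
        Column age = new Column("age", "20");
        System.out.println(email.name() + " = " + email.value());
        System.out.println(age.name() + " = " + age.value());
    }
}
```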

Column Family

A column family is a set of rows, and each row is a set of columns. It is roughly comparable to a table in an RDBMS; I will explain the differences in more detail later. Each row in a column family is identified by a key, and each row has a number of columns.

For example, here is one row:

Cassandra = { emailAddress:"casandra@apache.org" , age:"20"}

“Cassandra” is the key of the row, and the row has two columns. The column names are “emailAddress” and “age”, and their values are “casandra@apache.org” and “20”, respectively.
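Since a row is just a key plus a set of named columns, it maps naturally onto a Java map from column name to value. The sketch below is a hypothetical illustration, not Cassandra's API; it uses a TreeMap so that column names stay sorted, which loosely mirrors how Cassandra orders the columns within a row by name.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one row = a row key plus a map of column name -> column value.
class Row {
    final String key;
    // TreeMap keeps column names in sorted order.
    final Map<String, String> columns = new TreeMap<>();

    Row(String key) {
        this.key = key;
    }

    Row put(String columnName, String value) {
        columns.put(columnName, value);
        return this; // allow chaining
    }

    String get(String columnName) {
        return columns.get(columnName); // null if the row has no such column
    }
}

class RowDemo {
    public static void main(String[] args) {
        Row row = new Row("Cassandra")
                .put("emailAddress", "casandra@apache.org")
                .put("age", "20");
        System.out.println(row.key + " -> " + row.columns);
        // prints: Cassandra -> {age=20, emailAddress=casandra@apache.org}
    }
}
```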

Now let’s look at a column family that contains several rows.

UserProfile={
Cassandra={ emailAddress:"casandra@apache.org" , age:"20"}
TerryCho= { emailAddress:"terry.cho@apache.org" , gender:"male"}
Cath= { emailAddress:"cath@apache.org" , age:"20",gender:"female",address:"Seoul"}
}

One interesting point is that each row can have a different schema. The Cassandra row has the “emailAddress” and “age” columns, while the TerryCho row has the “emailAddress” and “gender” columns. This characteristic is called “schemaless”: the data structure of each row in a column family can differ.
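The schemaless behavior is easy to see if a column family is modeled as a map from row key to a map of columns: nothing forces two rows to have the same column names. This is a hypothetical in-memory sketch for illustration only; the ColumnFamily class is not part of any Cassandra client library.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a column family: row key -> (column name -> value).
// Nothing forces rows to share the same column names ("schemaless").
class ColumnFamily {
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    void insert(String rowKey, String columnName, String value) {
        // Create the row on first write; no schema is checked.
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    String get(String rowKey, String columnName) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(columnName);
    }

    Set<String> columnNames(String rowKey) {
        return rows.getOrDefault(rowKey, Map.of()).keySet();
    }
}

class ColumnFamilyDemo {
    public static void main(String[] args) {
        ColumnFamily userProfile = new ColumnFamily();
        userProfile.insert("Cassandra", "emailAddress", "casandra@apache.org");
        userProfile.insert("Cassandra", "age", "20");
        userProfile.insert("TerryCho", "emailAddress", "terry.cho@apache.org");
        userProfile.insert("TerryCho", "gender", "male");

        // The two rows have different sets of columns.
        System.out.println(userProfile.columnNames("Cassandra")); // [age, emailAddress]
        System.out.println(userProfile.columnNames("TerryCho"));  // [emailAddress, gender]
    }
}
```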

Keyspace

A keyspace is a logical grouping of column families for management purposes. It does not affect the data structure.

Super Column & Super Column Family

A column’s value can itself be a set of columns; such a column is called a super column. (This is similar to how a Java Hashtable can store a value object as a value of type Object.)

{name:"username",
value:{
firstname:{name:"firstname", value:"Terry"},
lastname:{name:"lastname", value:"Cho"}
}
}
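A super column can be sketched the same way as a row: a name plus a map of subcolumns. The SuperColumn class below is hypothetical and only illustrates the nesting shown above; it is not a Cassandra API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a super column: a named column whose value is
// itself a set of (sub)columns, stored here as subcolumn name -> value.
class SuperColumn {
    final String name;
    final Map<String, String> subColumns = new TreeMap<>();

    SuperColumn(String name) {
        this.name = name;
    }

    SuperColumn put(String subColumnName, String value) {
        subColumns.put(subColumnName, value);
        return this; // allow chaining
    }

    String get(String subColumnName) {
        return subColumns.get(subColumnName);
    }
}

class SuperColumnDemo {
    public static void main(String[] args) {
        SuperColumn username = new SuperColumn("username")
                .put("firstname", "Terry")
                .put("lastname", "Cho");
        System.out.println(username.name + " -> " + username.subColumns);
        // prints: username -> {firstname=Terry, lastname=Cho}
    }
}
```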

In the same way, the rows of a column family can contain super columns instead of plain columns; such a column family is called a super column family:

UserList={
Cath:{
username:{firstname:"Cath",lastname:"Yoon"}
address:{city:"Seoul",postcode:"1234"}
}
Terry:{
username:{firstname:"Terry",lastname:"Cho"}
account:{bank:"hana",accounted:"1234"}
}
}

The UserList super column family has two rows, with the keys "Cath" and "Terry". Each row has two super columns: the "Cath" row has "username" and "address", and the "Terry" row has "username" and "account".
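Putting the pieces together, a super column family adds one more level of nesting: row key, then super column name, then subcolumn name. The sketch below rebuilds the UserList example in memory; the SuperColumnFamily class and its methods are hypothetical and exist only to show the lookup path.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a super column family:
// row key -> super column name -> subcolumn name -> value.
class SuperColumnFamily {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    void insert(String rowKey, String superColumn, String subColumn, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(superColumn, k -> new TreeMap<>())
            .put(subColumn, value);
    }

    // Returns null if any level of the path is missing.
    String get(String rowKey, String superColumn, String subColumn) {
        Map<String, Map<String, String>> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        Map<String, String> sc = row.get(superColumn);
        return sc == null ? null : sc.get(subColumn);
    }

    Set<String> superColumnNames(String rowKey) {
        return rows.getOrDefault(rowKey, Map.of()).keySet();
    }
}

class SuperColumnFamilyDemo {
    public static void main(String[] args) {
        SuperColumnFamily userList = new SuperColumnFamily();
        userList.insert("Cath", "username", "firstname", "Cath");
        userList.insert("Cath", "username", "lastname", "Yoon");
        userList.insert("Cath", "address", "city", "Seoul");
        userList.insert("Cath", "address", "postcode", "1234");
        userList.insert("Terry", "username", "firstname", "Terry");
        userList.insert("Terry", "username", "lastname", "Cho");
        userList.insert("Terry", "account", "bank", "hana");
        userList.insert("Terry", "account", "accounted", "1234");

        System.out.println(userList.superColumnNames("Cath"));        // [address, username]
        System.out.println(userList.superColumnNames("Terry"));       // [account, username]
        System.out.println(userList.get("Terry", "account", "bank")); // hana
    }
}
```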

Cassandra Quick Test

Download Cassandra from http://incubator.apache.org/cassandra/, extract the zip file, and run bin/cassandra.bat (on Unix-like systems, run bin/cassandra).

We will connect to the Cassandra node with the command-line interface (CLI), located at bin/cassandra-cli.bat.

The default TCP port is 9160. You can change it in "conf/storage-conf.xml".
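Before starting the CLI, it can help to confirm that the node is actually listening on that port. A plain TCP connection attempt is enough for that. This is a minimal sketch assuming the node runs on localhost with the default port 9160; the PortCheck class is hypothetical, and it only checks that the TCP handshake succeeds, not that the service behind the port is really Cassandra.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal sketch: check whether something accepts TCP connections on a port.
// It only tests the TCP handshake; it does not speak Cassandra's client protocol.
class PortCheck {
    static final int DEFAULT_CASSANDRA_PORT = 9160; // default port mentioned above

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("localhost", DEFAULT_CASSANDRA_PORT, 1000);
        System.out.println("Port " + DEFAULT_CASSANDRA_PORT
                + (up ? " is accepting connections" : " is not reachable"));
    }
}
```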

The "conf/storage-conf.xml" file defines a default keyspace named "Keyspace1", along with the column families in that keyspace and their types.

Let’s insert a new row with the key "Terry" that has one column (name="gender", value="Male").
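To make the addressing in this step concrete (keyspace, then column family, then row key, then column), here is a hypothetical in-memory stand-in written in Java. It does not talk to a real Cassandra node and is not the CLI's syntax; "Standard1" is only an assumed column family name inside Keyspace1, so check your own storage-conf.xml for the names actually defined there.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the insert step; it does not talk to a
// real Cassandra node. Data is addressed as keyspace -> column family -> row key -> column.
class LocalStore {
    private final Map<String, Map<String, Map<String, Map<String, String>>>> data = new TreeMap<>();

    void set(String keyspace, String columnFamily, String rowKey, String column, String value) {
        data.computeIfAbsent(keyspace, k -> new TreeMap<>())
            .computeIfAbsent(columnFamily, k -> new TreeMap<>())
            .computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(column, value);
    }

    // Returns null if any part of the path does not exist.
    String get(String keyspace, String columnFamily, String rowKey, String column) {
        return data.getOrDefault(keyspace, Map.of())
                   .getOrDefault(columnFamily, Map.of())
                   .getOrDefault(rowKey, Map.of())
                   .get(column);
    }
}

class LocalStoreDemo {
    public static void main(String[] args) {
        LocalStore store = new LocalStore();
        // "Standard1" is an assumed column family name inside Keyspace1.
        store.set("Keyspace1", "Standard1", "Terry", "gender", "Male");
        System.out.println(store.get("Keyspace1", "Standard1", "Terry", "gender")); // Male
    }
}
```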


20 thoughts on “Apache Cassandra Quick tour”


  1. Hi,

    I have a requirement to use Cassandra in my application. My application has one table with a lot of data, and most of the application uses that table. Because of the amount of data, the application's performance degrades when that table is in Oracle.

    So, I have decided to use Cassandra for that one table and keep all the other tables in Oracle. A lot of business logic depends on that table.

    Now my question is: can I use Cassandra for a table that has a lot of business logic?

    I am unable to implement many WHERE clauses against the Cassandra database.

    Is there any supporting tool to use Cassandra in an efficient way?

    Please let me know; it is urgent.

    Thanks in advance

    By Mallik

    1. First, Cassandra has no tools such as an admin console or a developer toolkit. As far as I know, you have to develop them yourself.
      Cassandra is designed to handle huge amounts of data quickly, but it is hard to handle complex or relational data with it.
      If you have to handle complex business logic, I recommend an RDBMS with database partitioning plus a data grid for caching (memcached or Oracle Coherence).
      Cheers
      -Terry

  2. Hi Terry Cho,

    I am a newbie to Cassandra.
    We are planning to migrate from MySQL to Cassandra.
    First of all, I'm writing a sample application in which I insert data into two tables (column families) created in Cassandra.
    My question is: I want the data in the tables to be deleted automatically once it is 'n' days old (that is, I want only the last 'n' days of data to be present in the DB).
    Is there anything like stored procedures in Cassandra?

    How should I handle this kind of issue?
    Also, is there any trigger-like support in Cassandra?
    Any help with this is greatly appreciated.

  3. I am developing a report in Pentaho Report Designer, and my backend is MySQL. We are now migrating to Cassandra, but Pentaho Report Designer can access only databases that support JDBC, and Cassandra does not support JDBC. Does it support ODBC? Is there any other solution?

  4. I am developing reports in Pentaho Report Designer, and my database is MySQL. We are now migrating our database to Cassandra, and I have one issue: Pentaho Report Designer supports only JDBC databases, and Cassandra does not support JDBC/ODBC. Is there any other way to access Cassandra from Pentaho Report Designer?

  5. A column family does not correspond to a “row” in an RDBMS, but to a “table”:
    “In analogy with relational databases, a column family is as a “table”, each key-value pair being a “row”.” (Wikipedia with references)
