Friday, February 22, 2013

Cassandra Comparators and Validators


About Data Types (Comparators and Validators)

In a relational database, you must specify a data type for each column when you define a table. The data type constrains the values that can be inserted into that column. For example, if you have a column defined as an integer datatype, you would not be allowed to insert character data into that column. Column names in a relational database are typically fixed labels (strings) that are assigned when you define the table schema.

In Cassandra, the data type for a column (or row key) value is called a validator. The data type for a column name is called a comparator. You can define data types when you create your column family schemas (which is recommended), but Cassandra does not require it. Internally, Cassandra stores column names and values as hex byte arrays (BytesType). This is the default client encoding used if data types are not defined in the column family schema (or if not specified by the client request).

Cassandra comes with the following built-in data types, which can be used either as validators (row key and column value data types) or as comparators (column name data types). The one exception is CounterColumnType, which is allowed only as a column value (not for row keys or column names).
Internal Type       CQL Name        Description
BytesType           blob            Arbitrary hexadecimal bytes (no validation)
AsciiType           ascii           US-ASCII character string
UTF8Type            text, varchar   UTF-8 encoded string
IntegerType         varint          Arbitrary-precision integer
LongType            int, bigint     8-byte long
UUIDType            uuid            Type 1 or type 4 UUID
DateType            timestamp       Date plus time, encoded as 8 bytes since epoch
BooleanType         boolean         true or false
FloatType           float           4-byte floating point
DoubleType          double          8-byte floating point
DecimalType         decimal         Variable-precision decimal
CounterColumnType   counter         Distributed counter value (8-byte long)
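
To make the byte sizes in the table concrete, here is a small Python sketch that packs a few values the way the descriptions suggest. It is only an illustration and not Cassandra's own code: the function names are made up for this example, and big-endian byte order and milliseconds as the unit for DateType are assumptions that the table itself does not state.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_long(value: int) -> bytes:
    """Pack an integer into 8 bytes (assumed big-endian), as for LongType."""
    return struct.pack(">q", value)


def encode_float(value: float) -> bytes:
    """Pack a number into a 4-byte float, as for FloatType."""
    return struct.pack(">f", value)


def encode_double(value: float) -> bytes:
    """Pack a number into an 8-byte float, as for DoubleType."""
    return struct.pack(">d", value)


def encode_timestamp(moment: datetime) -> bytes:
    """Pack a date as 8 bytes since the epoch (milliseconds assumed), as for DateType."""
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    return struct.pack(">q", millis)


def encode_text(value: str) -> bytes:
    """Encode a string as UTF-8 bytes, as for UTF8Type."""
    return value.encode("utf-8")


if __name__ == "__main__":
    samples = {
        "LongType": encode_long(38),
        "FloatType": encode_float(1.5),
        "DoubleType": encode_double(1.5),
        "DateType": encode_timestamp(datetime(2013, 2, 22, tzinfo=timezone.utc)),
        "UTF8Type": encode_text("Hyderabad"),
    }
    for type_name, raw in samples.items():
        # Show how many bytes each value takes and its hex form.
        print(f"{type_name:10} {len(raw)} bytes  {raw.hex()}")
```

The hex form printed at the end is the same kind of representation the CLI falls back to when no data type is known for a name or value.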



About Validators
For all column families, it is best practice to define a default row key validator using the key_validation_class property.

For static column families, you should define each column and its associated type when you define the column family using the column_metadata property.

For dynamic column families (where column names are not known ahead of time), you should specify a default_validation_class instead of defining the per-column data types.

Key and column validators may be added or changed in a column family definition at any time. If you specify an invalid validator on your column family, client requests that respect that metadata will be confused, and data inserts or updates that do not conform to the specified validator will be rejected.
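
The way validators reject non-conforming writes can be pictured with a toy model in Python. This is only a sketch of the idea, not how Cassandra implements it: ToyColumnFamily, CHECKS and the check functions are hypothetical names, and the rule that a long must be exactly 8 bytes is a simplifying assumption.

```python
def check_ascii(raw: bytes) -> None:
    raw.decode("ascii")  # raises UnicodeDecodeError (a ValueError) if not ASCII


def check_utf8(raw: bytes) -> None:
    raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8


def check_long(raw: bytes) -> None:
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes for a long, got {len(raw)}")


def check_bytes(raw: bytes) -> None:
    pass  # BytesType: no validation at all


# Hypothetical lookup from validator name to check function.
CHECKS = {
    "AsciiType": check_ascii,
    "UTF8Type": check_utf8,
    "LongType": check_long,
    "BytesType": check_bytes,
}


class ToyColumnFamily:
    """Toy stand-in for a column family with key and value validators."""

    def __init__(self, key_validation_class="BytesType",
                 default_validation_class="BytesType", column_metadata=None):
        self.key_check = CHECKS[key_validation_class]
        self.default_check = CHECKS[default_validation_class]
        # Per-column validators, like column_metadata in a static column family.
        self.column_checks = {
            name: CHECKS[cls] for name, cls in (column_metadata or {}).items()
        }
        self.rows = {}

    def insert(self, key: bytes, column: bytes, value: bytes) -> None:
        # Validate first, so a rejected write leaves the stored rows untouched.
        self.key_check(key)
        self.column_checks.get(column, self.default_check)(value)
        self.rows.setdefault(key, {})[column] = value


if __name__ == "__main__":
    cf = ToyColumnFamily("UTF8Type", "UTF8Type", {b"zip": "LongType"})
    cf.insert(b"key1", b"city", b"Hyderabad")
    try:
        cf.insert(b"key1", b"zip", b"500001")  # 6 bytes, not a valid long here
    except ValueError as error:
        print("rejected:", error)
```

Columns listed in column_metadata get their own check, and everything else falls back to the default validator, which mirrors the static versus dynamic split described above.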

About Comparators
Within a row, columns are always stored in sorted order by their column name. The comparator specifies the data type for the column name, as well as the sort order in which columns are stored within a row. Unlike validators, the comparator may not be changed after the column family is defined, so this is an important consideration when defining a column family in Cassandra.

Typically, the column names in a static column family are strings, and the sort order of columns is not important in that case. For dynamic column families, however, sort order is important. For example, in a column family that stores time series data (where the column names are timestamps), having the data in sorted order is required for slicing result sets out of a row of columns.
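
Why the comparator matters for slicing can be seen in a short Python sketch. It is an illustration of the sorting idea only: the two comparator functions and slice_columns are hypothetical helpers, not Cassandra APIs, and column names are plain strings and integers here for simplicity.

```python
import bisect


def ascii_comparator(name: str) -> bytes:
    """Sort key that compares names as text, byte by byte."""
    return name.encode("ascii")


def long_comparator(name: str) -> int:
    """Sort key that compares names as numbers."""
    return int(name)


def sort_columns(names, comparator):
    """Return the column names in the order the given comparator defines."""
    return sorted(names, key=comparator)


def slice_columns(sorted_names, start, finish):
    """Return all names with start <= name <= finish from a sorted list."""
    low = bisect.bisect_left(sorted_names, start)
    high = bisect.bisect_right(sorted_names, finish)
    return sorted_names[low:high]


if __name__ == "__main__":
    names = ["9", "10", "100"]
    print(sort_columns(names, ascii_comparator))  # text order
    print(sort_columns(names, long_comparator))   # numeric order

    # Timestamps as column names, kept sorted so a range is one contiguous run.
    timestamps = sorted([1361466900, 1361466885, 1361466950, 1361467000])
    print(slice_columns(timestamps, 1361466885, 1361466950))
```

With a text comparator, "100" sorts before "9", so a numeric range would not be one contiguous run of columns. That is the reason the comparator has to match how you intend to slice.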


Cassandra CLI



I am not going to discuss Cassandra itself here. I am assuming you already know Cassandra and are looking for help with programming issues.

Installation:
      It is easy on both Windows and Ubuntu. I tested all the commands here on Ubuntu 12.10 with Cassandra version 1.2.1.
Go to the link below for the download and instructions: http://cassandra.apache.org/download/

Ubuntu users should follow the Debian installation process. On Ubuntu you need to set up public key authentication for the package repository. You can find a lot of keys on their site, but use the latest one; most of the others did not work in my case.

Programming CLI:
For testing purposes, everyone uses cassandra-cli, so here are the commands along with suggestions for solving common issues.
Before going through them, you may want to look at this link once: http://wiki.apache.org/cassandra/CassandraCli
1. Type cassandra-cli in a terminal

rajesh@rajesh-VPCEG25EN:~$ cassandra-cli
Welcome to Cassandra CLI version 1.0.12
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default@unknown]

2. connect to your cluster

connect localhost/9160;

[default@unknown] connect localhost/9160;
Connected to: "Test Cluster" on localhost/9160

Note: 1. In the CLI, every command ends with a semicolon (";").
         2. Sometimes localhost may not work; in that case, give your IP address instead of localhost.

3. create keyspace

 create keyspace <keyspaceName>;

[default@unknown] create keyspace rajesh;
14688360-7cfd-11e2-0000-242d50cf1fbc
Waiting for schema agreement...
... schemas agree across the cluster

4. use that keyspace

use <keyspaceName>;

[default@unknown] use rajesh;
Authenticated to keyspace: rajesh
[default@rajesh]

5. create column family

create column family account1
with key_validation_class = UTF8Type
and comparator = 'AsciiType'
and default_validation_class = UTF8Type;

[default@rajesh] create column family account1
... with key_validation_class = UTF8Type
... and comparator = 'AsciiType'
... and default_validation_class = UTF8Type;
aff43210-7cfe-11e2-0000-242d50cf1fbc
Waiting for schema agreement...
... schemas agree across the cluster

// In Cassandra a schema is optional, but defining one is the better approach. The good thing is that you can update the schema whenever you want.


5.9 update column metadata

update column family address
with comparator = 'AsciiType'
and key_validation_class = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and column_metadata =
[{column_name : city,
validation_class : UTF8Type},
{column_name : zip,
validation_class : UTF8Type}];

[default@rajesh] create column family 'users';
ce0413c0-7cfd-11e2-0000-242d50cf1fbc
Waiting for schema agreement...
... schemas agree across the cluster

At first I was really confused about the difference between comparators and validation classes. If you want to know more about them, you can find an explanation on my blog.

6. create super column family

create column family student
with column_type = 'Super'
and key_validation_class = UTF8Type
and comparator = 'AsciiType'
and default_validation_class = UTF8Type;

The only difference between creating a column family and creating a super column family is that you need to add column_type = 'Super'.

7. update super column metadata

update column family student
with comparator = 'AsciiType'
and key_validation_class = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and column_metadata =
[{column_name : city,
validation_class : UTF8Type},
{column_name : zip,
validation_class : UTF8Type}];


8. insert data to column family:

set <columnfamily>['<key>']['<column name>'] = '<column value>';

[default@rajesh] set account['key117']['city'] = 'Hyderabad';
Value inserted.
Elapsed time: 53 msec(s).

Note: you may get an error like the one below.
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'name' as hex bytes

If you are not able to insert data because of this, run the following line first:
 assume <column family> keys as utf8;

// You can use assume for the other parts as well, e.g. comparator, validator, etc.
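
The error message makes more sense once you see what parsing something "as hex bytes" means. Here is a minimal Python sketch of the idea. It is not what the CLI runs internally, and parse_as_hex and parse_as_utf8 are names made up for this example.

```python
def parse_as_hex(text: str) -> bytes:
    """Read the text as pairs of hex digits, which is the BytesType view."""
    return bytes.fromhex(text)


def parse_as_utf8(text: str) -> bytes:
    """Read the text as a plain string and encode it as UTF-8."""
    return text.encode("utf-8")


if __name__ == "__main__":
    try:
        parse_as_hex("name")  # 'n' and 'm' are not hex digits
    except ValueError as error:
        print("cannot parse 'name' as hex bytes:", error)

    # Treated as UTF-8, the same text is perfectly fine.
    raw = parse_as_utf8("name")
    print(raw.hex())

    # The hex form of those bytes is what a hex parser would accept.
    print(parse_as_hex(raw.hex()) == raw)
```

So the data is not wrong; the text is just being read as hex digits when no type is known. Telling the CLI to treat it as utf8 switches it to the second interpretation.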

9. Insert data to super column family

set <super column family>[<key>][<super column name>][<column name>] = <value>;
[default@rajesh] set student['key1']['user1']['name']='rajesh';
Value inserted.
Elapsed time: 8 msec(s).

If you get an error like this
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'name' as hex bytes
then you need to add a few lines:

assume <column family name> keys as utf8;
assume <column family name> comparator as utf8;
assume <column family name> validator as utf8;
assume <column family name> sub_comparator as utf8;
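
If the extra level in a super column family is hard to picture, it may help to think of it as a map inside a map inside a map. The Python sketch below models only that nesting. The helper names are hypothetical and nothing here talks to Cassandra.

```python
from collections import defaultdict


def new_super_column_family():
    """Row key -> super column name -> column name -> value."""
    return defaultdict(lambda: defaultdict(dict))


def set_value(cf, key, super_column, column, value):
    """Rough equivalent of: set cf[key][super_column][column] = value;"""
    cf[key][super_column][column] = value


def get_row(cf, key):
    """Return one row with super columns and columns sorted by name."""
    row = cf.get(key, {})
    return {
        super_column: dict(sorted(columns.items()))
        for super_column, columns in sorted(row.items())
    }


if __name__ == "__main__":
    student = new_super_column_family()
    set_value(student, "key1", "user1", "name", "rajesh")
    set_value(student, "key1", "user1", "city", "Hyderabad")
    print(get_row(student, "key1"))
```

The comparator applies to the super column names and the sub_comparator to the column names inside each super column, which is why there are two separate assume lines for them.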


10. reading data from cassandra-cli:
If you want to read the full column family, use

list <column family>;
If you want to read a single key, use

get <column family>[<key>];

[default@rajesh] get account[key1];
=> (column=city, value=hyderabad, timestamp=1361466885159000)
=> (column=zip, value=5000012, timestamp=1361466900352000)
Returned 2 results.
Elapsed time: 88 msec(s).

That's it. If you want to know about any other commands, type help:
help;
?;

So friends, in my next post you can find Java programs for Cassandra. All the best :)
// Sorry for the bad alignment. As we know, programmers are so lazy.

Saturday, February 2, 2013

cassandra


Cassandra references:

// just for myself
create column family User with comparator = UTF8Type;

update column family User with
        column_metadata =
        [
        {column_name: first, validation_class: UTF8Type},
        {column_name: last, validation_class: UTF8Type},
        {column_name: age, validation_class: UTF8Type, index_type: KEYS}
        ];

assume User keys as utf8;

set User['jsmith']['first'] = 'John';
set User['jsmith']['last'] = 'Smith';
set User['jsmith']['age'] = '38';

get User['jsmith'];

get User where age = '12';

update column family address with column_type = 'Super' and column_metadata =
[
{column_name: city, validation_class: UTF8Type},
{column_name: zip, validation_class: UTF8Type}
];


create column family address
with column_type = 'Super'
and comparator = UTF8Type
and default_validation_class = UTF8Type
and column_metadata = [
         {column_name : city, validation_class : UTF8Type},
         {column_name : zip, validation_class : LongType}
];

create column family address with column_type='Super';

set address['users']['parts']['engine']='v8';


[default@testkeyspace] assume address comparator as ascii;
[default@testkeyspace] assume address sub_comparator as ascii;
[default@testkeyspace] assume address validator as ascii;

get address['key1'];

http://my.safaribooksonline.com/9781849515122/ch02