Some of the answers to this question deal with older versions of Cassandra. The correct answer for this kind of problem depends on the version of Cassandra you are using.
This answer dates to before the release of Cassandra version 1.2, which provided substantially different functionality for handling lists. The answer might be inappropriate if you are using Cassandra 1.2+.
I would encode lists in the column key, using composite columns with the real column name as the first dimension, i.e.:
row_key -> {
[column_name; entry1] -> "",
[column_name; entry2] -> "",
...
}
Then, to read the list, you would do a get_slice from [column_name; ] to [column_name; ] (note the empty second dimensions).
The great thing about this is that it actually implements a set quite nicely: the list cannot contain the same entry twice. I think this works for your use case. The list would also be kept in sorted order.
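To make the layout and the slice concrete, here is a minimal in-memory sketch. It is not the Thrift API: the `WideRow` class and its `insert`/`get_slice` methods are hypothetical stand-ins that only mimic how a row with composite column names stays sorted, how re-inserting the same name overwrites instead of duplicating, and how a slice bounded by an empty second dimension selects every entry under one column name.

```python
import bisect


class WideRow:
    """Hypothetical model of one wide row with composite column names.

    Column names are (column_name, entry) tuples kept in sorted order,
    roughly as a CompositeType comparator would order them.
    """

    def __init__(self):
        self._names = []   # sorted list of (column_name, entry) tuples
        self._values = {}  # (column_name, entry) -> column value

    def insert(self, column_name, entry, value=""):
        key = (column_name, entry)
        if key not in self._values:
            bisect.insort(self._names, key)
        # Writing an existing column name overwrites it, so the
        # "list" behaves like a set: no duplicate entries.
        self._values[key] = value

    def get_slice(self, column_name):
        # Mimics slicing from [column_name; ] to [column_name; ]:
        # (column_name,) sorts before every (column_name, entry) tuple,
        # so we start there and stop at the first different column name.
        start = bisect.bisect_left(self._names, (column_name,))
        entries = []
        for name, entry in self._names[start:]:
            if name != column_name:
                break
            entries.append(entry)
        return entries


row = WideRow()
for skill in ["java", "cql", "java"]:
    row.insert("skills", skill)
row.insert("tags", "backend")
print(row.get_slice("skills"))  # ['cql', 'java']
```

The duplicate "java" collapses into one column and the result comes back sorted, which is exactly the set-like behaviour described above.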
In older versions of Cassandra, you had to serialize the list yourself and store it in a column, or perhaps use a super column.
Since version 1.2 of Cassandra, CQL3 has collection types for columns, so you can give list<text> as the type of a column in your schema. For example:
CREATE TABLE Person (
name text,
skills list<text>,
PRIMARY KEY (name)
);
Or you could use set<text> if you want duplicates to be eliminated automatically.
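As a rough illustration of writing to the collection column from client code, the sketch below assumes the DataStax Python driver (`cassandra-driver`) and a connected `session` object (a `cassandra.cluster.Session`) for a keyspace containing the Person table above. The helper names `save_person` and `add_skill` are hypothetical. The driver binds a Python list as a CQL list and a Python set as a CQL set.

```python
# Assumption: `session` is a connected cassandra.cluster.Session from the
# DataStax Python driver, and the Person table above already exists.
# save_person and add_skill are hypothetical helper names.


def save_person(session, name, skills):
    # Overwrites the whole collection; a Python list binds to list<text>.
    session.execute(
        "INSERT INTO Person (name, skills) VALUES (%s, %s)",
        (name, skills),
    )


def add_skill(session, name, skill):
    # For a list<text> column, "skills + [...]" appends without reading
    # the row first. For a set<text> column, bind {skill} instead.
    session.execute(
        "UPDATE Person SET skills = skills + %s WHERE name = %s",
        ([skill], name),
    )
```

The append form lets Cassandra merge the new element server-side, which avoids rewriting the whole collection on every change.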
This answer dates to before the release of Cassandra version 1.2, which provided substantially different functionality for handling lists. The answer might be inappropriate if you are using Cassandra 1.2+.
As mentioned on the mailing list, my preference, which has worked very well for me, is to store a single column "skills" whose value is a serialized JSON string.
It really comes down to the usage patterns you have for "skills".
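A minimal sketch of the serialized-JSON approach, using only Python's standard `json` module. The helper names are hypothetical, and the string they produce is what you would store in the "skills" column; how you write that column depends on your client library.

```python
import json

# Hypothetical helpers: the whole list lives in one column as a JSON string.


def encode_skills(skills):
    return json.dumps(skills)


def decode_skills(value):
    # Treat a missing or empty column as an empty list.
    return json.loads(value) if value else []


def append_skill_json(value, skill):
    # Read-modify-write: decode the stored string, append, re-encode.
    skills = decode_skills(value)
    skills.append(skill)
    return encode_skills(skills)


stored = encode_skills(["java"])
stored = append_skill_json(stored, "cql")
print(stored)  # ["java", "cql"]
```

This is where usage patterns matter: every change rewrites the whole value, and because the update is a read followed by a write, two clients appending at the same time can overwrite each other's change. If you mostly read the list as a whole and rarely update it concurrently, that trade-off is usually acceptable.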