Storing a list of values in Cassandra

感情败类 2021-01-13 08:46

Version Dependent

Some of the answers to this question deal with older versions of Cassandra. The correct answer for this kind of problem depends on the version of

3 Answers
  • 2021-01-13 08:58

    This answer dates to before the release of Cassandra version 1.2, which provided substantially different functionality for handling lists. The answer might be inappropriate if you are using Cassandra 1.2+.


    I would encode lists in the column key, using composite columns with the real column name as the first dimension, i.e.:

    row_key -> {
         [column_name; entry1] -> "",
         [column_name; entry2] -> "",
         ... 
    }
    

    Then, to read the list, you would do a get_slice from [column_name; ] to [column_name; ]. Note the empty second dimension in both bounds.

    The great thing about this is that it actually implements a set quite nicely: the list cannot contain the same thing twice. I think this works for your use case. The list is also maintained in sorted order.
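The layout above can be modeled in a few lines of Python to show why it behaves like a sorted set. This is only an in-memory sketch, not a Cassandra client: the `Row` class and its `add` and `get_list` methods are hypothetical names, and entries are assumed to be strings that sort the same way the column comparator would.

```python
import bisect


class Row:
    """Toy model of one wide row whose column names are (column_name, entry) pairs.

    The columns are kept sorted, mimicking how a row keeps its columns
    ordered by the comparator. Values are always empty, so only names are stored.
    """

    def __init__(self):
        self._columns = []  # sorted list of (column_name, entry) tuples

    def add(self, column_name, entry):
        key = (column_name, entry)
        i = bisect.bisect_left(self._columns, key)
        # Writing a column that already exists changes nothing,
        # which is why duplicates cannot appear.
        if i == len(self._columns) or self._columns[i] != key:
            self._columns.insert(i, key)

    def get_list(self, column_name):
        # Equivalent of slicing from [column_name; ] to [column_name; ]:
        # start at the first column with this prefix and stop when it changes.
        start = bisect.bisect_left(self._columns, (column_name,))
        entries = []
        for name, entry in self._columns[start:]:
            if name != column_name:
                break
            entries.append(entry)
        return entries
```

Adding "java", "cobol" and "java" again under the same column name and then calling `get_list` returns each entry once, in sorted order.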

  • 2021-01-13 09:00

    In older versions of Cassandra, you had to serialize the list yourself and store it in a column, or perhaps use a super column.

    Since version 1.2 of Cassandra, CQL3 has collection types for columns, so you can give list<text> as the type of a column in your schema. For example:

     CREATE TABLE Person (
        name text,
        skills list<text>,
        PRIMARY KEY (name)
     );
    

    Or you could use set<text> if you want to automatically eliminate duplicates.
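The difference between the two collection types can be illustrated with a small Python model. This is a sketch of the semantics only, not driver code: `append_to_list` and `add_to_set` are hypothetical helpers, and the CQL statements in the comments assume the `Person` table above.

```python
def append_to_list(column, value):
    """Model of list<text>: keeps insertion order and allows duplicates.

    Roughly what happens on: UPDATE Person SET skills = skills + ['x'] WHERE name = ...
    """
    return column + [value]


def add_to_set(column, value):
    """Model of set<text>: duplicates are dropped and elements come back sorted.

    Roughly what happens on: UPDATE Person SET skills = skills + {'x'} WHERE name = ...
    """
    return sorted(set(column) | {value})
```

Appending "java", "cobol", "java" to the list model keeps all three values in that order, while the set model ends up with just "cobol" and "java".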

  • 2021-01-13 09:04

    This answer dates to before the release of Cassandra version 1.2, which provided substantially different functionality for handling lists. The answer might be inappropriate if you are using Cassandra 1.2+.


    As mentioned on the mailing list, my preference, which has worked very well for me, is to store a single column "skills" whose value is a serialized JSON string.

    It really comes down to the usage patterns you have for "skills".

    • If "skills" are just for CRUD on a per-user basis, this is fine.
    • If you want to be able to search for all users that have a skill of "cobol", I would still recommend this approach, plus another row keyed skill:cobol with one column per user: the UUID as the column name and a timestamp or something similar as the value ...
    • I'm sure that with Pig/Hadoop integration on your Cassandra nodes, you could also still quite happily query all of the users that have x, y and z to generate new data to support additional use cases.
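The first two bullets can be sketched in Python as follows. Everything here is an assumption made for illustration: `store` is a plain dict standing in for a column family, `save_skills`, `load_skills` and `users_with_skill` are hypothetical helpers, and the index row key format follows the skill:cobol example above.

```python
import json
import time

# In-memory stand-in for a column family: row_key -> {column_name: value}.
# It only models the layout; it does not talk to Cassandra.
store = {}


def save_skills(user_id, skills):
    """Store the whole list as one JSON string and maintain the index rows."""
    user_key = str(user_id)
    row = store.setdefault(user_key, {})
    old_skills = json.loads(row.get("skills", "[]"))
    row["skills"] = json.dumps(skills)

    # Drop index columns for skills the user no longer has.
    for skill in set(old_skills) - set(skills):
        store.get("skill:" + skill, {}).pop(user_key, None)
    # One column per user in each skill row: name = user id, value = timestamp.
    for skill in set(skills):
        store.setdefault("skill:" + skill, {})[user_key] = time.time()


def load_skills(user_id):
    """Read the JSON column back into a Python list."""
    return json.loads(store[str(user_id)]["skills"])


def users_with_skill(skill):
    """Return the ids of all users listed in the skill's index row."""
    return sorted(store.get("skill:" + skill, {}))
```

The trade-off is that every update rewrites the whole JSON value and has to keep the index rows in step by hand, which is acceptable when the list is small and mostly read per user.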