How to retrieve only the information that got changed from Cassandra?

既然无缘 2021-01-26 17:41

I am working on designing the Cassandra column family schema for my use case below. I am not sure what is the best way to design the Cassandra column family for my below use ca

1 answer
  •  时光说笑
    2021-01-26 18:11

    Regarding 1) Cassandra is built for write-heavy workloads with lots of data spread across multiple nodes. Retrieving ALL data from this kind of setup is risky, since a single client may have to handle a huge result set. A better approach is to use pagination, which is natively supported in 2.0.

    Regarding 2) The point is that partition keys only support EQ or IN queries. For LT or GT (< / >) you use clustering columns (column keys). So if it makes sense to group your entries by some ID like "type", you can use that as your partition key and a timeuuid as the clustering column. This lets you query for all entries newer than X like so:

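    -- "type" is the partition key; "timestamp" is the clustering column.
    create table test 
      (type int, SCHEMA_ID int, RECORD_NAME text, 
      SCHEMA_VALUE text, TIMESTAMP timeuuid, 
      primary key (type, timestamp));
    
    -- ">" selects the entries newer than the given timeuuid.
    select * from test where type IN (0,1,2,3) and timestamp > 58e0a7d7-eebc-11d8-9669-0800200c9a66;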
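
As a rough mental model of why the partition key takes EQ/IN while the clustering column takes a range, here is a toy Python sketch. It is not how Cassandra is implemented: the `table` dict, the integer timestamps, and the `rows_newer_than` helper are all made up for illustration.

```python
from bisect import bisect_right

# Toy model only: a dict stands in for the partitions, and each partition
# keeps its rows sorted by the clustering value (a plain integer here).
table = {
    0: [(100, "a"), (200, "b"), (300, "c")],
    1: [(150, "d"), (250, "e")],
}

def rows_newer_than(partition_keys, threshold):
    """Return (key, ts, value) rows whose clustering value is > threshold."""
    result = []
    for key in partition_keys:  # exact lookup per key, like EQ / IN
        rows = table.get(key, [])
        clustering_values = [ts for ts, _ in rows]
        # Rows are sorted, so a range condition is one binary search
        # followed by a contiguous slice.
        start = bisect_right(clustering_values, threshold)
        result.extend((key, ts, value) for ts, value in rows[start:])
    return result

print(rows_newer_than([0, 1], 150))
# [(0, 200, 'b'), (0, 300, 'c'), (1, 250, 'e')]
```

The partition key only locates a bucket, so it has to be named exactly; inside the bucket the rows are ordered, which is what makes "newer than X" cheap.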
    

    Update:

    You asked:

    somebody can insert same SCHEMA_ID twice? Am I correct?

    Yes, you can always do an insert with an existing primary key; the values stored at that primary key will be overwritten. Therefore, to preserve uniqueness, a UUID is often used in the primary key, for instance a timeuuid. It is a unique value that contains a timestamp and the MAC address of the client. There is excellent documentation on this topic.
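
To see what a timeuuid carries, a small Python sketch using the standard `uuid` module may help. It assumes Python's `uuid.uuid1()` as a stand-in for whatever your Cassandra client generates; both are version-1 (time-based) UUIDs.

```python
import uuid

# uuid1() produces a version-1 (time-based) UUID, the kind CQL calls timeuuid.
a = uuid.uuid1()
b = uuid.uuid1()

print(a.version)    # 1 marks a time-based UUID
print(hex(a.node))  # 48-bit node field, usually derived from a MAC address
print(a.time)       # 60-bit count of 100 ns intervals since 1582-10-15
print(a != b)       # two calls give two distinct values
```

Because every generated value differs, using it in the primary key means a second write for the same record creates a new row instead of overwriting the old one.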

    General advice:

    1. Write down your queries first, then design your model. (Use case!)
    2. Your queries define your data model which in turn is primarily defined by your primary keys.

    So, in your case, I'd just adapt my schema above, like so:

    CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT,   
    LAST_MODIFIED_DATE TIMEUUID, PRIMARY KEY (RECORD_NAME, LAST_MODIFIED_DATE));
    

    Which allows this query:

    -- CQL string literals take single quotes; ">" selects rows modified after the given timeuuid.
    select * from test where RECORD_NAME IN ('componentA','componentB')
      and LAST_MODIFIED_DATE > 1688f180-4141-11e3-aa6e-0800200c9a66;
    
    -- the uuid corresponds to Wednesday, October 30, 2013 8:55:55 AM GMT,
    -- so you would fetch everything modified after that
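
If you want to check which instant a timeuuid encodes, the following Python sketch decodes the one used in the query. The `timeuuid_to_datetime` helper is a hypothetical name for this example, not part of any Cassandra driver; it relies only on the standard version-1 UUID layout.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100 ns intervals from the Gregorian epoch.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def timeuuid_to_datetime(value):
    """Decode the timestamp embedded in a version-1 UUID string."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

when = timeuuid_to_datetime("1688f180-4141-11e3-aa6e-0800200c9a66")
print(when.isoformat())  # 2013-10-30T08:55:55.800000+00:00
```

This matches the date quoted above, so the query boundary is October 30, 2013, 08:55:55 GMT.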
    
