Cassandra Error - Clustering column cannot be restricted (preceding column is restricted by a non-EQ relation)

情话喂你 2021-02-02 13:39

We are using Cassandra as the data historian for our fleet management solution. We have a table in Cassandra that stores the details of the journeys made by each vehicle. The table

2 Answers
  •  孤独总比滥情好
    2021-02-02 14:29

    select * from journeydetails where bucketid in('2015-12') and vehicleid in('1234567')
      and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59' 
      and travelduration > 1800000;
    

    That's not going to work. The reason goes back to how Cassandra stores data on disk. The idea with Cassandra is that it is very efficient at returning a single row with a precise key, or at returning a contiguous range of rows from disk.

    Your rows are partitioned by bucketid and then sorted on disk by the clustering columns vehicleid, starttime, and travelduration, in that order. Because you are already applying a range (non-EQ) relation to starttime, you cannot restrict the clustering column that follows it. Within the starttime range, rows are ordered by starttime first, so the rows that also satisfy travelduration > 1800000 are scattered through that range instead of sitting next to each other. Serving the query would mean reading the whole range and discarding the rows that fail the travelduration condition: an inefficient, non-contiguous read. Cassandra is designed to keep you from writing queries like this one, whose performance can be unpredictable.
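To make the on-disk argument concrete, here is a small Python model (not Cassandra code) of one partition, assuming the primary key the answer describes: bucketid as the partition key and vehicleid, starttime, travelduration as clustering columns. The rows and the integer starttime values are hypothetical. Rows are kept sorted by the clustering tuple, the starttime range is located with two binary searches, and the travelduration filter then keeps positions that are not adjacent.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")

# Hypothetical rows of one partition (bucketid = '2015-12'). Each tuple is
# (vehicleid, starttime, travelduration); starttime is a plain integer here.
# sorted() puts them in clustering order, mirroring the on-disk layout.
partition = sorted([
    ("1234567", 100, 600_000),
    ("1234567", 200, 2_400_000),
    ("1234567", 300, 900_000),
    ("1234567", 400, 3_600_000),
    ("1234567", 500, 1_200_000),
    ("7654321", 150, 5_000_000),
])

def starttime_range(rows, vehicle, lo, hi):
    """Index bounds [start, end) of rows with vehicleid == vehicle and lo < starttime < hi.

    Because rows are sorted by (vehicleid, starttime, ...), the match is one contiguous slice.
    """
    start = bisect_right(rows, (vehicle, lo, INF))  # first row with starttime > lo
    end = bisect_left(rows, (vehicle, hi, -INF))    # first row with starttime >= hi
    return start, end

def is_contiguous(indices):
    """True if the indices form one unbroken run, like a single sequential read."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))

start, end = starttime_range(partition, "1234567", 100, 500)
print(list(range(start, end)))          # [1, 2, 3]: one contiguous slice

# Adding travelduration > 1800000 keeps only some rows inside that slice.
matches = [i for i in range(start, end) if partition[i][2] > 1_800_000]
print(matches, is_contiguous(matches))  # [1, 3] False: the survivors are not adjacent
```

The starttime range alone is one slice; the extra travelduration condition turns it into a scattered set of rows, which is exactly the kind of read Cassandra refuses to plan.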

    Here are two alternatives:

    1- If you can restrict all of the key columns before travelduration with an equality relation, then you can apply your greater-than condition:

    select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
      and starttime='2015-12-1 00:00:00' and travelduration > 1800000;
    

    Of course, restricting on an exact starttime may not be terribly useful.
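The same kind of toy model shows why alternative 1 is accepted: once vehicleid and starttime are pinned with equality, the rows with travelduration above the threshold form a single contiguous run that two binary searches can bound. The rows are hypothetical and the function name `duration_above` is illustrative only.

```python
from bisect import bisect_right

INF = float("inf")

# Hypothetical partition rows in clustering order (vehicleid, starttime, travelduration).
partition = sorted([
    ("1234567", 100, 600_000),
    ("1234567", 100, 2_000_000),
    ("1234567", 100, 4_000_000),
    ("1234567", 200, 3_000_000),
])

def duration_above(rows, vehicle, start, min_duration):
    """Rows with vehicleid == vehicle, starttime == start and travelduration > min_duration.

    With every earlier clustering column pinned by equality, the matches form one
    contiguous run, so two binary searches bound it and no row is skipped.
    """
    lo = bisect_right(rows, (vehicle, start, min_duration))  # first row past min_duration
    hi = bisect_right(rows, (vehicle, start, INF))           # first row past this starttime
    return rows[lo:hi]

print(duration_above(partition, "1234567", 100, 1_800_000))
# [('1234567', 100, 2000000), ('1234567', 100, 4000000)]
```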

    2- Another approach is to omit the travelduration restriction altogether, and then the rest of your original query will work:

    select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
      and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59';
    

    Unfortunately, Cassandra does not offer a large degree of query flexibility. Many people have found success using a solution like Spark (alongside Cassandra) to achieve this level of reporting.
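If the starttime range returns a modest number of rows, one assumed option (not covered in the answer above) is to run the alternative 2 query and apply the travelduration condition in application code. A minimal sketch, where the `Journey` row type is a hypothetical stand-in for whatever your driver returns and travelduration is assumed to be in milliseconds:

```python
from typing import Iterable, List, NamedTuple

class Journey(NamedTuple):
    """Hypothetical row shape; a real driver returns its own row objects."""
    vehicleid: str
    starttime: str
    travelduration: int  # assumed milliseconds (1800000 ms = 30 minutes)

def long_journeys(rows: Iterable[Journey], min_duration_ms: int = 1_800_000) -> List[Journey]:
    """Client-side filter: keep rows whose travelduration exceeds min_duration_ms."""
    return [row for row in rows if row.travelduration > min_duration_ms]

# Rows as the starttime range query might return them (hypothetical data).
rows = [
    Journey("1234567", "2015-12-01 08:00:00", 900_000),
    Journey("1234567", "2015-12-02 09:30:00", 2_700_000),
    Journey("1234567", "2015-12-03 17:15:00", 1_800_000),
]
result = long_journeys(rows)
print(result)  # only the 2015-12-02 journey; 1800000 is not strictly greater
```

This keeps the Cassandra read contiguous and moves the non-key condition to the client, which only makes sense while the range stays small; for heavier reporting, the Spark route above is the better fit.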

    And just a side note: don't use IN unless you have to. An IN on the partition key is similar to a secondary-index query, in that Cassandra may have to talk to several nodes to satisfy it. Calling it with a single item probably isn't a big deal, but IN is one of those old RDBMS habits that you should really break before getting too deep into Cassandra.
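To illustrate the fan-out, here is a toy Python model of an IN on the partition key: each value is a separate partition, and the coordinator has to reach whichever node owns it. The node names, the replication factor of 1, and the hash-modulo placement are simplifying assumptions; real Cassandra assigns partitions to token ranges on a ring (Murmur3Partitioner by default).

```python
import hashlib
from typing import Dict, Iterable, List

# Hypothetical 3-node cluster with replication factor 1.
NODES = ["node-a", "node-b", "node-c"]

def owner(partition_key: str) -> str:
    """Simplified placement: hash the key and take it modulo the node count.

    This stand-in only preserves the idea that different keys can live on different nodes.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

def plan_in_query(partition_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group the values of an IN on the partition key by the node that owns them."""
    plan: Dict[str, List[str]] = {}
    for key in partition_keys:
        plan.setdefault(owner(key), []).append(key)
    return plan

single = plan_in_query(["2015-12"])
several = plan_in_query(["2015-09", "2015-10", "2015-11", "2015-12"])
print(len(single), len(several))  # nodes contacted for each IN list
```

With one value there is a single owner; with several, the work can spread across multiple nodes, which is the cost the side note warns about. A common alternative is to issue one single-partition query per value from the client.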
