Cassandra Error - Clustering column cannot be restricted (preceding column is restricted by a non-EQ relation)

前端 未结 2 1657
情话喂你
情话喂你 2021-02-02 13:39

We are using Cassandra as the data historian for our fleet management solution. We have a table in Cassandra , which stores the details of journey made by the vehicle. The table

相关标签:
2条回答
  • 2021-02-02 14:25

    I run into same problem. Instead of playing with the clustered and primary keys, you could simply add an index to the needed columns. With this, you get rid of these pitfalls and you can use whatever column you want in your query.

    0 讨论(0)
  • 2021-02-02 14:29
    select * from journeydetails where bucketid in('2015-12') and vehicleid in('1234567')
      and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59' 
      and travelduration > 1800000;
    

    That's not going to work. The reason goes back to how Cassandra stores data on-disk. The idea with Cassandra is that it is very efficient at returning a single row with a precise key, or at returning a continuous range of rows from the disk.

    Your rows are partitioned by bucketid, and then sorted on disk by vehicleid, starttime, and travelduration. Because you are already executing a range query (non-EQ relation) on starttime, you cannot restrict the key that follows. This is because the travelduration restriction may disqualify some of the rows in your range condition. This would result in an inefficient, non-continuous read. Cassandra is designed to protect you from writing queries (such as this), which may have unpredictable performance.

    Here are two alternatives:

    1- If you could restrict all of your key columns prior to travelduration (with an equals relation), then you could apply a your greater-than condition:

    select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
      and starttime='2015-12-1 00:00:00' and travelduration > 1800000;
    

    Of course, restricting on an exact starttime may not be terribly useful.

    2- Another approach would be to omit travelduration altogether, and then your original query would work.

    select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
      and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59';
    

    Unfortunately, Cassandra does not offer a large degree of query flexibility. Many people have found success using a solution like Spark (alongside Cassandra) to achieve this level of reporting.

    And just a side note, but don't use IN unless you have to. Querying with IN is similar to using a secondary index, in that Cassandra has to talk to several nodes to satisfy your query. Calling it with a single item probably isn't too big of a deal. But IN is one of those old RDBMS habits that you should really break before getting too deep into Cassandra.

    0 讨论(0)
提交回复
热议问题