Cassandra Error - Clustering column cannot be restricted (preceding column is restricted by a non-EQ relation)

前端未结

关注

 2  1656

情话喂你 2021-02-02 13:39

We are using Cassandra as the data historian for our fleet management solution. We have a table in Cassandra , which stores the details of journey made by the vehicle. The table

2条回答

孤独总比滥情好 (楼主)

2021-02-02 14:29
```
select * from journeydetails where bucketid in('2015-12') and vehicleid in('1234567')
  and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59' 
  and travelduration > 1800000;
```
That's not going to work. The reason goes back to how Cassandra stores data on-disk. The idea with Cassandra is that it is very efficient at returning a single row with a precise key, or at returning a continuous range of rows from the disk.

Your rows are partitioned by bucketid, and then sorted on disk by vehicleid, starttime, and travelduration. Because you are already executing a range query (non-EQ relation) on starttime, you cannot restrict the key that follows. This is because the travelduration restriction may disqualify some of the rows in your range condition. This would result in an inefficient, non-continuous read. Cassandra is designed to protect you from writing queries (such as this), which may have unpredictable performance.

Here are two alternatives:

1- If you could restrict all of your key columns prior to travelduration (with an equals relation), then you could apply a your greater-than condition:
```
select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
  and starttime='2015-12-1 00:00:00' and travelduration > 1800000;
```
Of course, restricting on an exact starttime may not be terribly useful.

2- Another approach would be to omit travelduration altogether, and then your original query would work.
```
select * from journeydetails where bucketid='2015-12' and vehicleid='1234567'
  and starttime > '2015-12-1 00:00:00' and starttime < '2015-12-3 23:59:59';
```
Unfortunately, Cassandra does not offer a large degree of query flexibility. Many people have found success using a solution like Spark (alongside Cassandra) to achieve this level of reporting.

And just a side note, but don't use IN unless you have to. Querying with IN is similar to using a secondary index, in that Cassandra has to talk to several nodes to satisfy your query. Calling it with a single item probably isn't too big of a deal. But IN is one of those old RDBMS habits that you should really break before getting too deep into Cassandra.
0 讨论(0)

查看其它2个回答
发布评论:

提交评论
- 加载中...