Cassandra bucket splitting for partition sizing

北海茫月 2021-01-15 11:04

I am quite new to Cassandra. I just learned it through the DataStax courses, but I can't find enough information on bucketing here or elsewhere on the Internet, and in my application I need to

1 answer
  •  傲寒 (original poster)
    2021-01-15 11:39

    Start from your requirements, then go back to your schema model. In your case, how many measures per day can each instrument take? If each one takes fewer than your 400k measures, you are done and need no bucketing. If an instrument can take up to 10M measures, then N = 10M / 400k = 25 buckets should be enough to satisfy your requirements.

    With N buckets, reading all the measures of a particular instrument takes N queries, one per bucket, unless you count the measures as you write them and move to the next bucket when the current one is full. That is, you write the first 400k measures to bucket 0, the next 400k measures to bucket 1, and so on. You then need to keep track of the number K of buckets that actually hold data, and run only K queries instead of N. The buckets (and partitions) end up unbalanced, but you get your results in the smallest number of queries.

    If you prefer balanced buckets instead, write each measure to a bucket number drawn uniformly at random, but then you always have to run all N queries to get all the data for a specific instrument.
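
The arithmetic and the two bucket-assignment strategies described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 400k limit and the 10M figure come from the answer, while the function names (`bucket_count`, `sequential_bucket`, `buckets_to_query`, `random_bucket`) are hypothetical helpers and not part of Cassandra or any driver. The sketch computes bucket numbers only; issuing the actual queries is left out.

```python
import math
import random

# Per-partition limit taken from the answer (400k measures); adjust to your case.
MAX_MEASURES_PER_BUCKET = 400_000


def bucket_count(max_measures, limit=MAX_MEASURES_PER_BUCKET):
    """Number of buckets N needed so that no bucket exceeds `limit` measures."""
    return max(1, math.ceil(max_measures / limit))


def sequential_bucket(measure_index, limit=MAX_MEASURES_PER_BUCKET):
    """Fill-then-advance strategy: the first `limit` measures go to bucket 0,
    the next `limit` to bucket 1, and so on. `measure_index` is the 0-based
    count of measures already written for the instrument (hypothetical counter
    that the application has to maintain itself)."""
    return measure_index // limit


def buckets_to_query(measures_written, limit=MAX_MEASURES_PER_BUCKET):
    """With the sequential strategy, only the K buckets that hold data
    need to be read, instead of all N."""
    used = math.ceil(measures_written / limit)
    return list(range(used))


def random_bucket(n_buckets, rng=random):
    """Balanced strategy: pick a bucket uniformly at random on each write.
    Reads must then cover all buckets 0..n_buckets-1."""
    return rng.randrange(n_buckets)


if __name__ == "__main__":
    n = bucket_count(10_000_000)
    print("buckets needed:", n)
    print("bucket for the 400,001st measure:", sequential_bucket(400_000))
    print("buckets to read after 1M writes:", buckets_to_query(1_000_000))
    print("random bucket for a balanced write:", random_bucket(n))
```

In a table, the bucket number would typically be part of the partition key together with the instrument identifier, so that each (instrument, bucket) pair is its own partition; the exact schema depends on your requirements.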
