Cassandra cluster - data density (data size per node) - looking for feedback and advice

攒了一身酷 2021-02-08 03:57

I am considering the design of a Cassandra cluster.

The use case would be storing large rows of tiny samples for time series data (using KairosDB), data will be almost i

3 answers
  •  猫巷女王i
    2021-02-08 04:34

    I recommend thinking about your application's data model and how to partition your data. For time series data it probably makes sense to use a composite key [1], which consists of a partition key plus one or more clustering columns. Partitions are distributed across the servers according to the hash of the partition key (the hash function depends on the Cassandra partitioner you configure, see cassandra.yaml).
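    To make the "hash of the partition key" idea concrete, here is a minimal sketch of a token ring. It is an illustration only, not Cassandra's implementation: the `Ring` class, the node names, and the evenly spaced tokens are hypothetical, and MD5 is used as a stand-in hash (Cassandra's default Murmur3Partitioner uses a different hash function and token range).

```python
import bisect
import hashlib


def token(partition_key):
    """Map a partition key to a 128-bit integer token (MD5 as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class Ring:
    """Hypothetical token ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; sort by token to form the ring.
        self._ring = sorted((t, name) for name, t in node_tokens.items())
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key):
        # First node whose token is >= the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[i % len(self._ring)][1]


step = 2**128 // 3
ring = Ring({"node-a": 0, "node-b": step, "node-c": 2 * step})
for key in ("device-1", "device-2", "device-3"):
    print(key, "->", ring.owner(key))
```

    The point of the sketch is that the partition key alone decides which node stores a partition, so a key with too few distinct values concentrates data on a few nodes.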

    For example, you could partition your data by the device that generates it (Pattern 1 in [2]) or by a period of time (e.g., per day), as shown in Pattern 2 in [2].
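    As a rough sketch of the two options, the functions below build a partition key from a sample's device and timestamp. The function names and the `device_id` format are hypothetical, and combining the device with a UTC day bucket in the second function is an assumption about how the time-based pattern would be applied.

```python
from datetime import datetime, timezone


def partition_key_by_device(device_id, ts):
    """One partition per device: it grows for as long as the device reports."""
    return (device_id,)


def partition_key_by_device_and_day(device_id, ts):
    """One partition per device per UTC day: partition size stays bounded."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (device_id, day)


sample_time = datetime(2021, 2, 8, 3, 57, tzinfo=timezone.utc)
print(partition_key_by_device("sensor-1", sample_time))
print(partition_key_by_device_and_day("sensor-1", sample_time))
```

    With the day bucket, reading a long time range means querying one partition per day, which is the usual trade-off for keeping each partition small.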

    Also be aware that a single partition can hold at most 2 billion values (cells) [3], so partitioning is strongly recommended. Don't store your entire time series in a single partition on a single Cassandra node.
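    A quick back-of-the-envelope check against the 2 billion limit can be sketched as follows. The sample rates and bucket sizes are made-up example numbers, and the helper names are hypothetical; the only figure taken from the answer is the limit itself.

```python
MAX_CELLS_PER_PARTITION = 2_000_000_000  # limit cited in [3]

SECONDS_PER_DAY = 86_400


def cells_per_partition(samples_per_second, bucket_seconds, value_columns=1):
    """Estimate how many cells one partition accumulates over its time bucket."""
    return samples_per_second * bucket_seconds * value_columns


def fits_in_partition(samples_per_second, bucket_seconds, value_columns=1):
    """True if the estimated cell count stays within the per-partition limit."""
    cells = cells_per_partition(samples_per_second, bucket_seconds, value_columns)
    return cells <= MAX_CELLS_PER_PARTITION


# Assumed workload: one device, 10 samples per second, one value per sample.
ten_years = 10 * 365 * SECONDS_PER_DAY
print(fits_in_partition(10, SECONDS_PER_DAY))  # daily bucket
print(fits_in_partition(10, ten_years))        # single partition for ten years
```

    Under these assumed numbers a daily bucket holds 864,000 cells, while ten years in one partition would need about 3.15 billion cells, which exceeds the limit.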

    [1] http://www.planetcassandra.org/blog/composite-keys-in-apache-cassandra/

    [2] https://academy.datastax.com/demos/getting-started-time-series-data-modeling

    [3] http://wiki.apache.org/cassandra/CassandraLimitations
