Suppose there is a table with the following structure:
create table cities (
root text,
name text,
primary key(root,name)
) with clustering order by (n
When the table is defined this way, the partition key is root, while name is a clustering key.
As the name suggests, the partition key is responsible for partitioning. So how does partitioning work?
Let's say you have a 4-node cluster and a hash function that generates only 8 keys (A, B, C, D, E, F, G, H). Here is how the hashes are distributed in the cluster:
node 1 - (A,B)
node 2 - (C,D)
node 3 - (E,F)
node 4 - (G,H)
Each node uses the following 2 nodes as its replicas, so the replicas for node 1 are (2,3), the replicas for node 2 are (3,4), the replicas for node 3 are (4,1), and finally the replicas for node 4 are (1,2).
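The ring above can be sketched in a few lines of Python. This is only a toy model of the example (8 tokens, 4 nodes, 2 replicas after the owner), not how Cassandra's partitioner is actually implemented. The names TOKEN_OWNERS, replicas_for and nodes_for_token are made up for illustration.

```python
# Toy model of the 4-node ring from the example; not Cassandra's real partitioner.
# All names below are hypothetical helpers for illustration only.
TOKEN_OWNERS = {
    "A": 1, "B": 1,
    "C": 2, "D": 2,
    "E": 3, "F": 3,
    "G": 4, "H": 4,
}
NODE_COUNT = 4
REPLICAS_AFTER_OWNER = 2  # each node replicates to the next 2 nodes on the ring


def replicas_for(owner):
    """Return the nodes that follow `owner` on the ring, wrapping around."""
    return [(owner - 1 + step) % NODE_COUNT + 1
            for step in range(1, REPLICAS_AFTER_OWNER + 1)]


def nodes_for_token(token):
    """Return the owner node followed by its replica nodes for a token."""
    owner = TOKEN_OWNERS[token]
    return [owner] + replicas_for(owner)


# Print the replica list of every node, then the nodes that hold token B.
for node in range(1, NODE_COUNT + 1):
    print("node", node, "replicas", replicas_for(node))
print("token B is stored on nodes", nodes_for_token("B"))
```

With this model, any single token always maps to exactly three nodes: the owner and the two that follow it on the ring.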
Let's say our function hash(root), when root value is ., returns B, which belongs to node 1. Node 1 will store the information and nodes (2,3) will store the replicas. Node 4 is NEVER involved in the cities table: because the partition key is fixed, it will not contain any data for this table (except for hint situations, which are not part of this concept).
In this example you use about 75% of your cluster, which may look like an acceptable situation. But let's say that at some point your application suffers because the 3 nodes involved cannot handle the read/write requests. You can now add as many nodes as you want to the cluster, but with this data model you won't be able to scale horizontally, because NO OTHER NODE WILL EVER BE INVOLVED IN THE cities TABLE. The only way I see to solve the problem in such a situation is to increase the power of these 3 nodes (vertical scaling) by adding more memory, a more powerful CPU and faster I/O.
Creating a schema that does not allow horizontal scaling is an anti-pattern.
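To make the scaling argument concrete, here is a rough simulation. It assumes a simplified placement rule (hash the partition key to an owner node, then replicate to the next 2 nodes) and uses MD5 only as a stand-in hash; the key values "same-root" and "root-N" are invented for the demo and the helper names are hypothetical.

```python
import hashlib

REPLICATION_FACTOR = 3  # owner plus the next 2 nodes, as in the example


def nodes_holding(partition_key, node_count):
    """Toy placement: hash the key to an owner node, then add the next replicas."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    owner = int.from_bytes(digest, "big") % node_count
    return {(owner + i) % node_count + 1 for i in range(REPLICATION_FACTOR)}


def nodes_used(partition_keys, node_count):
    """Return every node that stores at least one of the given partition keys."""
    used = set()
    for key in partition_keys:
        used |= nodes_holding(key, node_count)
    return used


# Compare a fixed partition key with many distinct keys as the cluster grows.
for node_count in (4, 8, 16):
    fixed = nodes_used(["same-root"] * 1000, node_count)
    varied = nodes_used(["root-%d" % i for i in range(1000)], node_count)
    print(node_count, "nodes:", len(fixed), "used with a fixed key,",
          len(varied), "used with varied keys")
```

In this sketch the fixed key stays on 3 nodes no matter how many nodes the cluster has, while distinct partition keys spread over the whole ring, which is the property a horizontally scalable schema needs.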