问题
I feel very difficult to understand the concept of wide row and related concepts from Cassandra The Definite Guide:
Cassandra uses a special primary key called a composite key (or compound key) to represent wide rows, also called partitions. The composite key consists of a partition key, plus an optional set of clustering columns. The partition key is used to determine the nodes on which rows are stored and can itself consist of multiple columns. The clustering columns are used to control how data is sorted for storage within a partition. Cassandra also supports an additional construct called a static column, which is for storing data that is not part of the primary key but is shared by every row in a partition.
Figure 4-5 shows how each partition is uniquely identified by a partition key, and how the clustering keys are used to uniquely identify the rows within a partition.
Are a wide row and a partition synonyms?
In "the partition key is used to determine the nodes on which rows are stored and can itself consist of multiple columns" and "each partition is uniquely identified by a partition key",
since a partition key is for a wide row, why are there multiple "rows" (does "rows" here mean "wide rows")?
how does the partition key "determine the nodes on which rows are stored"?
How can a partition key be used for "each partition is uniquely identified by a partition key"?
In "the clustering columns are used to control how data is sorted for storage within a partition",
- what is a clustering column, for example, what are the clustering columns in the figure?
- How do the clustering columns "control how data is sorted for storage within a partition"?
In "the clustering keys are used to uniquely identify the rows within a partition",
- a partition is a synonym of a wide row, what does it mean by "the rows within a partition"?
- How "the clustering keys are used to uniquely identify the rows within a partition"?
Thanks.
回答1:
Are a wide row and a partition synonyms?
partition and row can be considered synonym. wide row is a scenario where the chosen partition key will result in very large number of cells
for that key. Consider a scenario which has all persons in a country and partition key used is city, then there will be one row for one city and all person will be cells
in that row. For metro city this will lead to wide rows. Another example can be storing sensor data received every few seconds with sensorId as partition key, which will lead to huge number of cells
some years down the line.
since a partition key is for a wide row, why are there multiple "rows" (does "rows" here mean "wide rows")?
Same as above.
how does the partition key "determine the nodes on which rows are stored"?
From partiton key hash (MurMur3Hash is default) is generated and each node in cassandra is responsible for range of values. Consider Hash of partition key value turns out to be 20 and Node1 is responsible for range 1 to 100 then that partiton will reside on Node1.
How can a partition key be used for "each partition is uniquely identified by a partition key"?
As explained above partition key decides on which node the data resides.. Data representation can be considered as huge map which can have only unique keys.
what is a clustering column, for example, what are the clustering columns in the figure?
Consider a table created like Create TABLE test (a text,b int, c text, PRIMARY KEY(a,b))
here a
is partition key and b
is clustering column. In the figure attached clustering key
is the clustering column and whole enclosing box is cell.
How do the clustering columns "control how data is sorted for storage within a partition"?
Cassandra will sort the data using column b
in the above example table in ascending table. It can be changed to descending as well.
INSERT INTO test(a,b,c) VALUES('test',2,'test2')
INSERT INTO test(a,b,c) VALUES('test',1,'test1')
INSERT INTO test(a,b,c) VALUES('test-new',1,'test1')
If you run the above query in this order cassandra will store data in following order (Data representation has much more than below.. just check the order of column b):
test -> [b:1,c=test1] [b:2,c=test2]
test-new -> [b:1,c=test1]
a partition is a synonym of a wide row, what does it mean by "the rows within a partition"?
Clustering column is used to identify cells
(cells is a better term than row) within a partition. example SELECT * from test where a='test' and b=1
will pick up the cell with b:1
for partiton key test.
How "the clustering keys are used to uniquely identify the rows within a partition"?
Above answer should explain this as well.
来源:https://stackoverflow.com/questions/59080866/how-to-understand-the-concept-of-wide-row-and-related-concepts-in-cassandra