Cassandra node - rebuild vs. repair

说谎 2021-02-05 08:27

What is the difference between:

a) nodetool rebuild

b) nodetool repair [-pr]

In other words, what exactly do the respective co

1 answer
  • 2021-02-05 08:43

    nodetool rebuild: similar to the bootstrapping process (when you add a new node to the cluster), but used for a whole datacenter. It mainly streams data from the nodes that are already live to the new nodes, which start out empty. Once the key ranges for the new nodes have been defined, which is very fast, the rest can be seen as a copy operation.
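To make the "define ranges, then copy" idea concrete, here is a toy sketch in Python. It is not how Cassandra is implemented: the `token_for` partitioner, the integer token ranges, and the dict standing in for a live replica are all assumptions made up for illustration.

```python
# Toy model only: integer tokens and dicts stand in for Cassandra's ring and SSTables.

def token_for(key: str, ring_size: int = 100) -> int:
    """Hypothetical partitioner: map a key to a token in [0, ring_size)."""
    return sum(key.encode("utf-8")) % ring_size


def rebuild(new_node_ranges, source_rows):
    """Copy every row whose token falls in one of the new node's ranges.

    new_node_ranges: list of (start, end) token pairs, start inclusive, end exclusive.
    source_rows: dict of key -> value held by a live replica.
    """
    streamed = {}
    for key, value in source_rows.items():
        token = token_for(key)
        if any(start <= token < end for start, end in new_node_ranges):
            streamed[key] = value  # no comparison needed: the target starts out empty
    return streamed


source = {"a": 1, "b": 2, "c": 3, "d": 4}
# The new node is assumed to own two token ranges.
new_node = rebuild([(0, 10), (97, 99)], source)
```

The point of the sketch is that nothing is compared: the new node simply receives whatever falls inside its ranges.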

    nodetool repair -pr: is not a copy operation. The node being repaired is not empty; it already contains data. If the replication factor is greater than 1, that data has to be compared with the data on the other replicas, and any difference is corrected. With -pr, only the node's primary token range is repaired, so the command has to be run on every node to cover the whole ring.

    The process involves a lot of network traffic, but most of it is not the data itself. Each replica builds a Merkle tree (basically a tree of hashes) over the range being repaired, and the trees are compared to check whether the replicas hold the same information. Where they do not, the sections of data that differ are streamed in full so that all the replicas end up with the same data. Exchanging these hashes is faster than streaming all of the data before verification. This works under the assumption that most data will be the same on both nodes, with only a few differences here and there.

    Repair also matters for deletes, although it does not remove tombstones itself. It makes sure that the tombstones created by deletions reach every replica, so that compaction can safely drop them once gc_grace_seconds has passed. If repair is not run within that window, a replica that missed a delete can bring the deleted data back.
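The hash comparison can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's actual code: the bucketing function, the SHA-256 hash, and the fixed power-of-two number of buckets are assumptions chosen to keep the example small.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hashes(rows, num_buckets):
    """Hash each bucket of rows; a bucket is a stand-in for a token sub-range."""
    buckets = [[] for _ in range(num_buckets)]
    for key in sorted(rows):
        idx = sum(key.encode("utf-8")) % num_buckets  # hypothetical bucketing
        buckets[idx].append(f"{key}={rows[key]}")
    return [h("|".join(b).encode("utf-8")) for b in buckets]


def build_tree(leaves):
    """Return the tree as a list of levels, leaves first, root last.

    Assumes len(leaves) is a power of two.
    """
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_buckets(tree_a, tree_b):
    """Walk both trees from the root down; return leaf indices that differ."""
    top = len(tree_a) - 1
    if tree_a[top][0] == tree_b[top][0]:
        return []  # roots match: replicas agree, nothing to stream
    suspects = [0]
    for level in range(top, 0, -1):
        next_suspects = []
        for i in suspects:
            for child in (2 * i, 2 * i + 1):
                if tree_a[level - 1][child] != tree_b[level - 1][child]:
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects


replica_a = {"a": 1, "b": 2, "c": 3, "d": 4}
replica_b = {"a": 1, "b": 2, "c": 99, "d": 4}  # one row has drifted

tree_a = build_tree(leaf_hashes(replica_a, 4))
tree_b = build_tree(leaf_hashes(replica_b, 4))
to_stream = differing_buckets(tree_a, tree_b)
```

Only the buckets listed in `to_stream` would need their rows exchanged; matching subtrees are skipped without looking at any data.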

    Hope it helps!
