Cassandra nodetool repair is getting stuck sometimes

Submitted by 那年仲夏 on 2020-08-10 18:48:43

Question


I am running nodetool repair -pr -full my_ks my_tbl on our Cassandra cluster (which has two DCs). It sometimes hangs, with the debug logs below. It works again after I restart the Cassandra process. Any hints on the root cause of this issue?

DEBUG [GossipStage:1] 2020-07-13 14:04:22,818 FailureDetector.java:456 - Ignoring interval time of 2571566434 for /10.22.38.223
DEBUG [GossipStage:1] 2020-07-13 14:04:22,818 FailureDetector.java:456 - Ignoring interval time of 2495429260 for /10.22.38.26
DEBUG [GossipStage:1] 2020-07-13 14:04:22,818 FailureDetector.java:456 - Ignoring interval time of 2571592685 for /10.32.146.85
INFO  [Thread-181] 2020-07-13 14:04:22,900 RepairRunnable.java:125 - Starting repair command #2, repairing keyspace my_ks with repair options (parallelism: parallel, primary range: true, incremental: false, job threads: 1, ColumnFamilies: [my_tbl], dataCenters: [], hosts: [], # of ranges: 256)
INFO  [HANDSHAKE-/10.32.146.85] 2020-07-13 14:04:23,460 OutboundTcpConnection.java:515 - Handshaking version with /10.32.146.85
DEBUG [GossipStage:1] 2020-07-13 14:04:23,716 FailureDetector.java:456 - Ignoring interval time of 2000838464 for /10.22.38.27
DEBUG [GossipStage:1] 2020-07-13 14:04:23,716 FailureDetector.java:456 - Ignoring interval time of 2000923736 for /10.22.38.68
DEBUG [GossipStage:1] 2020-07-13 14:04:23,815 FailureDetector.java:456 - Ignoring interval time of 2100571952 for /10.32.253.232
DEBUG [GossipStage:1] 2020-07-13 14:04:25,247 FailureDetector.java:456 - Ignoring interval time of 2429005356 for /10.32.144.198
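
To make the FailureDetector lines above easier to read, here is a minimal sketch that extracts the peer address and interval from each "Ignoring interval time" line. It assumes the interval values are nanoseconds, which is my understanding of how FailureDetector reports them and is not confirmed anywhere in this post. The names parse_ignored_intervals, LINE_RE and SAMPLE are hypothetical helpers for illustration, not part of Cassandra.

```python
import re

# Matches the FailureDetector debug lines shown in the log excerpt above.
LINE_RE = re.compile(r"Ignoring interval time of (\d+) for /(\S+)")


def parse_ignored_intervals(log_text):
    """Return (peer_address, interval_seconds) pairs.

    Assumption: the logged interval is in nanoseconds.
    """
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            nanos = int(match.group(1))
            results.append((match.group(2), nanos / 1e9))
    return results


# Two lines copied from the log excerpt above.
SAMPLE = """\
DEBUG [GossipStage:1] 2020-07-13 14:04:22,818 FailureDetector.java:456 - Ignoring interval time of 2571566434 for /10.22.38.223
DEBUG [GossipStage:1] 2020-07-13 14:04:23,716 FailureDetector.java:456 - Ignoring interval time of 2000838464 for /10.22.38.27
"""

if __name__ == "__main__":
    for peer, seconds in parse_ignored_intervals(SAMPLE):
        print(f"{peer}: {seconds:.3f} s")
```

Under that assumption, every interval in the excerpt is a little over two seconds, so these lines would describe gossip heartbeats arriving slightly late rather than the repair itself.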

I am using Cassandra 3.9.

Edit: With trace enabled, I see the following logs:

INFO  [HANDSHAKE-/10.32.168.76] 2020-07-18 02:12:41,253 OutboundTcpConnection.java:515 - Handshaking version with /10.32.168.76
INFO  [HANDSHAKE-/10.32.142.195] 2020-07-18 02:12:42,260 OutboundTcpConnection.java:515 - Handshaking version with /10.32.142.195
INFO  [HANDSHAKE-/10.32.144.198] 2020-07-18 02:12:42,260 OutboundTcpConnection.java:515 - Handshaking version with /10.32.144.198
ERROR [RepairTracePolling] 2020-07-18 02:12:45,836 CassandraDaemon.java:226 - Exception in thread Thread[RepairTracePolling,5,RMI Runtime]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1718) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1667) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1608) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1527) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:975) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:271) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:232) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.repair.RepairRunnable$4.runMayThrow(RepairRunnable.java:412) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.9.jar:3.9]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_72]

Source: https://stackoverflow.com/questions/62886719/cassandra-nodetool-repair-is-getting-stuck-sometimes
