Cassandra node is taking hours to join

Question

My two-node cluster had entered a somewhat inconsistent state. On one node (call it node A), nodetool status correctly showed 2 nodes, while on the other (call it node B) it showed only one, i.e. itself. After several attempts I could not fix the issue, so I decommissioned node B. But nodetool status on node A still showed node B, and in UN state at that. I had to restart Cassandra on node A so that it would forget node B.

But this has led to another problem. I am adding a new node (call it C) to node A's cluster, but it is taking hours to join. It has already been six hours, and I am wondering whether it will ever join successfully.

The debug logs on node C suggest that node B (the decommissioned one) is causing trouble. They constantly show:

DEBUG [GossipTasks:1] 2017-04-29 12:38:40,004 Gossiper.java:337 - Convicting /10.120.8.53 with status removed - alive false

Nodetool status on node A shows node C in joining state, as expected:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load         Tokens  Owns  Host ID                               Rack
UJ  10.120.8.113  1006.97 MiB  256     ?     f357d8d0-2379-43d8-8ae5-62224191fb6c  rack1
UN  10.120.8.23   5.29 GiB     256     ?     596260a0-785a-435c-a3f3-632f56c5c882  rack1

The load on node C increases only by a fraction every couple of hours.

I checked whether system.peers contains node B, but the table contains zero rows.

I am using Cassandra 3.7.

What is going wrong? What can I do to avoid losing data on node A and still scale the cluster?

Answer 1:

Run nodetool netstats on node C and see whether streaming is making progress. Also review nodetool compactionstats: check the number of pending compactions and whether it goes down over time.
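To watch both numbers over time instead of rerunning the commands by hand, a small polling script can help. The following is a minimal sketch, not a definitive tool: it assumes nodetool is on the PATH of node C, that netstats prints lines of the form "Receiving N files, X bytes total. Already received M files, Y bytes total" (the 3.x format as far as I know; other versions may differ), and that compactionstats prints a "pending tasks: N" line. The function names are made up for this example.

```python
import re
import subprocess
import time

# Assumed line format of `nodetool netstats` on Cassandra 3.x:
#   Receiving 12 files, 1000 bytes total. Already received 3 files, 250 bytes total
RECEIVING = re.compile(
    r"Receiving (\d+) files, (\d+) bytes total\. "
    r"Already received (\d+) files, (\d+) bytes total"
)
# Assumed summary line of `nodetool compactionstats`:  pending tasks: 7
PENDING = re.compile(r"pending tasks:\s*(\d+)")


def streaming_progress(netstats_text):
    """Return (bytes_received, bytes_expected) summed over all streaming peers."""
    received = expected = 0
    for match in RECEIVING.finditer(netstats_text):
        expected += int(match.group(2))
        received += int(match.group(4))
    return received, expected


def pending_compactions(compactionstats_text):
    """Return the pending task count, or None if the line is not found."""
    match = PENDING.search(compactionstats_text)
    return int(match.group(1)) if match else None


def run(*args):
    """Run a command and return its standard output as text."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def poll(interval_seconds=60):
    """Print streaming and compaction progress until interrupted."""
    while True:
        got, total = streaming_progress(run("nodetool", "netstats"))
        pending = pending_compactions(run("nodetool", "compactionstats"))
        print(f"streamed {got}/{total} bytes, pending compactions: {pending}")
        time.sleep(interval_seconds)
```

Calling poll() on node C prints one line per minute. If the received byte count stays flat across several samples, streaming has probably stalled rather than merely being slow.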

If the bootstrap failed, try restarting the node.

As an alternative, you can remove node C and add it again with the auto_bootstrap setting set to false. After the node is up, run nodetool rebuild, followed by nodetool repair once the rebuild finishes. This should be faster than a regular bootstrap.
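The two post-join commands can be chained so that repair only starts if the rebuild succeeds. This is a rough sketch under a few assumptions: node C has already been restarted by hand with auto_bootstrap: false in its cassandra.yaml, nodetool is on the PATH, and no extra arguments (such as a source datacenter for rebuild) are needed in your setup. POST_JOIN_STEPS and run_post_join_steps are illustrative names, not part of Cassandra.

```python
import subprocess

# Precondition, done by hand and not by this script: node C was started
# with `auto_bootstrap: false` in cassandra.yaml and has joined the ring.
POST_JOIN_STEPS = [
    ["nodetool", "rebuild"],  # stream existing data to the new node
    ["nodetool", "repair"],   # then reconcile any remaining differences
]


def run_post_join_steps(runner=subprocess.run):
    """Run each step in order; check=True makes a failing step raise and stop."""
    for cmd in POST_JOIN_STEPS:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)
```

Run run_post_join_steps() on node C. The runner parameter exists only so the sequence can be dry-run with a stub before touching a real cluster.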

Source: https://stackoverflow.com/questions/43696127/cassandra-node-is-taking-hours-to-join

Tags

cassandra

cassandra-3.0