We have a standalone zookeeper setup on a dev machine. It works fine for every other dev machine except this one testdev machine.
We get this error over and over aga
This can happen if there are too many open connections.
Try increasing the maxClientCnxns
setting.
From documentation:
maxClientCnxns (No Java system property)
Limits the number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble. This is used to prevent certain classes of DoS attacks, including file descriptor exhaustion. Setting this to 0 or omitting it entirely removes the limit on concurrent connections.
You can edit settings in the config file. Most likely it can be found at /etc/zookeeper/conf/zoo.cfg
.
In modern ZooKeeper versions default value is 60. You can increase it by adding the maxClientCnxns=4096
line to the end of the config file.
Had the same error during setup on a 2 node cluster. I discovered I had mixed up the contents of the myid file versus the server.id=HOST_IP:port entry.
Essentially, if you have two servers (SERVER1 and SERVER2) for which you have created "myid" files in dataDir for zookeeper as below
SERVER1 (myid)
1
SERVER2 (myid)
2
Ensure the entry in your zoo.cfg file corresponds for each of these i.e server.1 should use SERVER1 hostname and server.2 should use SERVER2 hostname followed by the port as below
SERVER1 (zoo.cfg)
... (other config omitted)
server.1=SERVER1:2888:3888
server.2=SERVER2:2888:3888
SERVER2 (zoo.cfg)
... (other config omitted)
server.1=SERVER1:2888:3888
server.2=SERVER2:2888:3888
Just to make sure, I also deleted the version-* folder in the dataDir then restarted Zookeeper to get it working.
leave only one entry for your host IP in /etc/hosts file, it resolved.
I just have the same situation as you and I have just fixed this problem.
my conf/zoo.cfg
just like this:
server.1=10.194.236.32:2888:3888
server.2=10.194.236.33:2888:3888
server.3=10.208.177.15:2888:3888
server.4=10.210.154.23:2888:3888
server.5=10.210.154.22:2888:3888
then i set data/myid
file content like this:
1 //at host 10.194.236.32
2 //at host 10.194.236.33
3 //at host 10.208.177.15
4 //at host 10.210.154.23
5 //at host 10.210.154.22
finally restart zookeeper
I also get the same error when i started my replicated zk, one of zkClient can not connect to localhost:2181, i checked the log file under apache-zookeeper-3.5.5-bin/logs directory, and found this:
2019-08-20 11:30:39,763 [myid:5] - WARN [QuorumPeermyid=5(secure=disabled):QuorumCnxManager@677] - Cannot open channel to 3 at election address /xxxx:3888 java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:648) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:705) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:733) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:910) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1247) 2019-08-20 11:30:44,768 [myid:5] - WARN [QuorumPeermyid=5(secure=disabled):QuorumCnxManager@677] - Cannot open channel to 4 at election address /xxxxxx:3888 java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:648) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:705) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:733) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:910) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1247) 2019-08-20 11:30:44,769 [myid:5] - INFO [QuorumPeermyid=5(secure=disabled):FastLeaderElection@919] - Notification time out: 51200
that means this zk server can not connect to other servers, and i found this server ping other servers fail, and after remove this server from the replica, the problem is solved.
hope this will be helpful.
This can happen despite the ZooKeeper servers being up and running and the socket open and accepting connections, if one or more of the ZooKeeper disks are out of space. This can easily happen if the old ZK snapshot and log files are never cleaned up:
The ZooKeeper server creates snapshot and log files, but never deletes them. The retention policy of the data and log files is implemented outside of the ZooKeeper server. The server itself only needs the latest complete fuzzy snapshot, all log files following it, and the last log file preceding it. The latter requirement is necessary to include updates which happened after this snapshot was started but went into the existing log file at that time. This is possible because snapshotting and rolling over of logs proceed somewhat independently in ZooKeeper. See the maintenance section in this document for more details on setting a retention policy and maintenance of ZooKeeper storage.
There is a maintenance job that can be run to clean up old snapshot and log files: See https://zookeeper.apache.org/doc/r3.4.12/zookeeperAdmin.html#sc_maintenance.