Question
We have a networking problem in Docker Swarm. The details are below:
- We have a virtualized environment on VMware (vSphere 6.02).
- Our servers are VMs created in VMware, say server1 and server2.
- We have a Docker Compose file defining a couple of services.
- The Compose file defines an overlay network for Docker Swarm.
- When we deploy the system with Docker Swarm, the deployment finishes successfully and all containers get an IP from the overlay network range.
- The problem is that if two containers (say cnt1 and cnt2) are deployed to different servers, they cannot ping each other.
- I checked with tcpdump and saw that the ARP exchange succeeds, so each container knows the other's MAC address correctly.
- But when I ping the other container, ICMP Echo requests are sent but never delivered to the second machine.
Where should I check? Any advice?
server-1:~$ docker version
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:01:32 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:01:32 2017
OS/Arch: linux/amd64
Experimental: true
PS: I checked this post, but I have the latest version of Docker / Docker Swarm, so that issue should already be fixed.
PS 2: A similar problem: https://github.com/docker/swarm/issues/2687
Answer 1:
Out of curiosity, in your VMware environment, do you have NSX deployed? I may have an answer, but it only applies if NSX is deployed in the environment.
ESXi will apparently drop OUTBOUND packets from VMs if the destination port is the same as the port configured for the VXLAN VTEP communication.
NSX utilizes port 4789/udp for VTEP communication for VXLAN (by default, as of 6.2.3; prior to that, it was 8472/udp). (If the VMs are on the same host, then traffic is not dropped, because, while it may be OUTBOUND traffic, it does not egress the host, and does not get to the same stage within the VMKernel to be dropped.)
The wording in KB2079386 is a little off. It states:
VXLAN port 8472 is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.
But, it should read:
VTEP Port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.
If you are using NSX, you could try changing the port used for the VXLAN VTEPs, but port 4789/udp is required if you are going to leverage hardware VTEPs at all.
(I can't take full credit for this. I stumbled across this blog post talking about similar behavior when troubleshooting a similar issue.)
Answer 2:
The first thing I would check for overlay networking is your firewall rules. You need the following open between the hosts:
- The swarm management port, usually 2377/tcp (this is most likely already open)
- The overlay control port 7946/tcp and 7946/udp
- The overlay data port 4789/udp
- The IPsec protocol 50 (ESP) if your overlay networks are defined as "secure" (that's a protocol, not a port, so: iptables -A INPUT -p 50 -j ACCEPT)
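As a rough illustration of the list above, the following sketch prints matching iptables rules for one peer host instead of applying them, so they can be reviewed first. The function name swarm_fw_rules and the peer address 192.0.2.11 are placeholders, not something from the original answer, and the last rule is only needed for encrypted ("secure") overlay networks.

```shell
#!/bin/sh
# Print iptables rules that accept swarm traffic from one peer host.
# Nothing is applied here; review the output, then run it as root.
# swarm_fw_rules and its argument are hypothetical names for this sketch.
swarm_fw_rules() {
    peer="$1"   # IP address of the other swarm node (assumption)
    printf 'iptables -A INPUT -s %s -p tcp --dport 2377 -j ACCEPT\n' "$peer"  # swarm management
    printf 'iptables -A INPUT -s %s -p tcp --dport 7946 -j ACCEPT\n' "$peer"  # overlay control
    printf 'iptables -A INPUT -s %s -p udp --dport 7946 -j ACCEPT\n' "$peer"  # overlay control
    printf 'iptables -A INPUT -s %s -p udp --dport 4789 -j ACCEPT\n' "$peer"  # overlay data (VXLAN)
    printf 'iptables -A INPUT -s %s -p 50 -j ACCEPT\n' "$peer"                # ESP, secure overlays only
}

swarm_fw_rules 192.0.2.11
```

To apply the rules after checking them, the output could be piped into a root shell, for example `swarm_fw_rules 192.0.2.11 | sudo sh`, once per peer on every node.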
If that doesn't help, look into using netshoot to debug where the traffic is getting stopped.
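One way to narrow down where the traffic stops, sketched under the assumption that the hosts' uplink interface is eth0 and the overlay uses the default data port: capture the encapsulated packets on both hosts while pinging between the containers. If packets show up on the sending host but never on the receiving host, something between the two VMs is dropping them. The helper name vxlan_capture_cmd is made up for this example, and it only prints the command.

```shell
#!/bin/sh
# Print a tcpdump command that captures overlay (VXLAN) data traffic.
# vxlan_capture_cmd is a hypothetical helper; eth0 and 4789 are assumed defaults.
vxlan_capture_cmd() {
    iface="${1:-eth0}"   # host interface that carries traffic between the nodes
    port="${2:-4789}"    # overlay data port
    printf 'tcpdump -n -i %s udp port %s\n' "$iface" "$port"
}

vxlan_capture_cmd eth0 4789
```

Run the printed command as root on both server1 and server2 at the same time, then start the ping from cnt1 to cnt2.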
Answer 3:
If your nodes are not on the same subnet (e.g. they all have public IPs), make sure you use the --advertise-addr option when each node (managers AND workers) joins the swarm, specifying an IP address on that node that the other nodes can reach.
Otherwise the overlay network will not route correctly between hosts even though stack deployment & node registration etc appear to be working fine.
See the detailed explanation for my case in the same GitHub issue --> https://github.com/docker/swarm/issues/2687
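A minimal sketch of what that looks like on the command line, assuming one manager and one worker. The addresses 203.0.113.10 and 203.0.113.20, the token placeholder, and the helper names are all illustrative; the real token comes from `docker swarm join-token worker` on the manager. The helpers only print the commands.

```shell
#!/bin/sh
# Print swarm init/join commands that set an explicit advertise address.
# Helper names and all addresses are hypothetical placeholders.
swarm_init_cmd() {
    # $1: IP of this manager that the other nodes can reach
    printf 'docker swarm init --advertise-addr %s\n' "$1"
}

swarm_join_cmd() {
    # $1: reachable IP of the joining node, $2: join token, $3: manager IP
    printf 'docker swarm join --advertise-addr %s --token %s %s:2377\n' "$1" "$2" "$3"
}

swarm_init_cmd 203.0.113.10
swarm_join_cmd 203.0.113.20 '<worker-token>' 203.0.113.10
```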
Answer 4:
"VTEP Port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application."
But we can change the Docker Swarm data path port (4789 by default) to another port:
docker swarm init --data-path-port=7789
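Two caveats, stated as assumptions rather than facts from the answer: the flag is given at `docker swarm init`, so an existing swarm probably has to be re-created to change the port, and it needs a Docker release newer than the 17.03 shown in the question (19.03 or later, as far as I know). The firewall on every node also has to allow the new UDP port. The sketch below prints both commands for a chosen port; data_path_port_cmds is a hypothetical helper name.

```shell
#!/bin/sh
# Print the swarm init command for a custom data path port, plus a
# matching firewall rule. Nothing is executed by this helper.
data_path_port_cmds() {
    port="${1:-7789}"   # replacement for the default 4789/udp (7789 is the answer's example)
    printf 'docker swarm init --data-path-port=%s\n' "$port"
    printf 'iptables -A INPUT -p udp --dport %s -j ACCEPT\n' "$port"
}

data_path_port_cmds 7789
```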
Source: https://stackoverflow.com/questions/43933143/docker-swarm-overlay-network-is-not-working-for-containers-in-different-hosts