Cannot ping docker container on another host using an overlay network

心已入冬 提交于 2020-12-08 03:06:49

问题


This question has been asked many many times before on all types of fora but unfortunately, none of the answers have helped me so far.

I will get right to it.

 OS: RHEL 7.7 Maipo
 Docker Version: Engline/Client 18.09.7

My configuration:

Host 1: IP 169.192.215.74
Host 2: IP 10.210.87.16

**On Host1:**

# netstat -plntu|grep -E "4789|7946|2377"
tcp6       0      0 :::2377                 :::*                    LISTEN      3576/dockerd        
tcp6       0      0 :::7946                 :::*                    LISTEN      3576/dockerd        
udp        0      0 0.0.0.0:4789            0.0.0.0:*                           -                   
udp6       0      0 :::7946                 :::*                                3576/dockerd

# docker swarm init --advertise-addr=169.192.215.74

# docker network create --attachable --driver overlay syndichain

# docker swarm join-token manager


docker swarm join --token SWMTKN-1-2rxp0qlhl8o5n1hlmf0umcj8jdub19t07ndbu7mlodeb1yi4uf-ceopcr39eaaiz6bfqd4kk6ojs 169.192.215.74:2377

Ping Host 2:

# ping 10.210.87.16
PING 10.210.87.16 (10.210.87.16) 56(84) bytes of data.
64 bytes from 10.210.87.16: icmp_seq=1 ttl=119 time=0.804 ms
64 bytes from 10.210.87.16: icmp_seq=2 ttl=119 time=1.49 ms
64 bytes from 10.210.87.16: icmp_seq=3 ttl=119 time=0.646 ms
64 bytes from 10.210.87.16: icmp_seq=4 ttl=119 time=0.664 ms
^C
--- 10.210.87.16 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.646/0.901/1.493/0.348 ms


# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:8d:a0:60 brd ff:ff:ff:ff:ff:ff
    inet 169.192.215.74/24 brd 169.192.215.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:a9:99:bf:c2 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
76: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ab:17:1e:2f brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.1/16 brd 172.19.255.255 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
468: veth49ae88e@if467: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 82:e1:f6:1e:72:ec brd ff:ff:ff:ff:ff:ff link-netnsid 1


Bring up two containers (RHEL 7 container from our local repository)

# docker run --rm -it --network="syndichain" -p 11002:11002 --name rhel.da85 957c8834c8a9 bash

# docker run --rm -it --network="syndichain" -p 11003:11003 --name rhel.da85.2 957c8834c8a9 bash

On Host 2:

    # docker swarm join --token SWMTKN-1-2rxp0qlhl8o5n1hlmf0umcj8jdub19t07ndbu7mlodeb1yi4uf-ceopcr39eaaiz6bfqd4kk6ojs 169.192.215.74:2377 --advertise-addr=10.210.87.216

    # docker node ls
    ID                            HOSTNAME                      STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
    yxphc861xra6ajc5llxrpp9ia *   sd-cfd1-0319   Ready               Active              Reachable           18.09.7
    2c45co3pj38u2gwqrvuawwi11     sd-d5d8-da85   Ready               Active              Leader              18.09.7

# netstat -plntu|grep -E "4789|7946|2377"
tcp6       0      0 :::2377                 :::*                    LISTEN      8562/dockerd        
tcp6       0      0 :::7946                 :::*                    LISTEN      8562/dockerd        
udp        0      0 0.0.0.0:4789            0.0.0.0:*                           -                   
udp6       0      0 :::7946                 :::*                                8562/dockerd   

Ping Host 1:

# ping 169.192.215.74
PING 169.192.215.74 (169.192.215.74) 56(84) bytes of data.
64 bytes from 169.192.215.74: icmp_seq=1 ttl=55 time=0.796 ms
64 bytes from 169.192.215.74: icmp_seq=2 ttl=55 time=0.530 ms
64 bytes from 169.192.215.74: icmp_seq=3 ttl=55 time=0.513 ms
^C
--- 169.192.215.74 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.513/0.613/0.796/0.129 ms


# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:98:63:cb brd ff:ff:ff:ff:ff:ff
    inet 10.210.87.216/23 brd 10.210.87.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:20:4a:de:3f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
871: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:f0:d9:46:4c brd ff:ff:ff:ff:ff:ff
    inet 172.23.0.1/16 brd 172.23.255.255 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
885: veth4d635e9@if884: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 86:4f:5b:81:89:5c brd ff:ff:ff:ff:ff:ff link-netnsid 1
892: veth2734e0c@if891: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether b2:6c:1d:94:99:3f brd ff:ff:ff:ff:ff:ff link-netnsid 4


Bring up two RHEL7 containers on Host 2:

# docker run --rm -it --network="syndichain" --name rhel.0319 -p 11001:11001 957c8834c8a9 bash

# docker run --rm -it --network="syndichain" -p 11002:11002 --name rhel.0319.2 957c8834c8a9 bash

Now, we run some commands to check our overlay network:

On Host 1:

# docker network inspect syndichain
[
    {
        "Name": "syndichain",
        "Id": "8ar2ar0z0brl5lp1lk13xo4ik",
        "Created": "2020-02-28T11:44:01.411177608-05:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "fb911a7538d05f116d7f3ed50192e198d93faadaf1fb98f55fa79eb4cfbea72d": {
                "Name": "rhel.da85",
                "EndpointID": "8caf50c904f660c750bfa3608902d78239211e7076afed68d77a46e3e141386d",
                "MacAddress": "02:42:0a:00:00:02",
                "IPv4Address": "10.0.0.2/24",
                "IPv6Address": ""
            },
            "lb-syndichain": {
                "Name": "syndichain-endpoint",
                "EndpointID": "0a122599aee9d9b84dde6383aeb432580769af886278b46310a1c65bc68a0bc9",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "7f037ddf9a5f",
                "IP": "10.210.87.216"
            },
            {
                "Name": "b1b424f73d55",
                "IP": "169.192.215.74"
            }
        ]
    }
]

On Host 2:

# docker network inspect syndichain
[
    {
        "Name": "syndichain",
        "Id": "8ar2ar0z0brl5lp1lk13xo4ik",
        "Created": "2020-02-28T11:42:08.565999966-05:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "351d29af85e6e8addafcf69b3c464b49f8c9720cbaf8911328ab1aadce13e170": {
                "Name": "rhel.0319.2",
                "EndpointID": "fe0ed40c7a0b8687ca4a34d7b0196e382c32f3a5f82c43233174da349a8a5b34",
                "MacAddress": "02:42:0a:00:00:04",
                "IPv4Address": "10.0.0.4/24",
                "IPv6Address": ""
            },
            "62493b5072d9c2e516a0b8bfb9d0ba6dea9c2cbca2787bf4f8c2bf5ffe021dd0": {
                "Name": "rhel.0319",
                "EndpointID": "9a5e8977ca434854da886c215f14583cc5a7c5bb8811d5408a5cae9c4a325c47",
                "MacAddress": "02:42:0a:00:00:0e",
                "IPv4Address": "10.0.0.14/24",
                "IPv6Address": ""
            },
            "lb-syndichain": {
                "Name": "syndichain-endpoint",
                "EndpointID": "37415feb73f86cc3ceb1ab4d060da076ed4fca1f0b3ebe440993c82c14258a2d",
                "MacAddress": "02:42:0a:00:00:0f",
                "IPv4Address": "10.0.0.15/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "7f037ddf9a5f",
                "IP": "10.210.87.216"
            },
            {
                "Name": "b1b424f73d55",
                "IP": "169.192.215.74"
            }
        ]
    }
]

Now, I will demonstrate the problem:

Ping rhel.da85.2 container from rhel.da85 container (both containers on the same host)

[root@fb911a7538d0 /]# ping rhel.da85.2
PING rhel.da85.2 (10.0.0.5) 56(84) bytes of data.
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=1 ttl=64 time=0.080 ms
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=3 ttl=64 time=0.048 ms
^C
--- rhel.da85.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.048/0.067/0.080/0.015 ms

Ping rhel.0319 container from rhel.da85 container (containers on different hosts)

[root@fb911a7538d0 /]# ping rhel.0319
PING rhel.0319 (10.0.0.14) 56(84) bytes of data.

It is stuck at this. Nothing happens.


Ping rhel.0319.2 container from rhel.0319 container (both containers on the same host)

[root@62493b5072d9 /]# ping rhel.0319.2
PING rhel.0319.2 (10.0.0.4) 56(84) bytes of data.
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=1 ttl=64 time=0.109 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=2 ttl=64 time=0.061 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=3 ttl=64 time=0.064 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=4 ttl=64 time=0.068 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=5 ttl=64 time=0.064 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=6 ttl=64 time=0.063 ms
^C
--- rhel.0319.2 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.061/0.071/0.109/0.018 ms


Ping rhel.da85 container from rhel.0319 container (both containers on different hosts)

[root@62493b5072d9 /]# ping rhel.da85  
PING rhel.da85 (10.0.0.2) 56(84) bytes of data.


It is stuck at this point, just like it is stuck pinging a container on 0319 from da85.

I have even explicitly enabled ip4 forwarding as per https://stackoverflow.com/a/41453306/10382340

However, still stuck with this issue for the past week. Appreciate any help.

Update:

Someone suggested running tcpdump and nsenter. Here are the logs from the tcpdump running at the container level (on the same VM where the PING request is being generated) and nsenter logs for "ARP". You can see that the ARP request never gets a response in tcpdump but nsenter shows request/reply pairs for ARP.

[root@0ae70bfa66ae /]# tcpdump -v
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:26:03.276397 IP (tos 0x0, ttl 64, id 34495, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 1, length 64
16:26:04.276569 IP (tos 0x0, ttl 64, id 34971, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 2, length 64
16:26:05.276526 IP (tos 0x0, ttl 64, id 35636, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 3, length 64
16:26:06.276545 IP (tos 0x0, ttl 64, id 35935, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 4, length 64
16:26:07.276542 IP (tos 0x0, ttl 64, id 36706, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 5, length 64
16:26:08.276543 IP (tos 0x0, ttl 64, id 37284, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 6, length 64
16:26:08.294475 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has rhel.da85.syndichain tell rhel.0319.syndichain, length 28




# nsenter --net=/var/run/docker/netns/1-khyguu9i6n tcpdump -peni any "arp"

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
11:45:23.302460   P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:23.302471 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:23.302485  In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:23.302488 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:44.294458   P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:44.294474 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:44.294477  In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:44.294479 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:46:05.302460   P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:46:05.302480 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:46:05.302483  In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:46:05.302485 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28

Update 2:

From container on Source Host

[root@2baddaa98452 /]# nc -zvu rhel.da85 4789
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.0.2:4789.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.

Source Host TCP Dump

# tcpdump -i ens192 port 4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
11:28:11.706587 IP sd-cfd1-0319.59121 > sd-d5d8-da85.4789: VXLAN, flags [I] (0x08), vni 4097
IP 10.0.0.11.44806 > 10.0.0.2.4789:  [|VXLAN]

Target Host TCP Dump

# tcpdump -i ens192 port 4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes

回答1:


From your comment, it appears that the overlay networking ports are blocked between nodes. This could be from iptables or another firewall tool on the host, a network firewall between the nodes, or other software like VM tooling or a cloud router ACL, blocking those ports. The ports that need to be opened are:

  • TCP and UDP port 7946 for communication among nodes
  • UDP port 4789 for overlay network traffic

Ref: https://docs.docker.com/network/overlay/



来源:https://stackoverflow.com/questions/60456280/cannot-ping-docker-container-on-another-host-using-an-overlay-network

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!