Question

Description

I have an AWS EC2 instance (Ubuntu 16) that runs a Python application, in which I call some Facebook Account Kit APIs and Google Play Store APIs. They all worked perfectly fine until I rebooted the instance two weeks ago.

After the reboot, the requests take more than 5 minutes to finish, which is not acceptable. I have to manually set the timeout to over 10 minutes to let a request finish.

The problem only occurs on one of my servers. Another server with the same environment works perfectly fine, with response times of less than a second.

Temporary Fix

To work around the issue for now, I route the request through a proxy server:

1. API server (the server with the timeout problem) forwards the request to the proxy server
2. Proxy server runs a Python script and returns the result
3. API server (the server with the timeout problem) returns the response to the client

Situations

I tried curl on the API server; it responds in less than 1 second.
I tried requests in the Python environment, and the response time is terrible: above 5 minutes.
   1. If I set the header {'Connection': 'keep-alive'}, the second request onwards is normal.
   2. I turned on logging, and the request seems to be stuck on establishing the connection to the destination.
I tried the API that I wrote, and the response time is also terrible: above 5 minutes.
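
For readers who want to reproduce the logging step, here is a minimal sketch of one way to turn it on. It assumes requests is backed by urllib3 with its usual logger name "urllib3"; the helper name enable_http_debug_logging is hypothetical and not from the original post.

```python
import http.client
import logging


def enable_http_debug_logging():
    """Turn on verbose connection logging (hypothetical helper name)."""
    # Timestamped log records go to stderr through the root logger.
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(message)s",
    )
    # urllib3 logs each new connection at DEBUG level.
    logging.getLogger("urllib3").setLevel(logging.DEBUG)
    # http.client prints request and response headers to stdout.
    http.client.HTTPConnection.debuglevel = 1
```

With this enabled, a long gap between the "Starting new HTTPS connection" log line and the first request header would be consistent with a stall during connection setup.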

Current Code

The request with the slow response time:

import requests

# Exchange the Account Kit authorization code for an access token.
url_get_access_token = "https://graph.accountkit.com/v1.3/access_token?grant_type=authorization_code&code=%s&access_token=AA|%s|%s"
url_get_access_token = url_get_access_token % (token, self.facebook_app_id, self.facebook_account_kit_scert)
response = requests.get(url_get_access_token)  # this is the call that takes over 5 minutes
body = response.json()

The proxy server I mentioned above is another instance in the same subnet, but I call it through its DNS name.

response = requests.get("https://proxyserver.com/somepath", params={})

As it only affects one of the servers, could this be a DNS problem or an AWS configuration issue? Please help, thank you.

Update

Results of timed curl calls: it seems that the call over IPv6 takes a lot longer.

$ time curl -4 -s https://graph.accountkit.com/v1.3
$ time curl -6 -s https://graph.accountkit.com/v1.3

IPv4

real    0m0.665s
user    0m0.068s
sys     0m0.020s

IPv6

real    2m7.180s
user    0m0.008s
sys     0m0.000s

Answer 1:

Two items come to mind.

DNS

Debug with:

$ cat /etc/resolv.conf

$ time dig aaaa graph.accountkit.com

It is possible that you have multiple nameservers listed, and not all of them are responsive, so you suffer from long lookups as it times out on the dead one(s).
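
The same lookup can be timed from inside Python, which goes through the system resolver in the same way the application does. This is only a sketch: the helper name time_lookup and its injectable resolver parameter are assumptions added for illustration, and port 443 is assumed because the API is served over HTTPS.

```python
import socket
import time


def time_lookup(host, family, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, addresses) for one address family.

    `resolver` is injectable only so the helper can be exercised offline.
    """
    start = time.perf_counter()
    try:
        infos = resolver(host, 443, family, socket.SOCK_STREAM)
        # Each entry ends with a sockaddr tuple whose first item is the IP.
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = []  # no record for this family, or the lookup failed
    return time.perf_counter() - start, addresses
```

Calling time_lookup("graph.accountkit.com", socket.AF_INET) and then the same with socket.AF_INET6 gives one timing per record type. A slow result for either family would point at the resolver rather than at the TCP path.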

TCP

Debug with:

$ time curl -4 -s https://graph.accountkit.com/v1.3
$ time curl -6 -s https://graph.accountkit.com/v1.3

It will say "Invalid OAuth 2.0 Access Token", yeah, yeah, that's fine. What we're interested in is how long it takes to connect, send the GET, and retrieve the web document.

This domain offers both an A and an AAAA record. If IPv6 transport is toast, it could take a while for requests.get() to fail over to IPv4.
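
The curl -4 / curl -6 comparison can also be approximated from Python by timing a bare TCP connect per address family. This is a rough sketch, not how requests itself is implemented; the helper name time_connect and the 10-second default timeout are assumptions chosen for illustration.

```python
import socket
import time


def time_connect(host, port, family, timeout=10.0):
    """Try each address of one family in turn; return (elapsed, outcome)."""
    start = time.perf_counter()
    outcome = "no address"
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return time.perf_counter() - start, f"lookup failed: {exc}"
    for fam, socktype, proto, _canon, sockaddr in infos:
        with socket.socket(fam, socktype, proto) as sock:
            sock.settimeout(timeout)  # cap the wait for each address
            try:
                sock.connect(sockaddr)
                outcome = f"connected to {sockaddr[0]}"
                break
            except OSError as exc:  # includes timeouts and unreachable routes
                outcome = f"failed: {exc}"
    return time.perf_counter() - start, outcome
```

For example, time_connect("graph.accountkit.com", 443, socket.AF_INET6) should fail or time out on a host whose IPv6 path is broken, while the socket.AF_INET call returns quickly.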

EDIT

Someone broke your IPv6 transport. That's not acceptable in the modern internet. Dropped packet timeouts likely led to the 127-second elapsed time. Tools like traceroute6 and ping6 can help you or a network professional diagnose where the lossage is. Possibly an ACL is too tight and is discarding IPv6 packets that it shouldn't. Discarding ICMPs would be especially bad. For TCP to work correctly, ICMPs must be delivered.

A tcpdump (or Wireshark) packet trace would help identify exactly what went south. It is possible you are suffering from PMTUD black-holing. See if this displays any "packet too big" ICMP reports:

$ sudo tcpdump -tvvvni eth0 'icmp6 and ip6[40+0]==2'

Just looking at the timing of outbound port 443 TCP retransmissions would shed a lot of light on why things fail for two minutes and then the bits suddenly start flowing.
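
As a worked example of what such a trace might show, the arithmetic below adds up exponential SYN backoff. It assumes common Linux defaults (a 1-second initial retransmission timeout and 6 SYN retries); those values are assumptions about the affected host, not something read from it, and the function name is hypothetical.

```python
def syn_timeout_seconds(initial_rto=1.0, syn_retries=6):
    """Total wait before connect() gives up when every SYN is dropped.

    The first SYN and each of the `syn_retries` retransmissions waits
    twice as long as the one before it.
    """
    waits = [initial_rto * 2 ** attempt for attempt in range(syn_retries + 1)]
    return sum(waits), waits


total, waits = syn_timeout_seconds()
print(waits)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
print(total)  # 127.0
```

Under those assumptions the total is 127 seconds, which lines up with the 2m7.180s measured for curl -6 in the question and would be consistent with every IPv6 SYN being silently dropped.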

Source: https://stackoverflow.com/questions/56920375/extremely-long-response-time-using-requests

Tags

python