Question
Description
I have an AWS EC2 instance (Ubuntu 16) that runs a Python application, in which I call some Facebook Account Kit APIs and Google Play Store APIs. They all worked perfectly fine until I rebooted the instance two weeks ago.
After the reboot, the requests take more than 5 minutes to finish, which is not acceptable. I have to manually set the timeout to over 10 minutes to let the requests finish.
The problem only occurs on one of my servers. Another server with the same environment works perfectly fine, with response times of less than a second.
Temporary Fix
To temporarily work around the issue, I use a proxy server to make the request.
- API Server (server with timeout problem) forwards the request to the proxy server
- proxy server runs a Python script and returns a result
- API Server (server with timeout problem) returns the response to the client
Situations
- I tried using curl on the API server; it also has a response time of less than 1 second.
- I tried in the Python environment using requests, and the response time is terrible, above 5 minutes.
  - If I set the header {'Connection': 'keep-alive'}, the second request onwards is normal.
  - I turned on logging, and the request seems to get stuck establishing the connection to the destination.
- I tried with the API that I wrote, and the response time is also terrible, above 5 minutes.
Current Code
Request with the slow response time.
url_get_access_token = "https://graph.accountkit.com/v1.3/access_token?grant_type=authorization_code&code=%s&access_token=AA|%s|%s"
url_get_access_token = url_get_access_token % (token, self.facebook_app_id, self.facebook_account_kit_scert)
response = requests.get(url_get_access_token)
body = response.json()
The proxy server I mentioned above is another instance in the same subnet, but I call it by its DNS name.
response = requests.get("https://proxyserver.com/somepath", params={})
As it only affects one of the servers, could this be a DNS problem or an AWS configuration issue? Please help, thank you.
Update
Results of timed curl calls: it seems that the call over IPv6 takes much longer.
$ time curl -4 -s https://graph.accountkit.com/v1.3
$ time curl -6 -s https://graph.accountkit.com/v1.3
IPv4
real 0m0.665s
user 0m0.068s
sys 0m0.020s
IPv6
real 2m7.180s
user 0m0.008s
sys 0m0.000s
Answer 1:
Two items come to mind.
DNS
Debug with:
$ cat /etc/resolv.conf
$ time dig aaaa graph.accountkit.com
It is possible that you have multiple nameservers listed, and not all of them are responsive, so you suffer from long lookups as it times out on the dead one(s).
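The same lookup timing can be checked from inside Python, which is where the slowness shows up. This is a minimal sketch using only the standard library; the helper names time_lookup and report are hypothetical, and port 443 is assumed because the API is called over HTTPS. It goes through the system resolver (the same path requests uses), so a dead nameserver in /etc/resolv.conf should show up as a long elapsed time here too.

```python
import socket
import time


def time_lookup(host, family, resolver=socket.getaddrinfo):
    """Time one name lookup restricted to a single address family.

    Returns (elapsed_seconds, addresses, error); error is None on success.
    The resolver argument exists only so the function can be exercised
    without network access.
    """
    start = time.monotonic()
    try:
        infos = resolver(host, 443, family, socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the textual IP address.
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        addresses, error = [], exc
    return time.monotonic() - start, addresses, error


def report(host):
    """Print lookup timings for A (IPv4) and AAAA (IPv6) records."""
    for label, family in (("A", socket.AF_INET), ("AAAA", socket.AF_INET6)):
        elapsed, addresses, error = time_lookup(host, family)
        print("{:4s} {:.3f}s {}".format(label, elapsed, addresses or error))
```

Calling report("graph.accountkit.com") on the affected server and on the healthy one should make any difference in resolver behaviour visible.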
TCP
Debug with:
$ time curl -4 -s https://graph.accountkit.com/v1.3
$ time curl -6 -s https://graph.accountkit.com/v1.3
It will say "Invalid OAuth 2.0 Access Token", yeah, yeah, that's fine. What we're interested in is how long it takes to connect, send the GET, and retrieve the web document.
This domain offers both an A and an AAAA address.
If IPv6 transport is toast, it could take a while for requests.get() to fail over to IPv4.
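To separate connection setup from everything else requests does, the TCP connect can be timed per address family from Python. This is a rough sketch with the standard library only; time_connect is a hypothetical helper, and the 150-second default timeout is an assumption chosen so that a slow failure lasting about two minutes is not cut short.

```python
import socket
import time


def time_connect(host, port, family, timeout=150.0):
    """Time a bare TCP connect over one address family.

    Returns (elapsed_seconds, error). elapsed_seconds is None if the name
    could not be resolved for that family; error is None on success.
    """
    try:
        # Ask only for addresses of the requested family.
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except OSError as exc:
        return None, exc

    af, socktype, proto, _canonname, sockaddr = infos[0]
    start = time.monotonic()
    try:
        sock = socket.socket(af, socktype, proto)
        try:
            sock.settimeout(timeout)
            sock.connect(sockaddr)  # completes the TCP handshake only, no TLS
        finally:
            sock.close()
    except OSError as exc:  # covers timeouts, refusals, unreachable networks
        return time.monotonic() - start, exc
    return time.monotonic() - start, None
```

Comparing time_connect("graph.accountkit.com", 443, socket.AF_INET) with the same call using socket.AF_INET6 should show whether the delay sits in the IPv6 connect itself.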
EDIT
Someone broke your IPv6 transport.
That's not acceptable in the modern internet.
Dropped packet timeouts likely led to the 127-second elapsed time.
Tools like traceroute6 and ping6 can help you or a network professional diagnose where the lossage is.
Possibly an ACL is too tight and is discarding IPv6 packets that it shouldn't.
Discarding ICMPs would be especially bad.
For TCP to work correctly, ICMPs must be delivered.
A tcpdump (or Wireshark) packet trace would help identify exactly what went south.
It is possible you are suffering from PMTUD black-holing.
See if this displays any "packet too big" ICMP reports:
$ sudo tcpdump -tvvvni eth0 'icmp6 and ip6[40+0]==2'
Just looking at the timing of outbound port 443 TCP retrans would shed a lot of light on why things fail for two minutes and then the bits suddenly start flowing.
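One possible reading of the 127-second figure, offered as an assumption rather than a diagnosis: if every outbound IPv6 SYN is silently dropped, and the host uses the usual Linux defaults of a 1-second initial retransmission timeout and net.ipv4.tcp_syn_retries = 6 (neither value is stated in the question), the exponential backoff adds up to exactly that long before connect() gives up and the client can fall back to IPv4. The function name syn_timeline below is hypothetical; it only does the arithmetic.

```python
def syn_timeline(initial_rto=1.0, syn_retries=6):
    """Model an unanswered TCP SYN with a doubling retransmission timeout.

    Returns (send_times, give_up_time): the seconds at which the original
    SYN and each retransmission are sent, and the moment connect() fails.
    Both defaults are assumptions, not values taken from the affected host.
    """
    send_times = [0.0]  # the first SYN goes out immediately
    wait = initial_rto
    elapsed = 0.0
    for _ in range(syn_retries):
        elapsed += wait
        send_times.append(elapsed)  # one retransmission after each wait
        wait *= 2                   # the timeout doubles every time
    # After the last retransmission the stack waits once more, then gives up.
    return send_times, elapsed + wait


send_times, give_up = syn_timeline()
print(send_times)  # [0.0, 1.0, 3.0, 7.0, 15.0, 31.0, 63.0]
print(give_up)     # 127.0
```

If the retransmissions in a packet trace of outbound port 443 line up with these offsets, that would support the dropped-SYN explanation; a different pattern would point elsewhere, for example at PMTUD black-holing after the handshake.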
Source: https://stackoverflow.com/questions/56920375/extremely-long-response-time-using-requests