I was testing different Python HTTP libraries today and realized that the http.client library seems to perform much faster than requests.
To test it, you can run a benchmark like the one sketched below.
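A minimal sketch of such a benchmark, assuming a plain HTTP server on localhost; the hostname and URL are placeholders I'm adding, while the 1000-request count matches the call counts in the profiles below:

import http.client
import timeit

import requests

HOST = "localhost"  # assumption: any reachable HTTP server works here
N = 1000            # matches the 1000 calls per function in the profiles

def with_http_client():
    # One connection object; the hostname is resolved when it first connects
    # and the TCP connection is reused for every request.
    conn = http.client.HTTPConnection(HOST)
    for _ in range(N):
        conn.request("GET", "/")
        conn.getresponse().read()
    conn.close()

def with_requests():
    # One Session, so TCP connections are pooled and reused.
    with requests.Session() as session:
        for _ in range(N):
            session.get(f"http://{HOST}/")

print("http.client:", timeit.timeit(with_http_client, number=1))
print("requests:", timeit.timeit(with_requests, number=1))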
Based on profiling both, the main difference appears to be that the requests version does a DNS lookup for every request, while the http.client version does so only once.
# http.client
ncalls tottime percall cumtime percall filename:lineno(function)
1974 0.541 0.000 0.541 0.000 {method 'recv_into' of '_socket.socket' objects}
1000 0.020 0.000 0.045 0.000 feedparser.py:470(_parse_headers)
13000 0.015 0.000 0.563 0.000 {method 'readline' of '_io.BufferedReader' objects}
...
# requests
ncalls tottime percall cumtime percall filename:lineno(function)
1481 0.827 0.001 0.827 0.001 {method 'recv_into' of '_socket.socket' objects}
1000 0.377 0.000 0.382 0.000 {built-in method _socket.gethostbyname}
1000 0.123 0.000 0.123 0.000 {built-in method _scproxy._get_proxy_settings}
1000 0.111 0.000 0.111 0.000 {built-in method _scproxy._get_proxies}
92000 0.068 0.000 0.284 0.000 _collections_abc.py:675(__iter__)
...
You're providing the hostname to http.client.HTTPConnection() once, so it makes sense that it calls gethostbyname once. requests.Session could probably cache hostname lookups, but it apparently does not.
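For plain HTTP you can approximate that caching yourself by resolving the name once and sending requests to the IP, with the original hostname in the Host header. A rough sketch, with the hostname and request count as assumptions:

import socket

import requests

host = "localhost"                 # assumed hostname
addr = socket.gethostbyname(host)  # resolve exactly once, up front

with requests.Session() as session:
    for _ in range(1000):
        # Connect by IP; keep the Host header so virtual hosting still works.
        session.get(f"http://{addr}/", headers={"Host": host})

Note this only helps with the lookup itself, and it won't work for HTTPS, where certificate validation needs the real hostname in the URL.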
EDIT: After some further research, it's not just a simple matter of caching. requests calls a function to determine whether to bypass proxies, and that check itself ends up invoking gethostbyname on every request, independent of the connection for the actual request.
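If you don't need proxy support at all, one way to sidestep that check is to stop requests from consulting the environment and OS proxy settings on each request. trust_env is a real Session attribute; whether this removes the gethostbyname and _scproxy calls from your profile is worth verifying on your platform:

import requests

session = requests.Session()
# Skip reading proxy settings from the environment/OS per request,
# which is what triggers the proxy-bypass check described above.
session.trust_env = False
session.get("http://localhost/")  # assumed URL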