I was testing different Python HTTP libraries today and noticed that the http.client library seems to perform much faster than requests.
To test it you can run foll
Copy-pasting the response from @Lukasa posted here:
The reason Requests is slower is because it does substantially more than httplib. httplib can be thought of as the bottom layer of the stack: it does the low-level wrangling of sockets. Requests is two layers further up, and adds things like cookies, connection pooling, additional settings, and all kinds of other fun things. This is necessarily going to slow things down. We simply have to compute a lot more than httplib does.
You can see this by looking at cProfile results for Requests: there's just way more result than there is for httplib. This is always to be expected with high-level libraries: they add more overhead because they have to do a lot more work.
While we can look at targeted performance improvements, the sheer height of the call stack in all cases is going to hurt our performance markedly. That means that the complaint that "requests is slower than httplib" is always going to be true: it's like complaining that "requests is slower than sending carefully crafted raw bytes down sockets." That's true, and it'll always be true: there's nothing we can do about that.
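To make the call-stack point concrete, here is a minimal sketch that counts profiled function calls for a bare function versus the same work wrapped in two extra layers. It is not Requests' actual code: low_level, with_settings, and high_level are hypothetical stand-ins invented for illustration, and the settings, headers, and cookies they build are placeholders.

```python
import cProfile
import pstats


def low_level(payload):
    """Stand-in for the bottom layer: does the actual work."""
    return len(payload)


def with_settings(payload):
    """Hypothetical middle layer that builds settings before delegating."""
    settings = {"timeout": 5, "verify": True}
    return low_level(payload), settings


def high_level(payload):
    """Hypothetical top layer that adds header and cookie handling."""
    headers = {"User-Agent": "sketch"}
    cookies = {}
    result, settings = with_settings(payload)
    return result, headers, cookies, settings


def count_calls(func, repeat=1000):
    """Profile `repeat` invocations of func and return the total call count."""
    payload = b"x" * 64
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        for _ in range(repeat):
            func(payload)
    finally:
        profiler.disable()
    return pstats.Stats(profiler).total_calls


if __name__ == "__main__":
    print("bare:   ", count_calls(low_level))
    print("layered:", count_calls(high_level))
```

The layered version records more calls for the same underlying work, which is the effect described above, just on a toy scale.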
Based on profiling both, the main difference appears to be that the requests version is doing a DNS lookup for every request, while the http.client version is doing so once.
    # http.client
    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1974    0.541    0.000    0.541    0.000  {method 'recv_into' of '_socket.socket' objects}
      1000    0.020    0.000    0.045    0.000  feedparser.py:470(_parse_headers)
     13000    0.015    0.000    0.563    0.000  {method 'readline' of '_io.BufferedReader' objects}
    ...

    # requests
    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1481    0.827    0.001    0.827    0.001  {method 'recv_into' of '_socket.socket' objects}
      1000    0.377    0.000    0.382    0.000  {built-in method _socket.gethostbyname}
      1000    0.123    0.000    0.123    0.000  {built-in method _scproxy._get_proxy_settings}
      1000    0.111    0.000    0.111    0.000  {built-in method _scproxy._get_proxies}
     92000    0.068    0.000    0.284    0.000  _collections_abc.py:675(__iter__)
    ...
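Listings like these can be produced with cProfile and pstats, sorted by tottime. The following is only a sketch of the http.client side, not the script that produced the numbers above. It assumes a throwaway local server on 127.0.0.1, so no DNS is involved and the timings will differ. OkHandler, fetch_many, and profile_fetches are names made up for this example.

```python
import cProfile
import http.client
import io
import pstats
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny body."""
    protocol_version = "HTTP/1.1"  # keep-alive, so one connection is reused

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_many(host, port, count):
    """Issue `count` GET requests over a single HTTPConnection."""
    conn = http.client.HTTPConnection(host, port)
    try:
        for _ in range(count):
            conn.request("GET", "/")
            conn.getresponse().read()
    finally:
        conn.close()


def profile_fetches(count=50, top=5):
    """Profile fetch_many against a local server and return the report text."""
    server = HTTPServer(("127.0.0.1", 0), OkHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    profiler = cProfile.Profile()
    try:
        profiler.enable()
        try:
            fetch_many("127.0.0.1", server.server_address[1], count)
        finally:
            profiler.disable()
    finally:
        server.shutdown()
        server.server_close()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_fetches())
```

Swapping the body of fetch_many for the equivalent requests.Session loop against the same URL would give the second listing to compare against.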
You're providing the hostname to http.client.HTTPConnection() once, so it makes sense that it would call gethostbyname once. requests.Session probably could cache hostname lookups, but it apparently does not.
EDIT: After some further research, it's not just a simple matter of caching. There's a function for determining whether to bypass proxies, and it ends up invoking gethostbyname regardless of the actual request itself.
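One way to check this without reading a full profile is to count the name-resolution calls directly. The sketch below wraps socket.gethostbyname and socket.getaddrinfo with unittest.mock while a callable runs. count_name_lookups is a hypothetical helper, and it only sees calls that go through the socket module's attributes, which is an assumption about how the calling code reaches them.

```python
import socket
from unittest import mock


def count_name_lookups(func):
    """Run func() and count its calls to the socket name-resolution functions.

    The real functions are still invoked (wraps=...), so behaviour is unchanged;
    only the call counts are recorded.
    """
    with mock.patch.object(
        socket, "gethostbyname", wraps=socket.gethostbyname
    ) as by_name, mock.patch.object(
        socket, "getaddrinfo", wraps=socket.getaddrinfo
    ) as addr_info:
        func()
    return {
        "gethostbyname": by_name.call_count,
        "getaddrinfo": addr_info.call_count,
    }


if __name__ == "__main__":
    def probe():
        # Numeric addresses resolve locally, so this needs no network access.
        socket.gethostbyname("127.0.0.1")
        socket.getaddrinfo("127.0.0.1", 80)

    print(count_name_lookups(probe))
```

Passing a function that performs the 1000 requests with a requests.Session should show whether gethostbyname is hit once per request on your platform. requests.Session also has a trust_env attribute; setting it to False makes the session ignore proxy settings from the environment, which may avoid the proxy-bypass lookup, though that is worth confirming with the counter rather than taking on faith.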