Why is Python 3 http.client so much faster than python-requests?

萌比男神i 2021-02-07 13:14

I was testing different Python HTTP libraries today and noticed that the http.client library seems to perform much faster than requests.

To test it you can run foll

2 answers
  •  暗喜 (original poster)
    2021-02-07 14:04

    Based on profiling both, the main difference appears to be that the requests version is doing a DNS lookup for every request, while the http.client version is doing so once.

    # http.client
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1974    0.541    0.000    0.541    0.000 {method 'recv_into' of '_socket.socket' objects}
         1000    0.020    0.000    0.045    0.000 feedparser.py:470(_parse_headers)
        13000    0.015    0.000    0.563    0.000 {method 'readline' of '_io.BufferedReader' objects}
    ...
    
    # requests
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1481    0.827    0.001    0.827    0.001 {method 'recv_into' of '_socket.socket' objects}
         1000    0.377    0.000    0.382    0.000 {built-in method _socket.gethostbyname}
         1000    0.123    0.000    0.123    0.000 {built-in method _scproxy._get_proxy_settings}
         1000    0.111    0.000    0.111    0.000 {built-in method _scproxy._get_proxies}
        92000    0.068    0.000    0.284    0.000 _collections_abc.py:675(__iter__)
    ...
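
For anyone who wants to produce tables like the two above, here is a minimal sketch using the standard library's cProfile and pstats. The `fake_workload` function is a hypothetical stand-in for the 1000-request loop, since the original test script is not shown here; swap in your own http.client or requests loop.

```python
import cProfile
import io
import pstats


def profile_top(func, limit=5):
    """Run func under cProfile and return the top entries by tottime as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by time spent inside each function itself, like the tables above.
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


def fake_workload():
    # Hypothetical stand-in for the request loop being measured.
    return sum(len(str(i)) for i in range(1000))


report = profile_top(fake_workload)
print(report)
```

Sorting by `tottime` is what surfaces entries such as `gethostbyname` near the top when they dominate the run.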
    

    You're providing the hostname to http.client.HTTPConnection() once, so it makes sense it would call gethostbyname once. requests.Session probably could cache hostname lookups, but it apparently does not.
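
To illustrate what "caching hostname lookups" could look like, here is a rough sketch that wraps a resolver in `functools.lru_cache`. This is not how requests or http.client work internally; `make_cached_resolver` and `fake_resolve` are names made up for the example, and the sketch ignores DNS TTLs entirely.

```python
import functools
import socket


def make_cached_resolver(resolve=socket.gethostbyname, maxsize=128):
    """Wrap a hostname resolver so repeated lookups hit an in-process cache."""

    @functools.lru_cache(maxsize=maxsize)
    def cached(hostname):
        return resolve(hostname)

    return cached


# Demonstration with a counting stand-in resolver (hypothetical, no network needed).
calls = []


def fake_resolve(hostname):
    calls.append(hostname)
    return "127.0.0.1"


lookup = make_cached_resolver(fake_resolve)
for _ in range(1000):
    lookup("example.invalid")

print(len(calls))  # 1: only the first lookup reaches the resolver
```

With the default argument, the wrapper calls `socket.gethostbyname`, so 1000 requests to the same host would resolve it once per process rather than once per request.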

    EDIT: After some further research, it's not just a simple matter of caching. There's a function for determining whether to bypass proxies which ends up invoking gethostbyname regardless of the actual request itself.
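
One way to check this on your own machine is to count `gethostbyname` calls made during the standard library's proxy-bypass check. The sketch below assumes that requests ends up delegating to `urllib.request.proxy_bypass` on your platform; the helper name is made up for illustration. The result is platform-dependent: it may well be 0 on Linux and nonzero on macOS, depending on the system proxy exception list.

```python
import socket
import urllib.request
from unittest import mock


def count_dns_lookups_in_proxy_bypass(hostname):
    """Count gethostbyname calls triggered by urllib's proxy-bypass check."""
    real_gethostbyname = socket.gethostbyname
    seen = []

    def counting_gethostbyname(name):
        # Record the lookup, then defer to the real resolver.
        seen.append(name)
        return real_gethostbyname(name)

    with mock.patch.object(socket, "gethostbyname", counting_gethostbyname):
        urllib.request.proxy_bypass(hostname)
    return len(seen)


lookups = count_dns_lookups_in_proxy_bypass("localhost")
print(f"gethostbyname calls during proxy_bypass: {lookups}")
```

If the count is nonzero, setting `trust_env = False` on a `requests.Session` may be worth trying, since that attribute stops the session from reading proxy settings from the environment. I have not measured its effect here.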
