python-requests: fetching the head of the response content without consuming it all

前端未结


Using python-requests and python-magic, I would like to test the mime-type of a web resource without fetching all its content (especially if this resource happens to be eg.

2 answers

野趣味

2020-12-30 17:11
If the 'content-type' header suffices, you can issue an HTTP HEAD request instead of a GET, so that you receive only the HTTP headers.
```
import requests

url = 'http://www.december.com/html/demo/hello.html'
response = requests.head(url)
print(response.headers['content-type'])
```
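The header value often carries parameters as well, e.g. `text/html; charset=UTF-8`, so comparing it directly against a MIME type can fail. A minimal sketch of splitting it with the standard library follows; `parse_content_type` is a hypothetical helper name, not part of requests.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into (mime_type, charset).

    Hypothetical helper for illustration; returns (None, None) when the
    header is missing or empty.
    """
    if not value:
        return None, None
    msg = Message()
    msg["Content-Type"] = value
    # Both accessors lower-case their result; charset is None if absent.
    return msg.get_content_type(), msg.get_content_charset()
```

With the answer's code this could be called as `parse_content_type(response.headers.get('content-type'))`. Keep in mind that the header is only what the server declares; it is not derived from the actual bytes.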

囚心锁ツ

2020-12-30 17:21

Note: at the time this question was asked, the correct way to fetch only the headers and stream the body was to use prefetch=False. That option has since been renamed to stream and its boolean value inverted, so you now want stream=True.
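As a rough sketch of what stream=True looks like in current versions of requests: the helper names `peek_bytes` and `sniff_url`, the 256-byte default and the 10-second timeout below are assumptions chosen for illustration.

```python
def peek_bytes(response, size=256):
    """Return up to `size` bytes from the start of a streamed response body."""
    # iter_content() is lazy, so only the first chunk is requested here.
    # An empty body yields nothing, hence the b"" default.
    return next(response.iter_content(size), b"")


def sniff_url(url, size=256, timeout=10):
    """Hypothetical helper: open `url` with stream=True and peek at its head."""
    import requests  # third-party; imported here so peek_bytes has no dependencies

    # With stream=True only the headers are fetched up front. Leaving the
    # with-block closes the response without reading the rest of the body.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        return peek_bytes(response, size)
```

The returned bytes can then be handed to `magic.from_buffer(..., mime=True)`. The chunk may be shorter than `size`, for example when the body itself is shorter.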

The original answer follows.

Once you use iter_content(), you have to continue using it; .text indirectly uses the same interface under the hood (via .content).

In other words, by using iter_content() at all, you have to do the work .text does by hand:

```
import magic
import requests
from requests.compat import chardet

# stream=True defers downloading the body until it is read
r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = next(r.iter_content(256))
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    # the peeked chunk is already consumed, so prepend it to the rest
    contents = peek + b''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = str(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        # unknown or undetected encoding: fall back to the default codec
        textcontent = str(contents, errors='replace')
    print(textcontent)
```

This assumes Python 3.
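The peek-then-rejoin step can also be factored into a small generic helper. This is only a sketch: `peek_stream` is a hypothetical name, and it works on any iterator of byte chunks, such as the one returned by `r.iter_content(256)`.

```python
import itertools


def peek_stream(chunks):
    """Return (first_chunk, iterator), where the iterator replays
    first_chunk before yielding the remaining chunks.

    Hypothetical helper for illustration. Taking the first chunk consumes
    it from the underlying iterator, so it has to be chained back in front.
    """
    chunks = iter(chunks)
    first = next(chunks, b"")
    return first, itertools.chain([first], chunks)
```

For example, `peek, body = peek_stream(r.iter_content(256))` lets you pass `peek` to `magic.from_buffer()` and later rebuild the complete payload with `b"".join(body)`.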

The alternative is to make 2 requests:

```
import magic
import requests

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
mime = magic.from_buffer(next(r.iter_content(256)), mime=True)
r.close()  # the rest of the streamed body is not needed

if mime == "text/html":
    print(requests.get("http://www.december.com/html/demo/hello.html").text)
```

Python 2 version:

```
import magic
import requests
from requests.compat import chardet

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = next(r.iter_content(256))
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    # the peeked chunk is already consumed, so prepend it to the rest
    contents = peek + ''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = unicode(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = unicode(contents, errors='replace')
    print(textcontent)
```
