python-requests: fetching the head of the response content without consuming it all

南旧 2020-12-30 16:27

Using python-requests and python-magic, I would like to test the mime-type of a web resource without fetching all its content (especially if this resource happens to be eg.

2 answers
  • 2020-12-30 17:11

    If the Content-Type header suffices, you can issue an HTTP HEAD request instead of a GET to receive only the HTTP headers.

    import requests
    
    url = 'http://www.december.com/html/demo/hello.html'
    response = requests.head(url)
    print(response.headers['content-type'])
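
    A HEAD response is only as reliable as the server: the header may be missing, it may carry parameters such as a charset, and requests.head() does not follow redirects unless asked to. The sketch below is one way to handle those cases. mime_from_content_type and head_mime are hypothetical helper names, not part of requests, and the 10-second timeout is an arbitrary choice.

```python
import requests


def mime_from_content_type(value):
    """Return the bare media type from a Content-Type value, or None.

    Hypothetical helper: strips parameters such as "; charset=UTF-8".
    """
    if not value:
        return None
    return value.split(";", 1)[0].strip().lower() or None


def head_mime(url, timeout=10):
    """Ask the server for headers only and return the declared media type."""
    # requests.head() leaves redirects unfollowed by default, so opt in.
    response = requests.head(url, allow_redirects=True, timeout=timeout)
    # .get() avoids a KeyError when the server sends no Content-Type.
    return mime_from_content_type(response.headers.get("content-type"))
```

    If head_mime returns None, or the server rejects HEAD, sniffing the first bytes of a GET response is the fallback.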
    
  • 2020-12-30 17:21

    Note: at the time this question was asked, the correct way to fetch only the headers and stream the body was to pass prefetch=False. That option has since been renamed to stream and its boolean value inverted, so you now want stream=True.
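
    With a current version of requests, the peek can be written as a small helper. This is a minimal sketch, not the answer's original code: peek_head and its size, session and timeout parameters are names assumed here for illustration. Leaving the with block closes the response, so the unread remainder of the body is not read.

```python
import requests


def peek_head(url, size=256, session=None, timeout=10):
    """Return up to `size` bytes from the start of the response body.

    Hypothetical helper; `session` may be a requests.Session, otherwise
    the module-level requests.get is used.
    """
    http = session if session is not None else requests
    # stream=True makes requests return once the headers have arrived,
    # leaving the body on the socket until it is iterated.
    with http.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        # next() with a default avoids StopIteration on an empty body.
        return next(response.iter_content(size), b"")
```

    The returned bytes can then be passed to magic.from_buffer(head, mime=True), as the original answer does.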

    The original answer follows.


    Once you use iter_content(), you have to continue using it; .text indirectly uses the same interface under the hood (via .content).

    In other words, by using iter_content() at all, you have to do the work .text does by hand:

    import magic
    import requests
    from requests.compat import chardet

    # stream=True (formerly prefetch=False) defers downloading the body
    r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
    peek = next(r.iter_content(256))
    mime = magic.from_buffer(peek, mime=True)

    if mime == "text/html":
        # read the rest of the body in 10 KiB chunks
        contents = peek + b''.join(r.iter_content(10 * 1024))
        encoding = r.encoding
        if encoding is None:
            # no charset in the headers: detect the encoding from the bytes
            encoding = chardet.detect(contents)['encoding']
        try:
            textcontent = str(contents, encoding, errors='replace')
        except (LookupError, TypeError):
            # unknown or undetected encoding: fall back to the default (UTF-8)
            textcontent = str(contents, errors='replace')
        print(textcontent)
    

    This presumes you are using Python 3.

    The alternative is to make 2 requests:

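    import magic
    import requests

    # stream=True (formerly prefetch=False) defers downloading the body
    r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
    mime = magic.from_buffer(next(r.iter_content(256)), mime=True)
    r.close()  # drop the partially read response

    if mime == "text/html":
        print(requests.get("http://www.december.com/html/demo/hello.html").text)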
    

    Python 2 version:

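    import magic
    import requests
    from requests.compat import chardet

    # stream=True (formerly prefetch=False) defers downloading the body
    r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
    peek = next(r.iter_content(256))
    mime = magic.from_buffer(peek, mime=True)

    if mime == "text/html":
        # read the rest of the body in 10 KiB chunks
        contents = peek + ''.join(r.iter_content(10 * 1024))
        encoding = r.encoding
        if encoding is None:
            # no charset in the headers: detect the encoding from the bytes
            encoding = chardet.detect(contents)['encoding']
        try:
            textcontent = unicode(contents, encoding, errors='replace')
        except (LookupError, TypeError):
            # unknown or undetected encoding: fall back to the default codec
            textcontent = unicode(contents, errors='replace')
        print(textcontent)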
    