How to handle response encoding from urllib.request.urlopen()

一向 2020-11-27 03:52

I'm trying to search a webpage using regular expressions, but I'm getting the following error:

TypeError: can't use a string pattern on a bytes-like object
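
For reference, the error can be reproduced without any network access. The snippet below is a minimal sketch: `raw` is a made-up stand-in for the bytes that `urlopen(...).read()` returns, and the pattern is only an illustration. It shows the failing call and two possible ways around it.

```python
import re

# Hypothetical stand-in for the bytes returned by urlopen(...).read().
raw = b"<html><title>Example page</title></html>"

# A str pattern applied to bytes raises the TypeError from the question.
try:
    re.search(r"<title>(.*?)</title>", raw)
except TypeError as exc:
    error_message = str(exc)

# Option 1: decode the bytes to str first, then search with a str pattern.
# UTF-8 is assumed here; a real page may declare a different charset.
text = raw.decode("utf-8")
title_from_text = re.search(r"<title>(.*?)</title>", text).group(1)

# Option 2: leave the data as bytes and search with a bytes pattern.
title_from_bytes = re.search(rb"<title>(.*?)</title>", raw).group(1)
```

Decoding first is usually more convenient when the rest of the code works with text; the bytes pattern avoids having to know the encoding at all.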

7 answers
  • 2020-11-27 04:28

    You just need to decode the response, using the charset named in the Content-Type header (typically its last value). There is an example given in the tutorial too.

    # response is the object returned by urllib.request.urlopen()
    output = response.read().decode('utf-8')
    
  • 2020-11-27 04:42

    With the third-party requests library, whose .text attribute is already decoded to str:

    import requests
    
    response = requests.get(URL).text
    
  • 2020-11-27 04:43

    After you make a request with req = urllib.request.urlopen(...), you have to read the response by calling html_bytes = req.read(). In Python 3 this returns bytes, not str, so decode it (for example html_bytes.decode('utf-8')) before parsing it with a string pattern.

  • 2020-11-27 04:46

    I had the same issues for the last two days. I finally have a solution. I'm using the info() method of the object returned by urlopen():

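    import urllib.request

    req = urllib.request.urlopen(URL)
    charset = req.info().get_content_charset()  # None if the header names no charset
    content = req.read().decode(charset or 'utf-8')  # UTF-8 fallback is an assumption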
    
  • 2020-11-27 04:50

    For me, the solution is as follows (Python 3):

    resource = urllib.request.urlopen(an_url)
    content = resource.read().decode(resource.headers.get_content_charset())
    
  • 2020-11-27 04:51
    # Python 3 form; in Python 2 this was urllib.urlopen(url).headers.getheader('Content-Type')
    urllib.request.urlopen(url).headers['Content-Type']
    

    Will output something like this:

    text/html; charset=utf-8
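
To get from a header value like the one above to an encoding name, the standard library can do the parsing. The following is a rough sketch, not the only way to do it: `charset_from_content_type` is a hypothetical helper name, and the UTF-8 default is an assumption for headers that carry no charset parameter.

```python
from email.message import Message


def charset_from_content_type(value, default="utf-8"):
    """Return the charset named in a Content-Type header value.

    `default` (an assumption of this sketch) is returned when the
    header has no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset() or default


# Example header values; the second and third are made up for illustration.
utf8_charset = charset_from_content_type("text/html; charset=utf-8")
latin1_charset = charset_from_content_type("text/html; charset=ISO-8859-1")
missing_charset = charset_from_content_type("text/html")
```

The returned name can then be passed to `bytes.decode()`. This is the same `get_content_charset()` method that the answers above call on `req.info()` and `resource.headers`.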
