The error is as follows:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Inspecting the response headers of the fetched page shows:
Content-Encoding: gzip
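The byte in the error message matches the gzip signature: every gzip stream starts with the two bytes 0x1f 0x8b. 0x1f is a valid one-byte UTF-8 character, so decoding only fails at position 1, on 0x8b. A minimal sketch that reproduces this locally, using a made-up payload rather than the real page:

```python
import gzip

# Compress a small sample payload; any gzip stream begins with 0x1f 0x8b.
payload = gzip.compress("hello".encode("utf-8"))
assert payload[:2] == b"\x1f\x8b"

try:
    # Decoding compressed bytes directly, as the failing code did.
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)   # same wording as the error above
    error_position = exc.start # index of the first undecodable byte
```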
So the response body has to be gzip-decompressed before it is decoded:
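import gzip
import urllib.request
from io import BytesIO

# url, request_headers and timeout are assumed to be defined by the caller
request = urllib.request.Request(url=url, headers=request_headers)
response = urllib.request.urlopen(request, timeout=timeout)
data = response.read()
# Wrap the compressed bytes in a file-like object and decompress them
buff = BytesIO(data)
f = gzip.GzipFile(fileobj=buff)
res = f.read().decode('utf-8')
print(res)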
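Unconditional decompression fails if the server happens to send an uncompressed body. A small sketch of a helper that decompresses only when the Content-Encoding header says gzip; `decode_body` and its parameters are hypothetical names, not part of the original code:

```python
import gzip
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str], charset: str = "utf-8") -> str:
    """Decompress raw only if the server declared gzip, then decode it to text."""
    if (content_encoding or "").strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset)
```

With the urllib code above, it could be called as `decode_body(response.read(), response.headers.get("Content-Encoding"))`.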
Add the following to the request headers: "Accept-Encoding":"gzip",
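If the header is instead set as shown below, the response may be gzip-compressed on some requests and uncompressed on others, and for deflate some web applications simply output raw DEFLATE to accommodate IE:
Accept-Encoding: gzip, deflate
So add only the following to the request headers: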
"Accept-Encoding":"gzip",
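If deflate has to be advertised anyway, the raw DEFLATE case mentioned above can be handled with a fallback. This is only a sketch under that assumption; `inflate` is a hypothetical helper name, and it tries the zlib-wrapped format first and falls back to a headerless raw stream:

```python
import zlib


def inflate(data: bytes) -> bytes:
    """Decompress a 'deflate' body, accepting zlib-wrapped or raw DEFLATE."""
    try:
        # zlib-wrapped stream (2-byte header plus checksum)
        return zlib.decompress(data)
    except zlib.error:
        # Raw DEFLATE stream: negative wbits tells zlib to expect no header
        return zlib.decompress(data, -zlib.MAX_WBITS)
```

Requesting gzip only, as above, avoids needing this fallback at all.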
Source: https://www.cnblogs.com/zhijiancanxue/p/12543783.html