How to use pycurl if requested data is sometimes gzipped, sometimes not?

守給你的承諾、 submitted on 2019-12-11 02:52:43

Question


I'm doing this to fetch some data:

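import pycurl
from io import BytesIO

c = pycurl.Curl()
c.setopt(pycurl.ENCODING, 'gzip')        # ask for gzip and let libcurl decode it
c.setopt(pycurl.URL, url)                # url is defined elsewhere
c.setopt(pycurl.TIMEOUT, 10)             # seconds
c.setopt(pycurl.FOLLOWLOCATION, True)    # follow the 302 redirect

xml = BytesIO()                          # pycurl delivers the body as bytes

c.setopt(pycurl.WRITEFUNCTION, xml.write)

c.perform()
c.close()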

My urls are typically of this sort:

http://host/path/to/resource-foo.xml

Usually I get back 302 pointing to:

http://archive-host/path/to/resource-foo.xml.gz

Given that I have set FOLLOWLOCATION, and ENCODING gzip, everything works great.

The problem is, sometimes I have a URL which does not result in a redirect to a gzipped resource. When this happens, c.perform() throws this error:

pycurl.error: (61, 'Error while processing content unencoding: invalid block type')

Which suggests to me that pycurl is trying to gunzip a resource that is not gzipped.
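
One way to check that suspicion, assuming the raw bytes can be fetched once without ENCODING set, is to look at the first two bytes of the body: every gzip stream starts with 0x1f 0x8b. The helper name looks_gzipped below is made up for illustration, and this is only a rough sketch of the check, not something pycurl provides.

```python
import gzip

# First two bytes of every gzip member, per the gzip file format (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


if __name__ == "__main__":
    print(looks_gzipped(gzip.compress(b"<resource/>")))  # compressed sample
    print(looks_gzipped(b"<?xml version='1.0'?>"))       # plain XML sample
```

A body that fails this check while the request still has ENCODING 'gzip' set would be consistent with the error above.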

Is there some way I can instruct pycurl to figure out the response encoding, and gunzip or not as appropriate? I have played around with using different values for ENCODING, but so far no beans.

The pycurl docs seem to be a little lacking. :/

thx!


Answer 1:


If worst comes to worst, you could omit the ENCODING 'gzip', set HTTPHEADER to ['Accept-Encoding: gzip'] (pycurl expects a list of header strings rather than a dict), check the response headers for "Content-Encoding: gzip" and, if it's present, gunzip the response yourself.
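
A minimal sketch of that approach, assuming Python 3 and a reasonably recent pycurl. The function names parse_content_encoding, decode_body and fetch are invented for this example, and the handling of redirects (resetting on each new status line so that only the final response's headers count) is an assumption about what you want, not something the original answer spells out.

```python
import gzip
from io import BytesIO


def parse_content_encoding(raw_header_lines):
    """Return the lowercased Content-Encoding of the final response, or ''."""
    encoding = ""
    for line in raw_header_lines:
        text = line.decode("iso-8859-1").strip()
        if text.upper().startswith("HTTP/"):
            # A status line starts a new response (e.g. after a 302),
            # so forget whatever the previous hop declared.
            encoding = ""
        elif ":" in text:
            name, value = text.split(":", 1)
            if name.strip().lower() == "content-encoding":
                encoding = value.strip().lower()
    return encoding


def decode_body(body, content_encoding):
    """Gunzip the body only when the server declared gzip; else pass through."""
    if content_encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body


def fetch(url, timeout=10):
    """Fetch url with pycurl, gunzipping only if the response says so."""
    import pycurl  # imported here so the helpers above work without pycurl

    body = BytesIO()
    header_lines = []

    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.TIMEOUT, timeout)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    # Request gzip ourselves instead of setting pycurl.ENCODING, so that
    # libcurl does not attempt to decode the body on its own.
    c.setopt(pycurl.HTTPHEADER, ["Accept-Encoding: gzip"])
    c.setopt(pycurl.HEADERFUNCTION, header_lines.append)
    c.setopt(pycurl.WRITEFUNCTION, body.write)
    try:
        c.perform()
    finally:
        c.close()

    return decode_body(body.getvalue(), parse_content_encoding(header_lines))
```

Calling fetch(url) with one of the URLs from the question should then return the decoded bytes in both the redirected and the non-redirected case, provided the server labels gzipped responses with a Content-Encoding header.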



Source: https://stackoverflow.com/questions/758243/how-to-use-pycurl-if-requested-data-is-sometimes-gzipped-sometimes-not
