Urllib's urlopen breaking on some sites (e.g. StackApps api): returns garbage results

Omnifarious

That almost looks like something you would be feeding to pickle. Maybe something in the User-Agent string or Accept header that urllib2 is sending is causing StackOverflow to send something other than JSON.

One telltale sign is to look at conn.headers.headers to see what the Content-Type header says.
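For example, assuming conn is the object urlopen returned (as in the question's code), a quick interactive check looks something like this:

>>> conn.headers.headers                    # raw header lines exactly as the server sent them
>>> conn.headers.gettype()                  # just the parsed Content-Type, e.g. 'application/json'
>>> conn.headers.get('Content-Encoding')    # 'gzip' here would explain the "garbage"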

And this question, Odd String Format Result from API Call, may have your answer. Basically, you might have to run your result through a gzip decompressor.

Double checking with this code:

>>> import urllib2
>>> req = urllib2.Request("http://api.stackoverflow.com/0.8/users/",
                          headers={'Accept-Encoding': 'gzip, identity'})
>>> conn = urllib2.urlopen(req)
>>> val = conn.read()
>>> conn.close()
>>> val[0:25]
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ'

Yes, you are definitely getting gzip encoded data back.
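If you want to see the real payload, the standard library can unpack it in place; a minimal sketch, continuing the session above:

>>> import zlib
>>> decoded = zlib.decompress(val, 16 + zlib.MAX_WBITS)   # 16 + MAX_WBITS tells zlib to expect a gzip header
>>> decoded[0:25]                                          # should now read as ordinary JSON text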

Since you seem to be getting different results on different machines with the same version of Python, and since in general the urllib2 API would require you to do something special to request gzip-encoded data, my guess is that you have a transparent proxy in there someplace.

I saw a presentation by the EFF at CodeCon in 2009. They were doing end-to-end connectivity testing to discover dirty ISP tricks of various kinds. One of the things they discovered while doing this testing was that a surprising number of consumer-level NAT routers add random HTTP headers or do transparent proxying. You might have some piece of equipment on your network that's adding or modifying the Accept-Encoding header in order to make your connection seem faster.
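If that is what is happening, you can at least make your own client robust to it: ask for an uncompressed body and decompress anyway whenever the gzip magic bytes show up. A rough defensive sketch (not from the original question):

>>> import urllib2, zlib
>>> req = urllib2.Request("http://api.stackoverflow.com/0.8/users/",
                          headers={'Accept-Encoding': 'identity'})
>>> val = urllib2.urlopen(req).read()
>>> # '\x1f\x8b' is the gzip magic number -- decompress if something compressed the body anyway
>>> if val[:2] == '\x1f\x8b': val = zlib.decompress(val, 16 + zlib.MAX_WBITS)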
