python urllib2.urlopen(url) process block

[亡魂溺海] submitted on 2019-12-12 09:27:22

Question


I am using urllib2.urlopen() and my process is getting blocked.

I am aware that urllib2.urlopen() has a default timeout.

How can I make the call non-blocking?

The backtrace is

(gdb) bt 
#0 0x0000003c6200dc35 in recv () from /lib64/libpthread.so.0 
#1 0x00002b88add08137 in ?? () from /usr/lib64/python2.6/lib-dynload/_socketmodule.so 
#2 0x00002b88add0830e in ?? () from /usr/lib64/python2.6/lib-dynload/_socketmodule.so 
#3 0x000000310b2d8e19 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0

Answer 1:


If your problem is that you need urllib to finish reading

read() is a blocking operation in Python.

If you want to make asynchronous requests

  • Do the reading in a non-main thread http://docs.python.org/library/threading.html

  • Use the requests library and asynchronous requests http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests
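The first option above (reading in a non-main thread) could be sketched roughly as follows. This is only an illustration: fetch_in_background and the result dictionary are hypothetical names, not part of urllib2, and the import fallback is there on the assumption that the code may also be run on Python 3, where urlopen lives in urllib.request.

```python
import threading

try:
    from urllib2 import urlopen          # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 location of urlopen


def fetch_in_background(url, timeout=10):
    """Start the download in a worker thread; return (thread, result).

    Hypothetical helper: `result` is a dict the worker fills in with
    either a "body" key (the bytes read) or an "error" key (the exception).
    """
    result = {}

    def worker():
        try:
            response = urlopen(url, timeout=timeout)
            try:
                # read() still blocks, but only this worker thread
                result["body"] = response.read()
            finally:
                response.close()
        except Exception as exc:
            result["error"] = exc

    thread = threading.Thread(target=worker)
    thread.daemon = True  # a stuck download will not keep the process alive
    thread.start()
    return thread, result
```

The main thread can keep working after calling fetch_in_background(url) and later call thread.join(some_seconds) before looking at result. Marking the thread as a daemon is a design choice here, so that a request that never returns does not prevent the interpreter from exiting.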

If your problem is that you need to set a timeout

Again, use the requests library as mentioned above.
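For comparison, a timeout can also be set without requests, since urlopen() itself accepts a timeout argument (available since Python 2.6, the version in the backtrace). The sketch below swaps in the standard library for the requests approach this answer recommends; with requests the rough equivalent would be requests.get(url, timeout=5). fetch_with_timeout is a hypothetical helper name, and returning None on failure is just one possible choice.

```python
import socket

try:
    from urllib2 import urlopen, URLError   # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen      # Python 3 locations
    from urllib.error import URLError


def fetch_with_timeout(url, timeout=5.0):
    """Return the response body, or None if the request fails or times out.

    Hypothetical helper: the timeout (in seconds) applies to blocking
    socket operations such as connect and recv, not to the whole download.
    """
    try:
        response = urlopen(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except (URLError, socket.timeout):
        # A connect failure usually surfaces as URLError; a stalled read
        # surfaces as socket.timeout.
        return None
```

With this in place, a server that accepts the connection but never answers makes the call return None after roughly the timeout instead of blocking in recv() indefinitely.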




Answer 2:


You can try using strace (or a similar tool) to figure out which system call is actually blocking your Python script. For example, on Linux: $ strace python yourscript.py

yourscript.py:

from urllib2 import urlopen
urlopen("http://somesite.local/foobar.html")

$ strace python yourscript.py

... lots of system calls stripped ...
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16


Source: https://stackoverflow.com/questions/11664185/python-urllib2-urlopenurl-process-block
