Python urllib cache

Submitted by 无人久伴 on 2019-12-20 05:46:11

Question


I'm writing a script in Python that should determine whether it has internet access.

import urllib

CHECK_PAGE     = "http://64.37.51.146/check.txt"
CHECK_VALUE    = "true\n"
PROXY_VALUE    = "Privoxy"
OFFLINE_VALUE  = ""

# First attempt: fetch the check page using the default proxy settings.
page = urllib.urlopen(CHECK_PAGE)
response = page.read()
page.close()

# If the proxy's error page came back, try to bypass the proxy and retry.
if response.find(PROXY_VALUE) != -1:
    urllib.getproxies = lambda x = None: {}
    page = urllib.urlopen(CHECK_PAGE)
    response = page.read()
    page.close()

if response != CHECK_VALUE:
    print "'" + response + "' != '" + CHECK_VALUE + "'"
else:
    print "You are online!"

I use a proxy on my computer, so correct proxy handling is important. If the script can't reach the internet through the proxy, it should bypass the proxy and check whether it's stuck at a login page (as many public hotspots I use do). With the code above, when I'm not connected to the internet, the first read() returns the proxy's error page; but when I bypass the proxy after that, I get the same page back. If I bypass the proxy BEFORE making any requests, I get an error as I should. I think Python is caching the page from the first request.

How do I force Python to clear its cache (or is this some other problem)?


Answer 1:


You want

page = urllib.urlopen(CHECK_PAGE, proxies={})

and you should remove the

urllib.getproxies = lambda x = None: {}

line. Passing an empty proxies mapping makes urlopen() connect directly on that call, rather than consult the environment's proxy settings; monkey-patching getproxies doesn't affect the opener urllib has already created.
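
For reference, a minimal sketch of the whole check with that fix applied (Python 2, reusing the question's constants; not the asker's exact final code):

import urllib

CHECK_PAGE  = "http://64.37.51.146/check.txt"
CHECK_VALUE = "true\n"
PROXY_VALUE = "Privoxy"

# First attempt goes through whatever proxy the environment defines.
page = urllib.urlopen(CHECK_PAGE)
response = page.read()
page.close()

if response.find(PROXY_VALUE) != -1:
    # Retry with an empty proxy mapping: proxies={} makes this call
    # connect directly instead of using the environment's proxy.
    page = urllib.urlopen(CHECK_PAGE, proxies={})
    response = page.read()
    page.close()

if response == CHECK_VALUE:
    print "You are online!"
else:
    print "'" + response + "' != '" + CHECK_VALUE + "'"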




Answer 2:


Calling urllib.urlcleanup() before each call to urllib.urlopen() will solve the problem. Under the hood, urllib.urlopen() shares the module-level opener used by urlretrieve(), which can cache retrieved data, and urlcleanup() removes that cache and resets the opener.
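
For example, a minimal sketch (Python 2, reusing the question's CHECK_PAGE; urlcleanup() takes no arguments):

import urllib

CHECK_PAGE = "http://64.37.51.146/check.txt"

# Reset urllib's module-level opener and any cached data, so the
# next urlopen() starts from a clean state.
urllib.urlcleanup()
page = urllib.urlopen(CHECK_PAGE)
response = page.read()
page.close()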



Source: https://stackoverflow.com/questions/6757168/python-urllib-cache
