I'm writing a script in Python that should determine if it has internet access.
import urllib

CHECK_PAGE = "http://64.37.51.146/check.txt"
CHECK_VALUE = "true\n"
PROXY_VALUE = "Privoxy"
OFFLINE_VALUE = ""

# First attempt: fetch the check page through the configured proxy.
page = urllib.urlopen(CHECK_PAGE)
response = page.read()
page.close()

# If the proxy's error page came back, disable the proxy and retry.
if response.find(PROXY_VALUE) != -1:
    urllib.getproxies = lambda x=None: {}
    page = urllib.urlopen(CHECK_PAGE)
    response = page.read()
    page.close()

if response != CHECK_VALUE:
    # Show what came back instead of the expected value.
    print "'" + response + "' != '" + CHECK_VALUE + "'"
else:
    print "You are online!"
I use a proxy on my computer, so correct proxy handling is important. If the script can't reach the internet through the proxy, it should bypass the proxy and check whether it's stuck at a login page (as many public hotspots I use do). With the code above, if I am not connected to the internet, the first read() returns the proxy's error page; but when I bypass the proxy after that, I get the same page back. If I bypass the proxy BEFORE making any requests, I get an error as I should. I think Python is caching the page from the first time around.
How do I force Python to clear its cache (or is this some other problem)?
You want

page = urllib.urlopen(CHECK_PAGE, proxies={})

and you can remove the

urllib.getproxies = lambda x = None: {}

line. The first call to urllib.urlopen() creates a module-level opener that captures the proxy settings at that moment and is reused by later calls, so re-defining getproxies afterwards has no effect. Passing proxies={} makes urlopen build a fresh, proxy-free opener for that one request.
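For context, a minimal sketch of the check using that keyword argument (Python 2; CHECK_PAGE and PROXY_VALUE are the names from the question):

import urllib

CHECK_PAGE = "http://64.37.51.146/check.txt"
PROXY_VALUE = "Privoxy"

# First attempt goes through whatever proxy urllib detects.
response = urllib.urlopen(CHECK_PAGE).read()

if PROXY_VALUE in response:
    # The proxy's error page came back: retry with proxies={}, which makes
    # urlopen build a fresh, proxy-free opener for this one call.
    response = urllib.urlopen(CHECK_PAGE, proxies={}).read()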
Calling urllib.urlcleanup() before each call to urllib.urlopen() will solve the problem. urllib.urlopen() uses the same machinery as urlretrieve(), which builds up a cache to hold data, and urlcleanup() removes it.
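A minimal sketch of that approach (Python 2, reusing CHECK_PAGE from the question); per the documentation, urlcleanup() clears the cache that may have been built up by previous calls:

import urllib

CHECK_PAGE = "http://64.37.51.146/check.txt"

# Clear any cache left over from earlier urllib calls before fetching.
urllib.urlcleanup()
page = urllib.urlopen(CHECK_PAGE)
response = page.read()
page.close()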
Source: https://stackoverflow.com/questions/6757168/python-urllib-cache