Downloading pdf files using mechanize and urllib

戏子无情 submitted on 2019-12-06 08:41:14

The documentation for urllib says this about the urlretrieve function:

The second argument, if present, specifies the file location to copy to (if absent, the location will be a tempfile with a generated name).

The function's return value has the location of the file:

Return a tuple (filename, headers) where filename is the local file name under which the object can be found, and headers is whatever the info() method of the object returned by urlopen() returned (for a remote object, possibly cached).

So, change this line:

    urllib.urlretrieve(path)

to this:

    filename, headers = urllib.urlretrieve(path)

After this change, filename holds the path of the downloaded file. Alternatively, pass a second argument to urlretrieve to choose the location yourself.
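A minimal sketch of both options, assuming Python 3, where the function lives at urllib.request.urlretrieve (the snippets in this answer use the Python 2 spelling urllib.urlretrieve). The helper name fetch and the example URL are hypothetical, chosen only for illustration:

```python
import urllib.request  # assumption: Python 3; in Python 2 this is urllib.urlretrieve


def fetch(url, dest=None):
    """Download url and return the local path where it was saved (hypothetical helper)."""
    # With dest=None, urlretrieve chooses a temporary file name for remote URLs;
    # with dest set, the data is written to exactly that path.
    filename, headers = urllib.request.urlretrieve(url, dest)
    return filename


# Hypothetical usage (placeholder URL, not taken from the question):
# saved = fetch("http://example.com/report.pdf", "report.pdf")
```

Returning the filename from the helper keeps the caller from having to guess where a temporary file ended up.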

I've never used mechanize, but from the documentation for urllib at http://docs.python.org/library/urllib.html:

urllib.urlretrieve(url[, filename[, reporthook[, data]]])

Copy a network object denoted by a URL to a local file, if necessary. If the URL points to a local file, or a valid cached copy of the object exists, the object is not copied. Return a tuple (filename, headers) where filename is the local file name under which the object can be found, and headers is whatever the info() method of the object returned by urlopen() returned (for a remote object, possibly cached). Exceptions are the same as for urlopen().

As the signature shows, the filename argument is optional, and urlretrieve saves to a temporary file if you don't specify one. So try specifying the filename, as you suggested in your second piece of code. Otherwise, you could call urlretrieve like this:

    saved_filename, headers = urllib.urlretrieve(path)

and then use saved_filename later on.
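As one illustration of using saved_filename afterwards, the sketch below checks whether the downloaded file really is a PDF by looking for the "%PDF-" marker that PDF files start with. It assumes Python 3 (urllib.request.urlretrieve); the helper names looks_like_pdf and retrieve_and_check are hypothetical and not part of urllib or mechanize:

```python
import urllib.request  # assumption: Python 3; in Python 2 this is urllib.urlretrieve


def looks_like_pdf(path):
    """Return True if the file at path begins with the PDF marker bytes."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


def retrieve_and_check(url, filename=None):
    """Download url, then report where it was saved and whether it looks like a PDF."""
    saved_filename, headers = urllib.request.urlretrieve(url, filename)
    return saved_filename, looks_like_pdf(saved_filename)
```

A check like this can catch the case where the server returned an HTML login or error page instead of the document.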
