Downloading and unzipping a .zip file without writing to disk

前端未结

关注

 9  1526

I have managed to get my first python script to work which downloads a list of .ZIP files from a URL and then proceeds to extract the ZIP files and writes them to disk.

相关标签:

9条回答

不知归路

2020-12-02 05:34

My suggestion would be to use a StringIO object. They emulate files, but reside in memory. So you could do something like this:

# get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo'

import zipfile
from StringIO import StringIO

zipdata = StringIO()
zipdata.write(get_zip_data())
myzipfile = zipfile.ZipFile(zipdata)
foofile = myzipfile.open('foo.txt')
print foofile.read()

# output: "hey, foo"

Or more simply (apologies to Vishal):

myzipfile = zipfile.ZipFile(StringIO(get_zip_data()))
for name in myzipfile.namelist():
    [ ... ]

In Python 3 use BytesIO instead of StringIO:

import zipfile
from io import BytesIO

filebytes = BytesIO(get_zip_data())
myzipfile = zipfile.ZipFile(filebytes)
for name in myzipfile.namelist():
    [ ... ]

0 讨论(0)

我寻月下人不归

2020-12-02 05:35
Use the zipfile module. To extract a file from a URL, you'll need to wrap the result of a urlopen call in a BytesIO object. This is because the result of a web request returned by urlopen doesn't support seeking:
```
from urllib.request import urlopen

from io import BytesIO
from zipfile import ZipFile

zip_url = 'http://example.com/my_file.zip'

with urlopen(zip_url) as f:
    with BytesIO(f.read()) as b, ZipFile(b) as myzipfile:
        foofile = myzipfile.open('foo.txt')
        print(foofile.read())
```
If you already have the file downloaded locally, you don't need BytesIO, just open it in binary mode and pass to ZipFile directly:
```
from zipfile import ZipFile

zip_filename = 'my_file.zip'

with open(zip_filename, 'rb') as f:
    with ZipFile(f) as myzipfile:
        foofile = myzipfile.open('foo.txt')
        print(foofile.read().decode('utf-8'))
```
Again, note that you have to open the file in binary ('rb') mode, not as text or you'll get a zipfile.BadZipFile: File is not a zip file error.

It's good practice to use all these things as context managers with the with statement, so that they'll be closed properly.
0 讨论(0)
发布评论:

提交评论
- 加载中...

有刺的猬

2020-12-02 05:40

Below is a code snippet I used to fetch zipped csv file, please have a look:

Python 2:

from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen

resp = urlopen("http://www.test.com/file.zip")
zipfile = ZipFile(StringIO(resp.read()))
for line in zipfile.open(file).readlines():
    print line

Python 3:

from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
# or: requests.get(url).content

resp = urlopen("http://www.test.com/file.zip")
zipfile = ZipFile(BytesIO(resp.read()))
for line in zipfile.open(file).readlines():
    print(line.decode('utf-8'))

Here file is a string. To get the actual string that you want to pass, you can use zipfile.namelist(). For instance,

resp = urlopen('http://mlg.ucd.ie/files/datasets/bbc.zip')
zipfile = ZipFile(BytesIO(resp.read()))
zipfile.namelist()
# ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']

0 讨论(0)

上一页 1 2