Downloading and unzipping a .zip file without writing to disk

囚心锁ツ 2020-12-02 04:42

I have managed to get my first Python script to work. It downloads a list of .zip files from a URL, then extracts the archives and writes their contents to disk.

9 answers
  • 2020-12-02 05:34

    My suggestion would be to use a StringIO object. StringIO objects emulate files but reside in memory. In Python 2 you could do something like this:

    # get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo'
    
    import zipfile
    from StringIO import StringIO
    
    zipdata = StringIO()
    zipdata.write(get_zip_data())
    myzipfile = zipfile.ZipFile(zipdata)
    foofile = myzipfile.open('foo.txt')
    print foofile.read()
    
    # output: "hey, foo"
    

    Or more simply (apologies to Vishal):

    myzipfile = zipfile.ZipFile(StringIO(get_zip_data()))
    for name in myzipfile.namelist():
        [ ... ]
    

    In Python 3 use BytesIO instead of StringIO:

    import zipfile
    from io import BytesIO
    
    filebytes = BytesIO(get_zip_data())
    myzipfile = zipfile.ZipFile(filebytes)
    for name in myzipfile.namelist():
        [ ... ]
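The BytesIO approach can be tried without a network connection. The following is a minimal sketch, not the answerer's code: make_sample_zip() is a hypothetical stand-in for get_zip_data() that builds the 'foo.txt' archive in memory, and read_members() is an illustrative helper name.

```python
import zipfile
from io import BytesIO


def make_sample_zip():
    """Build a small zip archive in memory (hypothetical stand-in for get_zip_data())."""
    buf = BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("foo.txt", "hey, foo")
    return buf.getvalue()


def read_members(zip_bytes):
    """Return {member name: contents as bytes} without touching the disk."""
    with zipfile.ZipFile(BytesIO(zip_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


contents = read_members(make_sample_zip())
print(contents["foo.txt"].decode("utf-8"))
# output: hey, foo
```

Returning a dict keeps every extracted file in memory, so this is only reasonable for archives that fit comfortably in RAM.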
    
  • 2020-12-02 05:35

    Use the zipfile module. To extract a file from a URL, you'll need to wrap the result of a urlopen call in a BytesIO object. This is because the result of a web request returned by urlopen doesn't support seeking:

    from urllib.request import urlopen
    
    from io import BytesIO
    from zipfile import ZipFile
    
    zip_url = 'http://example.com/my_file.zip'
    
    with urlopen(zip_url) as f:
        with BytesIO(f.read()) as b, ZipFile(b) as myzipfile:
            foofile = myzipfile.open('foo.txt')
            print(foofile.read())
    

    If you already have the file downloaded locally, you don't need BytesIO. Open the file in binary mode and pass it to ZipFile directly:

    from zipfile import ZipFile
    
    zip_filename = 'my_file.zip'
    
    with open(zip_filename, 'rb') as f:
        with ZipFile(f) as myzipfile:
            foofile = myzipfile.open('foo.txt')
            print(foofile.read().decode('utf-8'))
    

    Again, note that you have to open the file in binary ('rb') mode, not as text, or you'll get a "zipfile.BadZipFile: File is not a zip file" error.

    It's good practice to use all these things as context managers with the with statement, so that they'll be closed properly.
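A download can also return something that is not a zip archive at all, for example an HTML error page, and that produces the same BadZipFile error. A minimal sketch of one way to guard against this, assuming the hypothetical helper names open_zip_bytes() and download_zip() (neither is from the answer above):

```python
import zipfile
from io import BytesIO
from urllib.request import urlopen


def open_zip_bytes(data):
    """Wrap raw bytes in a seekable buffer and open them as a zip archive."""
    buf = BytesIO(data)
    # is_zipfile() accepts file-like objects as well as paths.
    if not zipfile.is_zipfile(buf):
        raise ValueError("payload is not a zip archive")
    buf.seek(0)
    return zipfile.ZipFile(buf)


def download_zip(url, timeout=30):
    """Fetch url and open the response body as an in-memory zip archive."""
    with urlopen(url, timeout=timeout) as resp:
        return open_zip_bytes(resp.read())


# Offline demonstration: build an archive in memory, then reopen it.
sample = BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("foo.txt", "hey, foo")

with open_zip_bytes(sample.getvalue()) as archive:
    print(archive.namelist())
```

With a real URL, the call would look like `with download_zip(zip_url) as archive: ...`, where zip_url is the address of the archive.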

  • 2020-12-02 05:40

    Below is a code snippet I used to fetch a zipped CSV file:

    Python 2:

    from StringIO import StringIO
    from zipfile import ZipFile
    from urllib import urlopen
    
    resp = urlopen("http://www.test.com/file.zip")
    zipfile = ZipFile(StringIO(resp.read()))
    for line in zipfile.open(file).readlines():
        print line
    

    Python 3:

    from io import BytesIO
    from zipfile import ZipFile
    from urllib.request import urlopen
    # or: requests.get(url).content
    
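    resp = urlopen("http://www.test.com/file.zip")
    # Note: this variable reuses the name of the zipfile module; that works
    # here only because ZipFile was imported directly.
    zipfile = ZipFile(BytesIO(resp.read()))
    # file must be a string naming a member inside the archive
    for line in zipfile.open(file).readlines():
        print(line.decode('utf-8'))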
    

    Here file is a string naming the archive member to read. To find the names available in the archive, call namelist() on the ZipFile object (called zipfile in the code above). For instance,

    resp = urlopen('http://mlg.ucd.ie/files/datasets/bbc.zip')
    zipfile = ZipFile(BytesIO(resp.read()))
    zipfile.namelist()
    # ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']
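Since the goal was a zipped CSV file, the rows can also be parsed straight from the in-memory archive rather than decoded line by line. A minimal sketch, assuming a UTF-8 encoded member; read_csv_from_zip() and the 'data.csv' sample are hypothetical names used only for illustration:

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_bytes, member, encoding="utf-8"):
    """Parse one CSV member of an in-memory zip archive into a list of rows."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            # TextIOWrapper decodes the binary member so csv.reader gets text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.reader(text))


# Build a tiny archive in memory so the example runs without a network call.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("data.csv", "id,name\n1,foo\n2,bar\n")

rows = read_csv_from_zip(buf.getvalue(), "data.csv")
print(rows)
# [['id', 'name'], ['1', 'foo'], ['2', 'bar']]
```

For a downloaded archive, zip_bytes would be resp.read() and member one of the strings returned by namelist().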
    