How do you read a file inside a zip file as text, not bytes?

我寻月下人不归 2020-12-09 02:03

A simple program that reads a CSV file inside a zip file works in Python 2.7, but not in Python 3.2:

$ cat test_zip_file_py3k.py 
import csv, sys, zipfile

z         


        
4 answers
  • 2020-12-09 02:09

    I just noticed that Lennart's answer didn't work with Python 3.1, but it does work with Python 3.2. zipfile.ZipExtFile was enhanced in Python 3.2 (see the release notes), and these changes appear to make zipfile.ZipExtFile work nicely with io.TextIOWrapper.

    Incidentally, it works in Python 3.1, if you uncomment the hacky lines below to monkey-patch zipfile.ZipExtFile, not that I would recommend this sort of hackery. I include it only to illustrate the essence of what was done in Python 3.2 to make things work nicely.

    $ cat test_zip_file_py3k.py 
    import csv, io, sys, zipfile
    
    zip_file    = zipfile.ZipFile(sys.argv[1])
    items_file  = zip_file.open('items.csv', 'r')  # 'rU' raises ValueError on Python 3.6+
    # items_file.readable = lambda: True
    # items_file.writable = lambda: False
    # items_file.seekable = lambda: False
    # items_file.read1 = items_file.read
    items_file  = io.TextIOWrapper(items_file)
    
    for idx, row in enumerate(csv.DictReader(items_file)):
        print('Processing row {0} -- row = {1}'.format(idx, row))
    

    If I had to support py3k < 3.2, then I would go with the solution in my other answer.
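
As a rough check of what the commented-out monkey-patch stands in for, the sketch below asks a zip member for the same methods on a current Python 3. The archive is built in memory and its contents are made up for illustration; only the member name 'items.csv' comes from the answer.

```python
import io
import zipfile

# Build a tiny archive in memory so the sketch needs no file on disk.
# The CSV contents are invented sample data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('items.csv', 'id,name\n1,spam\n')

with zipfile.ZipFile(buf) as zf:
    with zf.open('items.csv') as member:
        # These are the attributes the monkey-patch fakes on Python 3.1.
        is_readable = member.readable()
        is_writable = member.writable()
        has_read1 = hasattr(member, 'read1')

print(is_readable, is_writable, has_read1)
```

If the member reports itself as readable and offers read1, io.TextIOWrapper can wrap it directly, which is why no patching is needed from 3.2 onward.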

  • 2020-12-09 02:14

    You can wrap it in an io.TextIOWrapper:

    items_file  = io.TextIOWrapper(items_file, encoding='your-encoding', newline='')
    

    Should work.
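
A minimal end-to-end sketch of this approach, assuming Python 3.2 or later. The in-memory archive and its rows are invented sample data, and 'iso-8859-1' is only an example encoding; substitute whatever your file actually uses.

```python
import csv
import io
import zipfile

# Invented sample data: one accented character and one quoted field
# containing an embedded newline, stored as ISO-8859-1 bytes.
sample = 'id,name\r\n1,caf\xe9\r\n2,"multi\nline"\r\n'.encode('iso-8859-1')

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('items.csv', sample)

rows = []
with zipfile.ZipFile(buf) as zf:
    # newline='' leaves line endings untouched so the csv module can
    # handle newlines inside quoted fields itself.
    with io.TextIOWrapper(zf.open('items.csv'),
                          encoding='iso-8859-1', newline='') as text_file:
        for row in csv.DictReader(text_file):
            rows.append(row)

print(rows)
```

Closing the wrapper also closes the underlying zip member, so a single with block is enough.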

  • 2020-12-09 02:18

    Lennart's answer is on the right track (Thanks, Lennart, I voted up your answer) and it almost works:

    $ cat test_zip_file_py3k.py 
    import csv, io, sys, zipfile
    
    zip_file    = zipfile.ZipFile(sys.argv[1])
    items_file  = zip_file.open('items.csv', 'r')
    items_file  = io.TextIOWrapper(items_file, encoding='iso-8859-1', newline='')
    
    for idx, row in enumerate(csv.DictReader(items_file)):
        print('Processing row {0}'.format(idx))
    
    $ python3.1 test_zip_file_py3k.py ~/data.zip
    Traceback (most recent call last):
      File "test_zip_file_py3k.py", line 7, in <module>
        items_file  = io.TextIOWrapper(items_file, 
                                       encoding='iso-8859-1', 
                                       newline='')
    AttributeError: readable
    

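    The problem appears to be that io.TextIOWrapper's first required parameter is a buffer (a buffered binary stream), not just any file object. In Python 3.1 the object returned by ZipFile.open lacks the readable() method that the wrapper calls, hence the AttributeError.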

    This appears to work:

    items_file  = io.TextIOWrapper(io.BytesIO(items_file.read()))
    

    This seems a little complex, and it is annoying to have to read a whole (perhaps huge) zip member into memory. Is there a better way?

    Here it is in action:

    $ cat test_zip_file_py3k.py 
    import csv, io, sys, zipfile
    
    zip_file    = zipfile.ZipFile(sys.argv[1])
    items_file  = zip_file.open('items.csv', 'r')
    items_file  = io.TextIOWrapper(io.BytesIO(items_file.read()))
    
    for idx, row in enumerate(csv.DictReader(items_file)):
        print('Processing row {0}'.format(idx))
    
    $ python3.1 test_zip_file_py3k.py ~/data.zip
    Processing row 0
    Processing row 1
    Processing row 2
    ...
    Processing row 250
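
One possible way around the memory concern, offered as a sketch rather than as the method used in this answer: csv readers accept any iterable of strings, so the member can be decoded one line at a time. The helper name iter_text_lines and the sample data are hypothetical, and the approach assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1 (it would split UTF-16 data in the wrong places).

```python
import csv
import io
import zipfile


def iter_text_lines(binary_file, encoding):
    """Yield decoded lines one at a time from a binary file object."""
    for raw_line in binary_file:
        yield raw_line.decode(encoding)


# Invented sample archive, built in memory for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('items.csv', 'id,name\n1,spam\n2,eggs\n')

streamed_rows = []
with zipfile.ZipFile(buf) as zf:
    with zf.open('items.csv') as member:
        # Only one line is decoded at a time; the member is never
        # read into memory as a whole.
        for row in csv.DictReader(iter_text_lines(member, 'utf-8')):
            streamed_rows.append(row)

print(streamed_rows)
```

Collecting the rows into a list is only for the demonstration; processing each row inside the loop keeps memory use flat.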
    
  • 2020-12-09 02:30

    And if you just want to read a file into a string:

    from zipfile import ZipFile
    
    with ZipFile('spam.zip') as myzip:
        with myzip.open('eggs.txt') as myfile:
            # read() returns bytes; decode them to get a str
            eggs = myfile.read().decode('UTF-8')
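
On Python 3.8 or later, zipfile.Path offers what may be a shorter route to the same result. This is a different technique from the one in the answer, shown here as a sketch with an invented in-memory archive that reuses the answer's member name.

```python
import io
import zipfile

# Invented sample archive, built in memory for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('eggs.txt', 'spam and eggs\n')

with zipfile.ZipFile(buf) as myzip:
    # read_text opens the member, decodes it, and returns a str.
    eggs = zipfile.Path(myzip, 'eggs.txt').read_text(encoding='utf-8')

print(repr(eggs))
```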
    