Extract zip to memory, parse contents

孤独总比滥情好 2021-01-22 21:30

I want to read the contents of a zip file into memory rather than extracting them to disk, find a particular file in the archive, open the file and extract a line from it.

4 answers
  • 2021-01-22 22:11

    Don't overthink it. It Just Works:

    import zipfile
    
    # 1) I want to read the contents of a zip file ...
    with zipfile.ZipFile('A-Zip-File.zip') as zipper:
      # 2) ... find a particular file in the archive, open the file ...
      with zipper.open('A-Particular-File.txt') as fp:
        # 3) ... and extract a line from it.
        # readline() returns bytes in Python 3; decode it (UTF-8 assumed) to get text
        first_line = fp.readline().decode('utf-8')
    
    print(first_line)
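    ZipFile also accepts any seekable binary file object, so if the archive itself is already in memory (for example, bytes you downloaded), wrapping it in io.BytesIO avoids the disk entirely. A minimal sketch: the archive is built in memory only so the example is self-contained, the member name is a placeholder, and UTF-8 text is assumed.

```python
import io
import zipfile


def first_line_from_zip_bytes(zip_bytes: bytes, member: str) -> str:
    """Return the first line of `member` from an archive held entirely in memory."""
    # ZipFile accepts a seekable file-like object, so no temporary file is needed.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zipper:
        with zipper.open(member) as fp:
            # readline() returns bytes; decode assuming UTF-8 and drop the line ending.
            return fp.readline().decode('utf-8').rstrip('\r\n')


# Build a small hypothetical archive in memory purely for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zout:
    zout.writestr('A-Particular-File.txt', 'first line\nsecond line\n')

print(first_line_from_zip_bytes(buffer.getvalue(), 'A-Particular-File.txt'))
```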
    
  • 2021-01-22 22:11

    Thank you to everyone that contributed solutions. This is what ended up working for me:

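    import fnmatch
    import re
    from zipfile import ZipFile
    
    with ZipFile('name.zip', 'r') as zfile:
        for name in zfile.namelist():
            if fnmatch.fnmatch(name, '*_readme.xml'):
                with zfile.open(name) as zopen:
                    for line in zopen:
                        # Lines are bytes in Python 3; decode before matching a str pattern
                        line = line.decode('utf-8')
                        if re.match('(.*)<foo>(.*)</foo>(.*)', line):
                            print(line, end='')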
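    Another way to handle the bytes-versus-str issue is to wrap each member in io.TextIOWrapper, which decodes lines lazily as they are streamed. With re.search the leading and trailing (.*) groups become unnecessary, since re.match only needed them because it anchors at the start of the line. A sketch, assuming UTF-8 content; the archive below is a made-up example and `find_foo_values` is a hypothetical helper name.

```python
import fnmatch
import io
import re
import zipfile

# Non-greedy group captures the text between the first <foo> and </foo> on a line.
FOO_RE = re.compile(r'<foo>(.*?)</foo>')


def find_foo_values(zfile, pattern='*_readme.xml'):
    """Yield (member name, text inside <foo>) for members whose names match `pattern`."""
    for name in zfile.namelist():
        if not fnmatch.fnmatch(name, pattern):
            continue
        # TextIOWrapper decodes the byte stream lazily, line by line (UTF-8 assumed).
        with io.TextIOWrapper(zfile.open(name), encoding='utf-8') as text:
            for line in text:
                match = FOO_RE.search(line)
                if match:
                    yield name, match.group(1)


# Hypothetical archive built in memory so the sketch runs on its own.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zout:
    zout.writestr('a_readme.xml', '<doc>\n  <foo>bar</foo>\n</doc>\n')
    zout.writestr('notes.txt', '<foo>ignored</foo>\n')

with zipfile.ZipFile(buffer) as zfile:
    matches = list(find_foo_values(zfile))

print(matches)  # [('a_readme.xml', 'bar')]
```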
    
  • 2021-01-22 22:15

    IMO just using read is enough:

    import fnmatch
    from zipfile import ZipFile
    
    zfile = ZipFile('name.zip', 'r')
    files = []
    for name in zfile.namelist():
      if fnmatch.fnmatch(name, '*_readme.xml'):
        files.append(zfile.read(name))  # read() returns the member's contents as bytes
    

    This will make a list with contents of files that match the pattern.
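    A plain list keeps only the contents, so you lose track of which member each entry came from. If you need the names too, a dict comprehension is a small variation. A sketch: `read_matching` is a hypothetical helper, the pattern is the one from this answer, and every matched member is still read fully into memory.

```python
import fnmatch
import io
import zipfile


def read_matching(zip_source, pattern='*_readme.xml'):
    """Return {member name: bytes} for every member whose name matches `pattern`.

    `zip_source` may be a path such as 'name.zip' or a seekable binary file object.
    """
    with zipfile.ZipFile(zip_source) as zfile:
        return {
            name: zfile.read(name)
            for name in zfile.namelist()
            if fnmatch.fnmatch(name, pattern)
        }


# Demonstration with a hypothetical in-memory archive instead of a file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zout:
    zout.writestr('x_readme.xml', '<foo>1</foo>')
    zout.writestr('other.txt', 'skip me')

contents = read_matching(buffer)
print(sorted(contents))  # ['x_readme.xml']
```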

    To test it, you can then parse the contents afterwards by iterating through the list:

    for contents in files:
      # "parsing": show the first 35 bytes (slicing already stops at the end of short files)
      print(contents[:35].decode('utf-8', errors='replace'))
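    Printing a 35-byte prefix only stands in for real parsing. Because the members matched here are `*_readme.xml` files, one option is to pass the bytes straight to xml.etree.ElementTree.fromstring, which accepts bytes and honours the document's declared encoding. A sketch, assuming each member is well-formed XML and that the element of interest is the `<foo>` tag from the earlier answer; `foo_texts` is a hypothetical helper and the literal list stands in for `files`.

```python
import xml.etree.ElementTree as ET


def foo_texts(xml_bytes):
    """Return the text of every <foo> element in an XML document given as bytes."""
    root = ET.fromstring(xml_bytes)
    # iter() walks the whole tree, including nested elements.
    return [elem.text for elem in root.iter('foo')]


# `files` would be the list built above; a literal stands in for it here.
files = [b'<readme><foo>alpha</foo><section><foo>beta</foo></section></readme>']
for contents in files:
    print(foo_texts(contents))  # ['alpha', 'beta']
```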
    

    Or, better, use a function:

    import fnmatch
    import sys
    import zipfile  # keep the module name; "import zipfile as zip" would shadow the built-in zip()
    
    zip_name = sys.argv[1]
    zfile = zipfile.ZipFile(zip_name, 'r')
    
    def parse(contents, member_name=""):
      if member_name:
        print("Parsed `{}`:".format(member_name))
      print(contents[:35].decode('utf-8', errors='replace'))  # "parsing"
    
    for name in zfile.namelist():
      if fnmatch.fnmatch(name, '*.cpp'):
        parse(zfile.read(name), name)
    

    This way, each member's contents are parsed and then discarded instead of being collected in a list, so only one member is held in memory at a time. That can matter if the files are large.
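    Note that zfile.read(name) still decompresses each whole member into memory. When, as in parse(), only the first 35 bytes are used, opening the member and reading just that prefix means only a small leading chunk is decompressed. A sketch reusing this answer's `*.cpp` pattern and 35-byte limit; `preview_members` is a hypothetical helper and the archive is made up.

```python
import fnmatch
import io
import zipfile

PREVIEW_BYTES = 35  # same preview length as the answer's parse()


def preview_members(zfile, pattern='*.cpp', size=PREVIEW_BYTES):
    """Yield (name, decoded prefix) for matching members without reading them whole."""
    for name in zfile.namelist():
        if fnmatch.fnmatch(name, pattern):
            with zfile.open(name) as fp:
                # read(size) returns at most `size` bytes; only a small leading chunk
                # of the compressed data is decompressed to produce them.
                head = fp.read(size)
            # errors='replace' guards against a multi-byte character cut at the boundary.
            yield name, head.decode('utf-8', errors='replace')


# Hypothetical archive: one large, compressed .cpp member built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED) as zout:
    zout.writestr('main.cpp', '#include <iostream>\n' + '// filler\n' * 1000)

with zipfile.ZipFile(buffer) as zfile:
    previews = list(preview_members(zfile))

for name, head in previews:
    print(f'Parsed `{name}`:')
    print(head)
```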

  • 2021-01-22 22:16

    The question you linked shows that you need to read the file. Depending on your use case, that may already be enough. In your code, you replace the loop variable that holds a filename with an empty string buffer. Try something like this:

    import fnmatch
    from zipfile import ZipFile
    
    zfile = ZipFile('name.zip', 'r')
    
    for name in zfile.namelist():
        if fnmatch.fnmatch(name, '*_readme.xml'):
            ex_file = zfile.open(name)  # this is a file-like object
            content = ex_file.read()  # the whole member as one bytes object (a str in Python 2)
    

    If you really want a buffer that you can manipulate, then simply instantiate it with the contents:

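    import io
    
    # In Python 3 the contents are bytes, so decode them (UTF-8 assumed) before wrapping them in StringIO
    buf = io.StringIO(zfile.read(name).decode('utf-8'))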
    

    You may also want to look at BytesIO, and note the differences between Python 2 and 3: in Python 3, reading a zip member returns bytes, not str.
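    In Python 3, passing the bytes from ZipFile.read() directly to io.StringIO raises a TypeError, so you choose between a binary buffer and a decoded text buffer. A sketch of both, assuming UTF-8 member contents; the member name and contents are hypothetical.

```python
import io
import zipfile

# Build a tiny hypothetical archive in memory so the example runs on its own.
# writestr() encodes str data as UTF-8.
archive = io.BytesIO()
with zipfile.ZipFile(archive, 'w') as zout:
    zout.writestr('data_readme.xml', '<foo>caf\u00e9</foo>\n')

with zipfile.ZipFile(archive) as zfile:
    raw = zfile.read('data_readme.xml')  # bytes in Python 3

    # Binary buffer: keeps the exact bytes and supports seek/read/write.
    byte_buf = io.BytesIO(raw)

    # Text buffer: decode first, because StringIO only accepts str.
    text_buf = io.StringIO(raw.decode('utf-8'))

print(byte_buf.read(5))  # b'<foo>'
print(text_buf.read())
```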
