I have multiple zip files that have the same structure -- they contain XML files at the root level. All files in each zip file are unique (no duplicates across the zip files). I
This is the shortest version I could come up with:
>>> import zipfile as z
>>> z1 = z.ZipFile('z1.zip', 'a')
>>> z2 = z.ZipFile('z2.zip', 'r')
>>> z1.namelist()
['a.xml', 'b.xml']
>>> z2.namelist()
['c.xml', 'd.xml']
>>> [z1.writestr(t[0], t[1].read()) for t in ((n, z2.open(n)) for n in z2.namelist())]
[None, None]
>>> z1.namelist()
['a.xml', 'b.xml', 'c.xml', 'd.xml']
>>> z1.close()
Without testing the alternative, to me this is the best (and probably most obvious too!) solution because - assuming both zip files contains the same amount of data, this method requires the decompression and re-compression of only half of it (1 file).
PS: List comprehension is there just to keep instructions on one line in the console (which speeds debugging up). Good pythonic code would require a proper for
loop, given that the resulting list serves no purpose...
HTH!
Here's what I came up with, thanks to @mac. Note that the way this is currently implemented the first zip file is modified to contain all the files from the other zip files.
import zipfile as z
zips = ['z1.zip', 'z2.zip', 'z3.zip']
"""
Open the first zip file as append and then read all
subsequent zip files and append to the first one
"""
with z.ZipFile(zips[0], 'a') as z1:
for fname in zips[1:]:
zf = z.ZipFile(fname, 'r')
for n in zf.namelist():
z1.writestr(n, zf.open(n).read())