Question
Is there a way in Python (2.7 preferably) to compress a file into several equally sized .zip files?
The result would be something like this (let's assume a part size of 200MB and a compressed file of 1100MB):
compressed_file.zip.001 (200MB)
compressed_file.zip.002 (200MB)
compressed_file.zip.003 (200MB)
compressed_file.zip.004 (200MB)
compressed_file.zip.005 (200MB)
compressed_file.zip.006 (100MB)
Answer 1:
I think you can do it with a shell command. Something like:
gzip -c /path/to/your/large/file | split -b 150000000 - compressed.gz
and you can execute the shell command from Python.
Regards
Ganesh J
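A minimal sketch of running that pipeline from Python with the standard subprocess module. The function name gzip_and_split and its parameters are illustrative, not part of the answer above, and the sketch assumes the gzip and split commands are available on PATH. Note that the pipeline produces split gzip data rather than a ZIP archive.

```python
import subprocess


def gzip_and_split(src_path, out_prefix, part_bytes):
    """Run `gzip -c src_path | split -b part_bytes - out_prefix` without a shell."""
    gzip_proc = subprocess.Popen(["gzip", "-c", src_path],
                                 stdout=subprocess.PIPE)
    split_proc = subprocess.Popen(
        ["split", "-b", str(part_bytes), "-", out_prefix],
        stdin=gzip_proc.stdout)
    # Close our copy of the pipe so gzip sees a broken pipe if split exits early.
    gzip_proc.stdout.close()
    split_rc = split_proc.wait()
    gzip_rc = gzip_proc.wait()
    if gzip_rc != 0 or split_rc != 0:
        raise RuntimeError("pipeline failed (gzip=%r, split=%r)"
                           % (gzip_rc, split_rc))
```

The parts can be reassembled by concatenating them in name order and decompressing the result with gzip.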
Answer 2:
NB: This is based on the assumption that the result is just a chopped-up ZIP file without any extra headers or anything.
If you check the docs, ZipFile can be passed a file-like object to use for I/O. Hence, we should be able to give it our own object, which implements the necessary subset of the protocol and splits the output into multiple files.
As it turns out, we only need to implement 3 methods:
- tell() -- just return number of bytes written so far
- write(str) -- write to file until max capacity, once full open a new file, repeat until all data written
- flush() -- flush the currently open file
Prototype Script
import random
import zipfile


def get_random_data(length):
    # Random bytes, so the payload is effectively incompressible.
    return "".join([chr(random.randrange(256)) for i in range(length)])


class MultiFile(object):
    """Write-only file-like object that spreads its output over numbered files."""

    def __init__(self, file_name, max_file_size):
        self.current_position = 0
        self.file_name = file_name
        self.max_file_size = max_file_size
        self.current_file = None
        self.open_next_file()

    @property
    def current_file_no(self):
        # Zero-based index of the part that contains the current position.
        return self.current_position // self.max_file_size

    @property
    def current_file_size(self):
        return self.current_position % self.max_file_size

    @property
    def current_file_capacity(self):
        return self.max_file_size - self.current_file_size

    def open_next_file(self):
        file_name = "%s.%03d" % (self.file_name, self.current_file_no + 1)
        print "* Opening file '%s'..." % file_name
        if self.current_file is not None:
            self.current_file.close()
        self.current_file = open(file_name, 'wb')

    def tell(self):
        print "MultiFile::Tell -> %d" % self.current_position
        return self.current_position

    def write(self, data):
        start, end = 0, len(data)
        print "MultiFile::Write (%d bytes)" % len(data)
        while start < end:
            current_block_size = min(end - start, self.current_file_capacity)
            self.current_file.write(data[start:start+current_block_size])
            print "* Wrote %d bytes." % current_block_size
            start += current_block_size
            self.current_position += current_block_size
            # Full capacity right after a write means the part just filled up.
            if self.current_file_capacity == self.max_file_size:
                self.open_next_file()
            print "* Capacity = %d" % self.current_file_capacity

    def flush(self):
        print "MultiFile::Flush"
        self.current_file.flush()


mfo = MultiFile('splitzip.zip', 2**18)
zf = zipfile.ZipFile(mfo, mode='w', compression=zipfile.ZIP_DEFLATED)
for i in range(4):
    filename = 'test%04d.txt' % i
    print "Adding file '%s'..." % filename
    zf.writestr(filename, get_random_data(2**17))

# Write the central directory explicitly rather than relying on cleanup,
# then close the last part file.
zf.close()
mfo.current_file.close()
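The prototype above targets Python 2.7. The following is a rough sketch of the same idea for Python 3, where zipfile accepts a write-only object that lacks seek(). The names SplitWriter and write_split_zip are hypothetical, and the behaviour has not been checked against every Python release. Unlike the prototype, each part file is opened lazily on the first write that needs it, so no empty trailing part is left behind when the archive size is an exact multiple of the part size.

```python
import zipfile


class SplitWriter(object):
    """Write-only file-like object that rolls over to a new part file
    whenever the current part reaches max_part_size bytes."""

    def __init__(self, base_name, max_part_size):
        self.base_name = base_name
        self.max_part_size = max_part_size
        self.position = 0      # total bytes written across all parts
        self.part_size = 0     # bytes written to the current part
        self.part_paths = []
        self.current = None

    def _open_next_part(self):
        if self.current is not None:
            self.current.close()
        path = "%s.%03d" % (self.base_name, len(self.part_paths) + 1)
        self.part_paths.append(path)
        self.current = open(path, "wb")
        self.part_size = 0

    def tell(self):
        return self.position

    def write(self, data):
        data = bytes(data)
        start = 0
        while start < len(data):
            # Open a part only when there is data that needs one.
            if self.current is None or self.part_size == self.max_part_size:
                self._open_next_part()
            room = self.max_part_size - self.part_size
            chunk = data[start:start + room]
            self.current.write(chunk)
            self.part_size += len(chunk)
            self.position += len(chunk)
            start += len(chunk)
        return len(data)

    def flush(self):
        if self.current is not None:
            self.current.flush()

    def close(self):
        if self.current is not None:
            self.current.close()
            self.current = None


def write_split_zip(base_name, members, max_part_size):
    """Write members (a dict of archive name -> bytes) as a split ZIP.

    Returns the list of part file paths in order.
    """
    out = SplitWriter(base_name, max_part_size)
    try:
        with zipfile.ZipFile(out, mode="w",
                             compression=zipfile.ZIP_DEFLATED) as zf:
            for name in sorted(members):
                zf.writestr(name, members[name])
    finally:
        # ZipFile does not close a file object that was passed in.
        out.close()
    return out.part_paths
```

For example, write_split_zip('splitzip.zip', {'test0000.txt': b'...'}, 2**18) would produce splitzip.zip.001, splitzip.zip.002, and so on.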
Trace Output
* Opening file 'splitzip.zip.001'...
Adding file 'test0000.txt'...
MultiFile::Tell -> 0
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 262102
MultiFile::Write (131112 bytes)
* Wrote 131112 bytes.
* Capacity = 130990
MultiFile::Flush
Adding file 'test0001.txt'...
MultiFile::Tell -> 131154
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 130948
MultiFile::Write (131112 bytes)
* Wrote 130948 bytes.
* Opening file 'splitzip.zip.002'...
* Capacity = 262144
* Wrote 164 bytes.
* Capacity = 261980
MultiFile::Flush
Adding file 'test0002.txt'...
MultiFile::Tell -> 262308
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 261938
MultiFile::Write (131112 bytes)
* Wrote 131112 bytes.
* Capacity = 130826
MultiFile::Flush
Adding file 'test0003.txt'...
MultiFile::Tell -> 393462
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 130784
MultiFile::Write (131112 bytes)
* Wrote 130784 bytes.
* Opening file 'splitzip.zip.003'...
* Capacity = 262144
* Wrote 328 bytes.
* Capacity = 261816
MultiFile::Flush
MultiFile::Tell -> 524616
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261770
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261758
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261712
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261700
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261654
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261642
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261596
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261584
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Tell -> 524848
MultiFile::Write (22 bytes)
* Wrote 22 bytes.
* Capacity = 261562
MultiFile::Write (0 bytes)
MultiFile::Flush
Directory Listing
-rw-r--r-- 1 2228 Feb 21 23:44 splitzip.py
-rw-r--r-- 1 262144 Feb 22 00:07 splitzip.zip.001
-rw-r--r-- 1 262144 Feb 22 00:07 splitzip.zip.002
-rw-r--r-- 1 582 Feb 22 00:07 splitzip.zip.003
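The part sizes in this listing follow directly from the total archive size (524870 bytes) and the part size (2**18 = 262144 bytes). A small sketch of that arithmetic, using a hypothetical helper named part_sizes:

```python
def part_sizes(total_bytes, max_part_size):
    """Return the size of each part when total_bytes is cut into
    chunks of at most max_part_size bytes."""
    full_parts, remainder = divmod(total_bytes, max_part_size)
    sizes = [max_part_size] * full_parts
    if remainder:
        sizes.append(remainder)
    return sizes


print(part_sizes(524870, 262144))  # [262144, 262144, 582]
```

The same calculation gives five parts of 200 and one of 100 for the 1100MB example in the question.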
Validation
>7z l splitzip.zip.001
7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
Listing archive: splitzip.zip.001
--
Path = splitzip.zip.001
Type = Split
Volumes = 3
----
Path = splitzip.zip
Size = 524870
--
Path = splitzip.zip
Type = zip
Physical Size = 524870
   Date      Time   Attr          Size   Compressed Name
------------------- ----- ------------ ------------ ------------------------
2019-02-22 00:07:34 .....       131072       131112 test0000.txt
2019-02-22 00:07:34 .....       131072       131112 test0001.txt
2019-02-22 00:07:36 .....       131072       131112 test0002.txt
2019-02-22 00:07:36 .....       131072       131112 test0003.txt
------------------- ----- ------------ ------------ ------------------------
                                524288       524448 4 files, 0 folders
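The same check can be done without 7-Zip. Since the parts are assumed to be a plain byte-wise split of one ZIP file, concatenating them in order should give back an archive that Python's zipfile can read. A sketch with two hypothetical helpers, join_parts and verify_zip (written for Python 3):

```python
import shutil
import zipfile


def join_parts(part_paths, out_path):
    """Concatenate the part files, in the given order, into out_path."""
    with open(out_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
    return out_path


def verify_zip(path):
    """Check every member's CRC and return the list of member names."""
    with zipfile.ZipFile(path) as zf:
        bad_member = zf.testzip()
        if bad_member is not None:
            raise ValueError("corrupt member: %s" % bad_member)
        return zf.namelist()
```

For the files above, this would be called as verify_zip(join_parts(['splitzip.zip.001', 'splitzip.zip.002', 'splitzip.zip.003'], 'splitzip.zip')).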
Source: https://stackoverflow.com/questions/54809238/compress-a-file-into-different-parts-in-python