Question
I am working on a tool that generates random data for testing purposes. The part of my code below is giving me grief. It works perfectly, and faster than conventional solutions (around 20 seconds), when the file is around 400 MB; however, once it reaches around 500 MB I get an out-of-memory error. How can I generate the contents and write them to the file progressively, holding no more than 10 MB in memory at any one time?
import os

def createfile(filename, size_kb):
    tbl = bytearray(range(256))
    numrand = os.urandom(size_kb * 1024)          # builds the whole buffer in memory at once
    with open(filename, "wb") as fh:
        fh.write(numrand.translate(tbl))

createfile("file1.txt", 500 * 1024)
Any help will be greatly appreciated.
Answer 1:
Combining Jaco's and mhawk's answers and fixing the integer/float division issues, here is code that can generate GBs of data in less than 10 seconds:
import math
import os

def createfile(filename, size_kb):
    chunksize = 1024                              # chunk size in KB (~1 MB per write)
    chunks = math.ceil(size_kb / chunksize)
    with open(filename, "wb") as fh:
        for _ in range(chunks):
            # generate and write one chunk at a time instead of the whole file
            fh.write(os.urandom(size_kb * 1024 // chunks))
        # write the bytes left over by the integer division above
        fh.write(os.urandom(size_kb * 1024 % chunks))
Creates a 1 GB file in less than 8 seconds.
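For reference, a quick way to time it yourself (the file name and the use of time.perf_counter below are just an illustrative sketch, not part of the original answer):

import time

start = time.perf_counter()
createfile("file1.bin", 1024 * 1024)              # 1 GB expressed in KB (hypothetical file name)
print(f"wrote 1 GB in {time.perf_counter() - start:.1f} seconds")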
Answer 2:
You can write out chunks of 10 MB at a time, rather than generating the entire file in one go. As pointed out by @mhawke, the translate call is redundant and can be removed:
import os

def createfile(filename, size_kb):
    chunks = max(1, size_kb // (1024 * 10))       # number of ~10 MB chunks
    with open(filename, "wb") as fh:
        for _ in range(chunks):
            fh.write(os.urandom(size_kb * 1024 // chunks))
        # write whatever remains after the equal-sized chunks
        fh.write(os.urandom(size_kb * 1024 % chunks))

createfile("c:/file1.txt", 500 * 1024)
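If you want to enforce the 10 MB ceiling from the question regardless of file size, a slightly different sketch writes fixed-size chunks plus one final partial chunk; the write_random_file name and the chunk_size default below are assumptions for illustration, not code from either answer:

import os

def write_random_file(filename, size_kb, chunk_size=10 * 1024 * 1024):
    remaining = size_kb * 1024                    # total number of bytes still to write
    with open(filename, "wb") as fh:
        while remaining > 0:
            # never hold more than chunk_size (10 MiB) of random data at once
            n = min(chunk_size, remaining)
            fh.write(os.urandom(n))
            remaining -= n

write_random_file("file1.txt", 500 * 1024)        # 500 MB, same arguments as the question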
Source: https://stackoverflow.com/questions/35572348/how-to-write-an-huge-bytearray-to-file-progressively-without-hitting-memoryerror