Python equivalent of piping file output to gzip in Perl using a pipe

Submitted by ℡╲_俬逩灬. on 2020-01-01 08:50:38

Question


I need to figure out how to write file output to a compressed file in Python, similar to the two-liner below:

open ZIPPED, "| gzip -c > zipped.gz";
print ZIPPED "Hello world\n";

In Perl, this uses Unix gzip to compress whatever you print to the ZIPPED filehandle to the file "zipped.gz".

I know how to use "import gzip" to do this in Python like this:

import gzip
zipped = gzip.open("zipped.gz", 'wb')
zipped.write(b"Hello world\n")  # binary mode expects bytes in Python 3
zipped.close()

However, that is extremely slow. According to the profiler, using that method takes up 90% of my run time since I am writing 200GB of uncompressed data to various output files. I am aware that the file system could be part of the problem here, but I want to rule it out by using Unix/Linux compression instead. This is partially because I have heard that decompressing using this same module is slow as well.


Answer 1:


ChristopheD's suggestion of using the subprocess module is an appropriate answer to this question. However, it's not clear to me that it will solve your performance problems. You would have to measure the performance of the new code to be sure.

To convert your sample code:

import subprocess

p = subprocess.Popen("gzip -c > zipped.gz", shell=True, stdin=subprocess.PIPE)
p.communicate(b"Hello World\n")  # the pipe is binary, so send bytes

Since you need to send large amounts of data to the sub-process, you should consider using the stdin attribute of the Popen object. For example:

import subprocess

p = subprocess.Popen("gzip -c > zipped.gz", shell=True, stdin=subprocess.PIPE)
p.stdin.write(b"Some data")  # the pipe is binary, so send bytes

# Write more data here...

p.communicate() # Finish writing data and wait for subprocess to finish

You may also find the discussion at this question helpful.
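To act on the advice above about measuring, here is a rough timing sketch that compares the gzip module against a pipe to the external gzip command. The function names, the sample data, and the choice of level 6 are assumptions made for illustration, not part of the original question. One detail worth controlling for: gzip.open defaults to compresslevel=9, while the gzip command defaults to level 6, so the sketch passes the level explicitly to both.

```python
import gzip
import os
import shutil
import subprocess
import tempfile
import time


def write_with_module(path, chunks, level=6):
    """Compress in-process with the gzip module."""
    # gzip.open defaults to compresslevel=9; the gzip command defaults to 6,
    # so pass the level explicitly to compare like with like.
    with gzip.open(path, "wb", compresslevel=level) as zipped:
        for chunk in chunks:
            zipped.write(chunk)


def write_with_pipe(path, chunks, level=6):
    """Compress by piping to an external gzip process (no shell involved)."""
    with open(path, "wb") as out:
        proc = subprocess.Popen(
            ["gzip", f"-{level}", "-c"], stdin=subprocess.PIPE, stdout=out
        )
        for chunk in chunks:
            proc.stdin.write(chunk)
        proc.stdin.close()  # end of input; gzip flushes and exits
        if proc.wait() != 0:
            raise RuntimeError("gzip exited with a non-zero status")


def time_writer(writer, path, chunks):
    """Return wall-clock seconds taken by writer(path, chunks)."""
    start = time.perf_counter()
    writer(path, chunks)
    return time.perf_counter() - start


if __name__ == "__main__" and shutil.which("gzip"):
    # Hypothetical sample data: 20,000 repeated lines, a stand-in for real output.
    sample = [b"Hello world\n" * 100] * 200
    workdir = tempfile.mkdtemp()
    for writer in (write_with_module, write_with_pipe):
        target = os.path.join(workdir, writer.__name__ + ".gz")
        print(writer.__name__, round(time_writer(writer, target, sample), 4), "s")
```

Timings on a small sample like this will not predict behaviour at 200GB, so it is probably worth running it against a realistic slice of your own data and on the same file system as the real output.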




Answer 2:


Try something like this:

from subprocess import Popen, PIPE
f = open('zipped.gz', 'wb')  # gzip emits binary data
pipe = Popen(['gzip', '-c'], stdin=PIPE, stdout=f)
pipe.communicate(b'Hello world\n')  # send bytes, then wait for gzip to exit
f.close()



Answer 3:


Using the gzip module is the official one-way-to-do-it, and it's unlikely that any other pure-Python approach will go faster. This is especially true because the size of your data rules out in-memory options. Most likely, the fastest way is to write the full file to disk and use subprocess to call gzip on that file.
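A minimal sketch of that last suggestion, assuming the gzip command is on the PATH. The helper name write_then_gzip and the sample file name are made up for illustration.

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def write_then_gzip(path, chunks):
    """Write uncompressed data to `path`, then compress it to `path + '.gz'`."""
    with open(path, "wb") as plain:
        for chunk in chunks:
            plain.write(chunk)
    # gzip replaces `path` with `path.gz`; -f overwrites an existing .gz file.
    subprocess.run(["gzip", "-f", path], check=True)
    return path + ".gz"


if __name__ == "__main__" and shutil.which("gzip"):
    # Hypothetical output location used only for this demo.
    target = os.path.join(tempfile.mkdtemp(), "zipped")
    print(write_then_gzip(target, [b"Hello world\n"]))
```

Note that this needs enough free disk space to hold each file uncompressed until gzip has finished with it, which may matter with 200GB of output.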



Source: https://stackoverflow.com/questions/8302911/python-equivalent-of-piping-file-output-to-gzip-in-perl-using-a-pipe
