Python equivalent of piping file output to gzip in Perl using a pipe

后端未结

关注

 5  749

I need to figure out how to write file output to a compressed file in Python, similar to the two-liner below:

open ZIPPED, \"| gzip -c > zipped.gz\";
print ZI


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  说谎        
                
              
                            
                2021-02-14 02:17
              
            
            
                                                                       
Make sure you use the same compression level when comparing speeds.  By default, linux gzip uses level 6, while python uses level 9.  I tested this in Python 3.6.8 using gzip version 1.5, compressing 600MB of data from MySQL dump.  With default settings:
python module takes 9.24 seconds and makes a file 47.1 MB

subprocess gzip takes 8.61 seconds and makes a file 48.5 MB
After changing it to level 6 so they match:

python module takes 8.09 seconds and makes a file 48.6 MB

subprocess gzip takes 8.55 seconds and makes a file 48.5 MB
# subprocess method
start = time.time()
with open(outfile, 'wb') as f:
    subprocess.run(['gzip'], input=dump, stdout=f, check=True)
print('subprocess finished after {:.2f} seconds'.format(time.time() - start))

# gzip method
start = time.time()
with gzip.open(outfile2, 'wb', compresslevel=6) as z:
    z.write(dump)
print('gzip module finished after {:.2f} seconds'.format(time.time() - start))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清酒与你        
                
              
                            
                2021-02-14 02:18
              
            
            
                                                                       
ChristopheD's suggestion of using the subprocess module is an appropriate answer to this question. However, it's not clear to me that it will solve your performance problems. You would have to measure the performance of the new code to be sure.

To convert your sample code:

import subprocess

p = subprocess.Popen("gzip -c > zipped.gz", shell=True, stdin=subprocess.PIPE)
p.communicate("Hello World\n")


Since you need to send large amounts of data to the sub-process, you should consider using the stdin attribute of the Popen object. For example:

import subprocess

p = subprocess.Popen("gzip -c > zipped.gz", shell=True, stdin=subprocess.PIPE)
p.stdin.write("Some data")

# Write more data here...

p.communicate() # Finish writing data and wait for subprocess to finish


You may also find the discussion at this question helpful.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  广开言路        
                
              
                            
                2021-02-14 02:30
              
            
            
                                                                       
In addition to @srgerg's answer I want to apply same approach by disabling shell option shell=False, which is also done on @Moishe Lettvin's answer and recommended on (https://stackoverflow.com/a/3172488/2402577). 

import subprocess
def zip():
    f = open("zipped.gz", "w")
    p1 = subprocess.Popen(["echo", "Hello World"], stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["gzip", "-9c"], stdin=p1.stdout, stdout=f)
    p1.stdout.close()
    p2.communicate()
    f.close()




Please not that originally I am using this p1s output for git diff as:

p1 = subprocess.Popen(["git", "diff"], stdout=subprocess.PIPE)
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  伪装坚强ぢ        
                
              
                            
                2021-02-14 02:36
              
            
            
                                                                       
Try something like this:

from subprocess import Popen, PIPE
f = open('zipped.gz', 'w')
pipe = Popen('gzip', stdin=PIPE, stdout=f)
pipe.communicate('Hello world\n')
f.close()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  误落风尘        
                
              
                            
                2021-02-14 02:39
              
            
            
                                                                       
Using the gzip module is the official one-way-to-do-it and it's unlikely that any other pure python approach will go faster.  This is especially true because the size of your data rules out in-memory options.  Most likely, the fastest way is to write the full file to disk and use subprocess to call gz on that file.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复