Parameter expansion slow for large data sets

2021-01-16 02:14

If I take the first 1,000 bytes from a file, Bash can replace some characters pretty quickly:

$ cut -b-1000 get_video_i

2 Answers
  • 2021-01-16 02:39

    For the why, you can see the implementation of this code in pat_subst in subst.c in the bash source code.

    For each match in the string, the length of the string is counted numerous times (in pat_subst, match_pattern and match_upattern), both as a C string and more expensively as a multibyte string. This makes the function both slower than necessary, and more importantly, quadratic in complexity.

    This is why it's slow for larger input, and here's a pretty graph:

    [Graph: quadratic runtime in shell replacements]
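
    A rough way to reproduce the effect yourself is to time the expansion on successively larger inputs; the sizes and the a-to-b replacement below are arbitrary choices for illustration, not from the original question:

        # time ${data//a/b} on doubling input sizes; with quadratic
        # behaviour, each doubling roughly quadruples the runtime
        for n in 10000 20000 40000 80000; do
            data=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c "$n")
            TIMEFORMAT="${n} bytes: %R s"
            time : "${data//a/b}"
        done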

    As for workarounds, just use sed. It's more likely to be optimized for string replacement operations (though you should be aware that POSIX only guarantees 8192 bytes per line, even though GNU sed handles arbitrarily large ones).
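
    For example, a minimal sed sketch of the same kind of replacement (the file name bigfile and the a-to-b substitution are placeholders, not taken from the original question):

        # stream the file through sed instead of holding it in a shell
        # variable; equivalent in effect to "${data//a/b}" on each line
        sed 's/a/b/g' bigfile > bigfile.replaced

        # for a direct timing comparison against parameter expansion
        time sed 's/a/b/g' bigfile > /dev/null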

  • 2021-01-16 02:49

    Originally, older shells and other utilities imposed LINE_MAX = 2048 on file input for exactly this kind of reason. Bash has no problem parking huge variables in memory, but substitution requires at least two concurrent copies, plus a lot of thrashing: as groups of characters are removed, the whole string gets rewritten, over and over and over.

    There are tools meant for this: sed is the premier choice, and bash is a distant second. sed works on streams; bash works on memory blocks.

    Another option: bash is extensible - you can write custom C code to handle things bash was not meant to do well.

    CFA Johnson has good articles on how to do that:

    Some ready-to-load builtins:

    http://cfajohnson.com/shell/bash/loadables/

    DIY builtins explained:

    http://cfajohnson.com/shell/articles/dynamically-loadable/
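
    As a rough sketch of how a compiled loadable builtin is used, assuming you have built one (the builtin name replace and the file replace.so are made-up placeholders, not something shipped with bash or taken from the links above):

        # load the compiled builtin into the running shell
        enable -f ./replace.so replace

        # confirm it is now a shell builtin rather than an external command
        type replace

        # from here on, "replace" runs inside the bash process itself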
