Shell read sometimes strips trailing delimiter

后端未结

关注

 2  1659

To parse colon-delimited fields I can use read with a custom IFS:

$ echo \'foo.c:41:switch (color) {\' | { IFS=: read file line tex


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  Happy的楠姐        
                
              
                            
                2021-01-02 00:10
              
            
            
                                                                       
Yes, that's standard behaviour (see the read specification and Field Splitting). A few shells (ash-based including dash, pdksh-based, zsh, yash at least) used not to do it, but except for zsh (when not in POSIX mode), busybox sh, most of them have been updated for POSIX compliance.

That's the same for:

$ var='a:b:c:' IFS=:
$ set -- $var; echo "$#"
3


(see how the POSIX specification for read actually defers to the Field Splitting mechanism where a:b:c: is split into 3 fields, and so with IFS=: read -r a b c, there are as many fields as variables).

The rationale is that in ksh (on which the POSIX spec is based) $IFS (initially in the Bourne shell the internal field separator) became a field  delimiter, I think so any list of elements (not containing the delimiter) could be represented.

When $IFS is a separator, one can't represent a list of one empty element ("" is split into a list of 0 element, ":" into a list of two empty elements¹). When it's a delimiter, you can express a list of zero element with "", or one empty element with ":", or two empty elements with "::".

It's a bit unfortunate as one of the most common usages of $IFS is to split $PATH. And a $PATH like /bin:/usr/bin: is meant to be split into "/bin", "/usr/bin", "", not just "/bin" and "/usr/bin".

Now, with POSIX shells (but not all shells are compliant in that regard), for word splitting upon parameter expansion, that can be worked around with:

IFS=:; set -o noglob
for dir in $PATH""; do
  something with "${dir:-.}"
done


That  trailing "" makes sure that if $PATH ends in a trailing :, an extra empty element is added. And also that an empty $PATH is treated as one empty element as it should be.

That approach can't be used for read though.

Short of switching to zsh, there's no easy work around other than inserting an extra : and remove it afterwards like:

echo a:b:c: | sed 's/:/::/2' | { IFS=: read -r x y z; z=${z#:}; echo "$z"; }


Or (less portable):

echo a:b:c: | paste -d: - /dev/null | { IFS=: read -r x y z; z=${z%:}; echo "$z"; }


I've also added the -r which you generally want when using read.

Most likely here you'd want to use a proper text processing utility like sed/awk/perl instead of writing convoluted and probably inefficient code around read which has not been designed for that.



^{¹ Though in the Bourne shell, that was still split into zero elements as there was no distinction between IFS-whitespace and IFS-non-whitespace characters there, something that was also added by ksh}
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉话见心        
                
              
                            
                2021-01-02 00:20
              
            
            
                                                                       
One "feature" of read is that it will strip leading and trailing whitespace separators in the variables it populates - it is explained in much more detail at the linked answer. This enables beginners to have read do what they expect when doing for example read first rest <<< ' foo  bar ' (note the extra spaces).

The take-away? It is hard to do accurate text processing using Bash and shell tools. If you want full control it's probably better to use a "stricter" language like for example Python, where split() will do what you want, but where you might have to dig much deeper into string handling to explicitly remove newline separators or handle encoding.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复

Shell read *sometimes* strips trailing delimiter

Shell read sometimes strips trailing delimiter