Using a glob expression passed as a bash script argument

后端未结
关注
 1  557
爱一瞬间的悲伤 2021-01-21 19:57
TL;DR:

Why isn\'t invoking ./myscript foo* when myscript has var=$1 the same as invoking ./myscript with var=

      
      
        
          1条回答        

        
                    
            
            
                         
                
              
              
                
                   走了就别回头了
                                             
                
                
                (楼主)
            
              
              
                2021-01-21 20:45
              

            
            
                        
Addressing the "why"

Assignments, as in var=foo*, don't expand globs -- that is, when you run var=foo*, the literal string foo* is put into the variable foo, not the list of files matching foo*.

By contrast, unquoted use of foo* on a command line expands the glob, replacing it with a list of individual names, each of which is passed as a separate argument.

Thus, running ./yourscript foo* doesn't pass foo* as $1 unless no files matching that glob expression exist; instead, it becomes something like ./yourscript foo01 foo02 foo03, with each argument in a different spot on the command line.

The reason running ./yourscript "foo*" functions as a workaround is the unquoted expansion inside the script allowing the glob to be expanded at that later time. However, this is bad practice: glob expansion happens concurrent with string-splitting (meaning that relying on this behavior removes your ability to pass filenames containing characters found in IFS, typically whitespace), and also means that you can't pass literal filenames when they could also be interpreted as globs (if you have a file named [1] and a file named 1, passing [1] would always be replaced with 1).



Idiomatic Usage

The idiomatic way to build this would be to shift away the first argument, and then iterate over subsequent ones, like so:

#!/bin/bash
out_base=$1; shift

shopt -s nullglob                 # avoid generating an error if a directory has no .status

for dir; do                       # iterate over directories passed in $2, $3, etc
  for file in "$dir"/*.status; do # iterate over files ending in .status within those
      grep -e "string" "$file"    # match a single file
  done
done >"${out_base}.extension"




If you have many .status files in a single directory, all this can be made more efficient by using find to invoke grep with as many arguments as possible, rather than calling grep individually on a per-file basis:

#!/bin/bash
out_base=$1; shift

find "$@" -maxdepth 1 -type f -name '*.status' \
  -exec grep -h -- /dev/null '{}' + \
  >"${out_base}.extension"




Both scripts above expect the globs passed not to be quoted on the invoking shell. Thus, usage is of the form:

# being unquoted, this expands the glob into a series of separate arguments
your_script descriptor dir_*_map


This is considerably better practice than passing globs to your script (which then is required to expand them to retrieve the actual files to use); it works correctly with filenames containing whitespace (which the other practice doesn't), and files whose names are themselves glob expressions.



Some other points of note:


Always put double quotes around expansions! Failing to do so results in the additional steps of string-splitting and glob expansion (in that order) being applied. If you want globbing, as in the case of "$dir"/*.status, then end the quotes before the glob expression starts.
for dir; do is precisely equivalent to for dir in "$@"; do, which iterates over arguments. Don't make the mistake of using for dir in $*; do or for dir in $@; do instead! These latter invocations combine each element of the list with the first character of IFS (which, by default, contains the space, the tab and the newline in that order), then splits the resulting string on any IFS characters found within, then expands each component of the resulting list as a glob.
Passing /dev/null as an argument to grep is a safety measure: It ensures that you don't have different behavior between the single-argument and multi-argument cases (as an example, grep defaults to printing filenames within output only when passed multiple arguments), and ensures that you can't have grep hang trying to read from stdin if it's passed no additional filenames at all (which find won't do here, but xargs can).
Using lower-case names for your own variables (as opposed to system- and shell-provided variables, which have all-uppercase names) is in accordance with POSIX-specified convention; see fourth paragraph of the POSIX specification regarding environment variables, keeping in mind that environment variables and shell variables share a namespace.

    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                    
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复