Extract substring in Bash

别那么骄傲 2020-11-22 11:02

Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

So to emphasize the point, I

22 Answers
  • 2020-11-22 11:33

    Here's how I'd do it:

    FN=someletters_12345_moreleters.ext
    [[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}
    

    Explanation:

    Bash-specific:

    • [[ ]] indicates a conditional expression
    • =~ indicates the condition is a regular expression
    • && chains the commands if the prior command was successful

    Regular Expressions (RE): _([[:digit:]]{5})_

    • _ are literals to demarcate/anchor matching boundaries for the string being matched
    • () create a capture group
    • [[:digit:]] is a character class, I think it speaks for itself
    • {5} means exactly five of the prior character, class (as in this example), or group must match

    In English, you can think of it behaving like this: the FN string is scanned character by character until we see an _, at which point the capture group is opened and we attempt to match five digits. If that succeeds, the capture group saves the five digits traversed. If the next character is an _, the condition is successful, the capture group is made available in BASH_REMATCH, and the NUM= assignment that follows can execute. If any part of the matching fails, the saved details are discarded and character-by-character processing continues after the _. For example, if FN were _1 _12 _123 _1234 _12345_, there would be four false starts before a match was found.
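
    The matching behaviour described above can be tried out with a small wrapper. This is only a sketch: the function name extract_five_digits is hypothetical (it is not part of the answer), and the pattern is stored in a variable so that it is used unquoted inside [[ ]].

```bash
#!/usr/bin/env bash
# extract_five_digits is a hypothetical helper name, not from the answer.
# It prints the five digits found between two underscores, or returns
# non-zero when the string contains no such run.
extract_five_digits() {
    local fn=$1
    local re='_([[:digit:]]{5})_'   # same pattern as in the answer
    if [[ $fn =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # first capture group
    else
        return 1
    fi
}

extract_five_digits 'someletters_12345_moreleters.ext'    # prints 12345
extract_five_digits '_1 _12 _123 _1234 _12345_'           # prints 12345
extract_five_digits 'no_digits_here.ext' || echo 'no match'
```

    Because the pattern requires an _ on both sides of the five digits, a six-digit run such as _123456_ does not match.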

  • 2020-11-22 11:33

    If we focus on the concept of:
    "A run of (one or several) digits"

    We could use several external tools to extract the numbers.
    We could quite easily erase all other characters with either sed or tr:

    name='someletters_12345_moreleters.ext'
    
    echo $name | sed 's/[^0-9]*//g'    # 12345
    echo $name | tr -c -d 0-9          # 12345
    

    But if $name contains several runs of digits, the above fails because the runs are joined together.

    With name='someletters_12345_moreleters_323_end.ext':

    echo $name | sed 's/[^0-9]*//g'    # 12345323
    echo $name | tr -c -d 0-9          # 12345323
    

    We need to use regular expressions (regex).
    To select only the first run (12345 not 323) in sed and perl:

    echo "$name" | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
    # Pass the name as an argument rather than splicing it into the Perl source.
    perl -e 'my ($num) = $ARGV[0] =~ /(\d+)/; print "$num\n";' "$name"
    

    But we could just as well do it directly in bash (1):

    regex='[^0-9]*([0-9]{1,}).*$'    # quoted so the shell does not choke on the parentheses
    [[ $name =~ $regex ]] && echo "${BASH_REMATCH[1]}"
    

    This allows us to extract the FIRST run of digits of any length
    surrounded by any other text/characters.

    Note: regex='[^0-9]*([0-9]{5,5}).*$' captures exactly five digits, but it also matches the first five digits of a longer run. :-)

    (1): Faster than calling an external tool for each short text. Not faster than doing all the processing inside sed or awk for large files.
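
    As a rough sketch of the bash-only approach, the two helpers below wrap the "first run of digits" idea and a stricter "run of exactly five digits" variant. Both function names are hypothetical, and the stricter pattern is my own assumption about one way to reject longer runs; it is not from the answer.

```bash
#!/usr/bin/env bash
# Hypothetical helper names; neither function appears in the answer.

# Print the first run of digits of any length.
first_digit_run() {
    local re='([0-9]+)'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# Print the first run that is exactly five digits long: the run must be
# bounded by a non-digit (or the start/end of the string) on both sides.
exact_five_digit_run() {
    local re='(^|[^0-9])([0-9]{5})([^0-9]|$)'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
}

name='someletters_12345_moreleters_323_end.ext'
first_digit_run "$name"                      # prints 12345
exact_five_digit_run 'a_123456_b_54321_c'    # prints 54321
```

    Both functions return a non-zero status when nothing matches, so they can be used directly in an if or with ||.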

  • 2020-11-22 11:35

    This answer gives you more control over which part of the string you extract. Here is how to extract 12345 from your string:

    str="someletters_12345_moreleters.ext"
    str=${str#*_}
    str=${str%_more*}
    echo $str
    

    This is more useful when the part you want to extract can contain arbitrary characters, such as letters like abc or special characters like _ or -. For example, suppose your string looks like this and you want everything after someletters_ and before _moreleters.ext:

    str="someletters_123-45-24a&13b-1_moreleters.ext"
    

    With this approach you can state exactly which part you want. Explanation:

    • ${str#*_} removes the shortest leading match of the pattern *_, that is, everything up to and including the first _.
    • ${str%_more*} removes the shortest trailing match of the pattern _more*, that is, _more and everything after it.

    Do some experiments yourself and you would find this interesting.
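
    To make the experiment concrete, here is a short sketch applied to the special-character string from above. The variable names middle, first_field and last_field are only illustrative, and the ## and %% lines are an addition showing the longest-match forms of the same operators.

```bash
#!/usr/bin/env bash
str='someletters_123-45-24a&13b-1_moreleters.ext'

# Shortest-match removal, as in the answer.
middle=${str#*_}            # drop everything up to and including the first "_"
middle=${middle%_more*}     # drop "_more" and everything after it
printf '%s\n' "$middle"     # prints 123-45-24a&13b-1

# Longest-match variants of the same operators.
last_field=${str##*_}       # drop up to and including the LAST "_"
first_field=${str%%_*}      # drop from the FIRST "_" to the end
printf '%s\n' "$last_field"     # prints moreleters.ext
printf '%s\n' "$first_field"    # prints someletters
```

    No external process is started here; everything is done by the shell's own parameter expansion.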

  • 2020-11-22 11:35

    A little late, but I just ran across this problem and found the following:

    host:/tmp$ asd=someletters_12345_moreleters.ext 
    host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
    12345
    host:/tmp$ 
    

    I used it to get millisecond resolution on an embedded system that does not have %N for date:

    set `grep "now at" /proc/timer_list`
    nano=$3
    fraction=`expr $nano : '.*\(...\)......'`
    $debug nano is $nano, fraction is $fraction
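
    One thing to keep in mind with the first expr pattern is that .* is greedy, so with more than two underscores it captures the text between the last pair. A tighter pattern, sketched below under the assumption that the wanted field is the digits right after the first underscore, avoids that. The variable names num, loose and tight are illustrative.

```bash
#!/usr/bin/env bash
asd=someletters_12345_moreleters.ext

# expr anchors the pattern at the start of the string and prints the
# text captured by \( \).
num=$(expr "$asd" : '[^_]*_\([0-9][0-9]*\)_')
printf '%s\n' "$num"      # prints 12345

# With several underscore-delimited fields the two patterns differ.
multi=a_1_b_2_c
loose=$(expr "$multi" : '.*_\(.*\)_')                   # greedy: last pair
tight=$(expr "$multi" : '[^_]*_\([0-9][0-9]*\)_')       # first pair
printf '%s %s\n' "$loose" "$tight"    # prints 2 1
```

    The variables are quoted when passed to expr so that filenames containing spaces or glob characters are handled as a single argument.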
    
  • 2020-11-22 11:37

    In case someone wants more rigorous information, you can also search for it in man bash like this:

    $ man bash [press return key]
    /substring  [press return key]
    [press "n" key]
    [press "n" key]
    [press "n" key]
    [press "n" key]
    

    Result:

           ${parameter:offset}
           ${parameter:offset:length}
                  Substring Expansion.  Expands to  up  to  length  characters  of
                  parameter  starting  at  the  character specified by offset.  If
                  length is omitted, expands to the substring of parameter  start‐
                  ing at the character specified by offset.  length and offset are
                  arithmetic expressions (see ARITHMETIC  EVALUATION  below).   If
                  offset  evaluates  to a number less than zero, the value is used
                  as an offset from the end of the value of parameter.  Arithmetic
                  expressions  starting  with  a - must be separated by whitespace
                  from the preceding : to be distinguished from  the  Use  Default
                  Values  expansion.   If  length  evaluates to a number less than
                  zero, and parameter is not @ and not an indexed  or  associative
                  array,  it is interpreted as an offset from the end of the value
                  of parameter rather than a number of characters, and the  expan‐
                  sion is the characters between the two offsets.  If parameter is
                  @, the result is length positional parameters beginning at  off‐
                  set.   If parameter is an indexed array name subscripted by @ or
                  *, the result is the length members of the array beginning  with
                  ${parameter[offset]}.   A  negative  offset is taken relative to
                  one greater than the maximum index of the specified array.  Sub‐
                  string  expansion applied to an associative array produces unde‐
                  fined results.  Note that a negative offset  must  be  separated
                  from  the  colon  by  at least one space to avoid being confused
                  with the :- expansion.  Substring indexing is zero-based  unless
                  the  positional  parameters are used, in which case the indexing
                  starts at 1 by default.  If offset  is  0,  and  the  positional
                  parameters are used, $0 is prefixed to the list.
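
    Applied to the filename from the question, substring expansion could look like the sketch below. It assumes the digits always start at a known position (offset 12, right after "someletters_"); the last example computes the offset from the first underscore instead. The variable names are illustrative.

```bash
#!/usr/bin/env bash
FN=someletters_12345_moreleters.ext

num=${FN:12:5}       # 5 characters starting at zero-based offset 12
rest=${FN:12}        # length omitted: everything from offset 12 onwards
ext=${FN: -3}        # negative offset; the space before - is required

printf '%s\n' "$num"     # prints 12345
printf '%s\n' "$rest"    # prints 12345_moreleters.ext
printf '%s\n' "$ext"     # prints ext

# offset is an arithmetic expression, so it can be computed:
prefix=${FN%%_*}                      # text before the first "_"
computed=${FN:${#prefix}+1:5}         # skip the prefix and the "_"
printf '%s\n' "$computed"             # prints 12345
```

    Without the space, ${FN:-3} would be read as the "use default value" expansion and would simply print the whole filename.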
    
  • 2020-11-22 11:39

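    OK, here is a pure parameter substitution that replaces the unwanted parts with an empty string. It relies on extended globbing, so extglob must be enabled. The caveat is that I have defined someletters and moreletters as lowercase letters only. If they are alphanumeric, this will not work as is.

    shopt -s extglob    # enables the @(...) and +(...) patterns used below
    filename=someletters_12345_moreletters.ext
    substring=${filename//@(+([a-z])_|_+([a-z]).*)}
    echo "$substring"   # 12345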
    