Extract substring in Bash

别那么骄傲 2020-11-22 11:02

Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

So to emphasize the point, I

22 answers
  • 2020-11-22 11:24

    I love sed's capability to deal with regex groups:

    > var="someletters_12345_moreletters.ext"
    > digits=$(echo "$var" | sed -n 's/.*_\([0-9]\+\).*/\1/p')
    > echo "$digits"
    12345
    

    A slightly more general option does not assume that an underscore marks the start of the digit sequence. Instead, strip all the non-digits that precede the sequence: s/[^0-9]\+\([0-9]\+\).*/\1/p.


    > man sed | grep s/regexp/replacement -A 2
    s/regexp/replacement/
        Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
        refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
    

    More on this, in case you're not too confident with regexps:

    • s is for _s_ubstitute
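    • [0-9]\+ matches one or more digits
    • \1 refers to group 1 of the match (group 0 is the whole match; group 1 is the part matched inside the parentheses)
    • the p flag is for _p_rinting; combined with -n, only lines where the substitution succeeded are printed

    The backslashes before ( ) and + are needed because sed uses basic regular expressions by default, where the unescaped characters are literals.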
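
    For comparison, the same substitution can be written with extended regular expressions, where groups and + need no backslashes. This is only a sketch: it assumes a sed that supports the -E option (GNU and BSD sed both do), and it reuses the variable names from the example above.

```shell
var="someletters_12345_moreletters.ext"
# -E: extended regex syntax, so ( ) and + are operators without escaping
# -n plus the p flag: print only if the substitution succeeded
digits=$(printf '%s\n' "$var" | sed -n -E 's/.*_([0-9]+).*/\1/p')
echo "$digits"
```

    If the input contains no underscore followed by digits, nothing is printed and digits ends up empty.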

  • 2020-11-22 11:24

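    There's also the 'expr' command (an external utility, not a bash builtin). Note that 'expr match' is a GNU form; the POSIX spelling is expr STRING : REGEX.

    INPUT="someletters_12345_moreleters.ext"
    SUBSTRING=$(expr match "$INPUT" '.*_\([[:digit:]]*\)_.*')
    echo "$SUBSTRING"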
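
    Since expr runs as a separate process, it may be worth comparing it with bash's own [[ ... =~ ... ]] operator, which captures groups into the BASH_REMATCH array. The sketch below is an alternative technique, not what the answer above uses; the names input, re and digits are illustrative.

```shell
input="someletters_12345_moreleters.ext"
re='_([0-9]+)_'            # extended regex: digits between two underscores
digits=""
if [[ $input =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match, BASH_REMATCH[1] the first group
    digits=${BASH_REMATCH[1]}
fi
echo "$digits"
```

    Keeping the pattern in a variable avoids quoting problems on the right-hand side of =~.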
    
  • 2020-11-22 11:27

    Takes start and end indices in the style of the JS and Java substring methods, but with an inclusive end. Remove the +1 if you want an exclusive end, as in those languages.

    substring() {
        local str="$1" start="$2" end="$3"

        # Default to the whole string when start or end is omitted
        if [[ -z "$start" ]]; then start=0; fi
        if [[ -z "$end" ]]; then end=${#str}; fi

        # +1 makes the end index inclusive
        local length=$(( end - start + 1 ))

        echo "${str:start:length}"
    }
    

    Example:

        substring 01234 0
        01234
        substring 012345 0
        012345
        substring 012345 0 0
        0
        substring 012345 1 1
        1
        substring 012345 1 2
        12
        substring 012345 0 1
        01
        substring 012345 0 2
        012
        substring 012345 0 3
        0123
        substring 012345 0 4
        01234
        substring 012345 0 5
        012345
    

    More example calls:

        substring 012345 0
        012345
        substring 012345 1
        12345
        substring 012345 2
        2345
        substring 012345 3
        345
        substring 012345 4
        45
        substring 012345 5
        5
        substring 012345 6
        
        substring 012345 3 5
        345
        substring 012345 3 4
        34
        substring 012345 2 4
        234
        substring 012345 1 3
        123
    
  • 2020-11-22 11:29

    If the digits always start at the same offset (that is, the prefix has a constant length), the following parameter expansion performs substring extraction:

    b=${a:12:5}
    

    where 12 is the offset (zero-based) and 5 is the length.
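
    As a worked example with the filename from the question: "someletters_" is 12 characters long, so the digits start at offset 12. The second half of this sketch computes the offset instead of hard-coding it, assuming the digits follow the first underscore; prefix, offset and c are illustrative names.

```shell
a="someletters_12345_moreleters.ext"

# Fixed offset and length
b=${a:12:5}
echo "$b"

# Same result with the offset derived from the text before the first "_"
prefix=${a%%_*}                  # "someletters"
offset=$(( ${#prefix} + 1 ))     # skip the prefix and the underscore
c=${a:offset:5}
echo "$c"
```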

    If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps:

    tmp=${a#*_}   # remove prefix ending in "_"
    b=${tmp%_*}   # remove suffix starting with "_"
    

    If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.

    Both solutions presented are pure bash, with no process spawning involved, hence very fast.
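
    For names that contain other underscores, one possible approach is to anchor on the digits rather than on the underscores. The sketch below assumes that the first run of digits in the name is the one you want; the filename and the variable lead are made up for illustration.

```shell
a="some_letters_12345_more_leters.ext"   # hypothetical name with extra underscores
lead=${a%%[0-9]*}     # everything before the first digit: "some_letters_"
tmp=${a#"$lead"}      # drop that prefix: "12345_more_leters.ext"
b=${tmp%%[!0-9]*}     # drop everything from the first non-digit onward
echo "$b"
```

    This still takes several steps, because bash cannot nest one parameter expansion inside another on the same value.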

  • 2020-11-22 11:30

    I'm surprised this pure bash solution didn't come up:

    a="someletters_12345_moreleters.ext"
    IFS="_"
    set -- $a    # unquoted on purpose: splits $a on "_" into $1, $2, $3
    echo "$2"
    # prints 12345
    

    Remember to restore IFS to its previous value, or unset it, afterwards! Note also that set overwrites the shell's positional parameters.
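
    One way to sidestep the IFS cleanup is to set IFS for a single read command only, so the shell's own IFS and positional parameters are left alone. This is a sketch of a different technique from the set-based one above; prefix, digits and rest are illustrative names.

```shell
a="someletters_12345_moreleters.ext"
# IFS is changed only for this one read command
IFS=_ read -r prefix digits rest <<< "$a"
echo "$digits"
```

    Any fields beyond the second end up together in rest, so extra underscores later in the name do not affect digits.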

  • 2020-11-22 11:32

    Without any sub-processes you can:

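    shopt -s extglob
    input="someletters_12345_moreleters.ext"
    front=${input%%_+([a-zA-Z]).*}    # strip the "_letters.ext" suffix
    digits=${front##+([a-zA-Z])_}     # strip the "letters_" prefix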
    

    A very small variant of this will also work in ksh93.
