Regex matching in a Bash if statement

逝去的感伤 2020-11-28 01:42

What did I do wrong here?

Trying to match any string that contains spaces, lowercase, uppercase, or numbers. Special characters would be nice too, but I think that r

3 Answers
  • 2020-11-28 02:32

    In case someone wanted an example using variables...

    #!/bin/bash
    
    # Only continue for 'develop' or 'release/*' branches
    BRANCH_REGEX='^(develop$|release/.*)'
    
    if [[ $BRANCH =~ $BRANCH_REGEX ]]
    then
        echo "BRANCH '$BRANCH' matches BRANCH_REGEX '$BRANCH_REGEX'"
    else
        echo "BRANCH '$BRANCH' DOES NOT MATCH BRANCH_REGEX '$BRANCH_REGEX'"
    fi
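
    The same check can be wrapped in a function and tried against a few branch names. This is only a sketch: the helper name branch_allowed and the sample branch names are made up for illustration, and the pattern is one way to express "develop, or anything under release/".

```shell
#!/bin/bash

# Hypothetical helper: succeeds for 'develop' or any 'release/...' branch.
branch_allowed() {
    # Single quotes keep bash from touching the '$' in the pattern.
    local regex='^(develop$|release/.*)'
    # Unquoted expansion on the right, so it is treated as a regex.
    [[ $1 =~ $regex ]]
}

# Sample branch names, chosen only to exercise both outcomes.
for b in develop develop-2 release/1.2 feature/x; do
    if branch_allowed "$b"; then
        echo "$b: allowed"
    else
        echo "$b: skipped"
    fi
done
```

    With these inputs, develop and release/1.2 should be reported as allowed, and develop-2 and feature/x as skipped.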
    
  • 2020-11-28 02:42

    There are a couple of important things to know about bash's [[ ]] construction. The first:

    Word splitting and pathname expansion are not performed on the words between the [[ and ]]; tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal are performed.

    The second thing:

    An additional binary operator, ‘=~’, is available,... the string to the right of the operator is considered an extended regular expression and matched accordingly... Any part of the pattern may be quoted to force it to be matched as a string.

    Consequently, $v on either side of the =~ will be expanded to the value of that variable, but the result will not be word-split or pathname-expanded. In other words, it's perfectly safe to leave variable expansions unquoted on the left-hand side, but you need to know that variable expansions will happen on the right-hand side.

    So if you write [[ $x =~ [$0-9a-zA-Z] ]], the $0 inside the regex on the right will be expanded before the regex is interpreted, which will probably cause the regex to fail to compile (unless the expansion of $0 ends with a digit or a punctuation symbol whose ASCII value is less than that of a digit). If you quote the right-hand side, as in [[ $x =~ "[$0-9a-zA-Z]" ]], then the right-hand side will be treated as an ordinary string, not a regex (and $0 will still be expanded). What you really want in this case is [[ $x =~ [\$0-9a-zA-Z] ]], where the backslash keeps the $ literal.
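
    The escaped and quoted forms can be compared side by side. A minimal sketch, assuming bash 3.2 or later without the compat31 option (where a quoted right-hand side is matched literally); the function names are invented for this example, and the quoted pattern leaves out $0 so that nothing gets expanded.

```shell
#!/bin/bash

# Escaped '$': the class contains a literal dollar sign, digits and letters.
has_allowed_char() {
    [[ $1 =~ [\$0-9a-zA-Z] ]]
}

# Quoted right-hand side: compared as a plain string, not as a regex.
matches_quoted() {
    [[ $1 =~ "[0-9a-zA-Z]" ]]
}

has_allowed_char '$'   && echo 'escaped class matches a literal $'
has_allowed_char '---' || echo 'escaped class does not match ---'
matches_quoted 'abc'   || echo 'quoted pattern does not match abc'
matches_quoted 'x[0-9a-zA-Z]y' && echo 'quoted pattern matches its own text'
```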

    Similarly, the expression between the [[ and ]] is split into words before the regex is interpreted. So spaces in the regex need to be escaped or quoted. If you wanted to match letters, digits or spaces you could use: [[ $x =~ [0-9a-zA-Z\ ] ]]. Other characters similarly need to be escaped, like #, which would start a comment if not quoted. Of course, you can put the pattern into a variable:

    pat="[0-9a-zA-Z ]"
    if [[ $x =~ $pat ]]; then ...
    

    For regexes which contain lots of characters which would need to be escaped or quoted to pass through bash's lexer, many people prefer this style. But beware: In this case, you cannot quote the variable expansion:

    # This doesn't work: quoting makes $pat match as a literal string, not a regex
    if [[ $x =~ "$pat" ]]; then ...
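
    To see the difference between the two forms, here is a small sketch that reuses the pattern from above. The function names matches_regex and matches_literal are hypothetical, and the behaviour of the quoted form assumes bash 3.2 or later.

```shell
#!/bin/bash

pat="[0-9a-zA-Z ]"

# Unquoted expansion: the value of $pat is interpreted as a regex.
matches_regex() {
    [[ $1 =~ $pat ]]
}

# Quoted expansion: the value of $pat is searched for as a literal substring.
matches_literal() {
    [[ $1 =~ "$pat" ]]
}

matches_regex 'hello world'   && echo 'regex form: matches "hello world"'
matches_regex '***'           || echo 'regex form: no match for "***"'
matches_literal 'hello world' || echo 'literal form: no match for "hello world"'
matches_literal 'see [0-9a-zA-Z ] here' && echo 'literal form: matches only the pattern text itself'
```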
    

    Finally, I think what you are trying to do is to verify that the variable only contains valid characters. The easiest way to do this check is to make sure that it does not contain an invalid character. In other words, an expression like this:

    valid='0-9a-zA-Z $%&#' # add almost whatever else you want to allow to the list
    if [[ ! $x =~ [^$valid] ]]; then ...
    

    ! negates the test, turning it into a "does not match" operator, and a [^...] regex character class means "any character other than ...".
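
    Put together as a reusable check, this might look like the sketch below. The function name only_valid_chars and the sample inputs are assumptions made for illustration; note that an empty string passes, because it contains no invalid character.

```shell
#!/bin/bash

valid='0-9a-zA-Z $%&#'

# Succeeds when $1 contains no character outside the allowed set.
only_valid_chars() {
    [[ ! $1 =~ [^$valid] ]]
}

# Sample inputs, chosen to hit both outcomes.
for s in 'Price 100%' 'rock&roll #1' 'a+b' ''; do
    if only_valid_chars "$s"; then
        echo "ok:       '$s'"
    else
        echo "rejected: '$s'"
    fi
done
```

    Here only 'a+b' should be rejected, since + is not in the allowed list.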

    The combination of parameter expansion and regex operators can make bash regular expression syntax "almost readable", but there are still some gotchas. (Aren't there always?) One is that you could not put ] into $valid, even if $valid were quoted, except at the very beginning. (That's a POSIX regex rule: if you want to include ] in a character class, it needs to go at the beginning. - can go at the beginning or the end, so if you need both ] and -, you need to start with ] and end with -, leading to the regex "I know what I'm doing" emoticon: [][-])
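
    The placement rule for ] and - can be checked with a short sketch. The pattern is kept in a variable to stay clear of bash's own parsing of brackets; the names brackets_or_dash and has_bracket_or_dash are made up for this example.

```shell
#!/bin/bash

# ']' comes first and '-' comes last, so the class holds ']', '[' and '-'.
brackets_or_dash='[][-]'

# Succeeds when $1 contains at least one of ']', '[' or '-'.
has_bracket_or_dash() {
    [[ $1 =~ $brackets_or_dash ]]
}

for s in 'a]b' 'a[b' 'a-b' 'abc'; do
    if has_bracket_or_dash "$s"; then
        echo "$s: contains a bracket or a dash"
    else
        echo "$s: contains neither"
    fi
done
```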

  • 2020-11-28 02:43

    I'd prefer to use [:punct:] for that. Also, a-zA-Z0-9 could be just [:alnum:]:

    [[ $TEST =~ ^[[:alnum:][:blank:][:punct:]]+$ ]]
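
    Wrapped in a function, the same test can be tried on a few inputs. This is a sketch only: the function name is hypothetical, and the sample strings were picked to show that [:blank:] covers spaces and tabs but not newlines, and that the + rejects an empty string.

```shell
#!/bin/bash

# Succeeds when $1 is non-empty and consists only of letters, digits,
# spaces/tabs and punctuation.
only_alnum_blank_punct() {
    [[ $1 =~ ^[[:alnum:][:blank:][:punct:]]+$ ]]
}

only_alnum_blank_punct 'Hello, World 42!' && echo 'plain text: match'
only_alnum_blank_punct $'tab\there'       && echo 'tab: match'
only_alnum_blank_punct $'two\nlines'      || echo 'newline: no match'
only_alnum_blank_punct ''                 || echo 'empty string: no match'
```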
    