Question
How come my regex pattern isn't lazy? It should be capturing the first number, not the second.
Here is a working Bash script:
#!/bin/bash
text='here is some example text I want to match word1 and this number 3.01 GiB here is some extra text and another number 1.89 GiB'
regex='(word1|word2).*?number[[:blank:]]([0-9.]+) GiB'
if [[ "$text" =~ $regex ]]; then
    echo 'FULL MATCH: '"${BASH_REMATCH[0]}"
    echo 'NUMBER CAPTURE: '"${BASH_REMATCH[2]}"
fi
Here is the output:
FULL MATCH: word1 and this number 3.01 GiB here is some extra text and another number 1.89 GiB
NUMBER CAPTURE: 1.89
In this online POSIX regex tester the match is lazy, as I expected, but in Bash it is greedy. The NUMBER CAPTURE should be 3.01, not 1.89.
Answer 1:
Regarding .*?, the POSIX standard says:
The behavior of multiple adjacent duplication symbols ( '+', '*', '?', and intervals) produces undefined results.
And concerning greedy matching, it says:
If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched.
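The longest-match rule quoted above can be seen with a shortened version of the question's text. This is only a sketch: the `sample` string and the `longest` variable are made up for illustration, and the pattern deliberately uses a plain `.*` rather than the undefined `.*?`.

```shell
#!/bin/bash
# Hypothetical shortened sample with two possible end points for the match.
sample='number 3.01 GiB and number 1.89 GiB'

# POSIX ERE has no lazy quantifier. The match could end at the first " GiB",
# but the longest sequence starting at the leftmost position is chosen.
regex='number.* GiB'

if [[ "$sample" =~ $regex ]]; then
    longest="${BASH_REMATCH[0]}"
fi
echo "MATCHED: $longest"
```

The match runs through to the second " GiB", which is the same behavior the question observes.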
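In this particular case you can use [^&]* instead, provided the text contains a & somewhere between the first number and the second. A negated bracket expression cannot match past the excluded character, so the match has to end at the first number. If the text has no & at all, [^&]* behaves like .* and still matches through to the last number.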
text='here is some example text I want to match word1 and this number 3.01 GiB here is some extra text and another number 1.89 GiB'
regex='(word1|word2)[^&]*number[[:blank:]]([0-9.]+) GiB'
if [[ "$text" =~ $regex ]]; then
    echo 'FULL MATCH: '"${BASH_REMATCH[0]}"
    echo 'NUMBER CAPTURE: '"${BASH_REMATCH[2]}"
fi
Outputs:
FULL MATCH: word1 and this number 3.01 GiB
NUMBER CAPTURE: 3.01
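The sample text as printed here contains no &, so a variant of the same idea that excludes digits may be easier to reproduce. The following is a minimal sketch, assuming no digits occur between the keyword and the number. The [^0-9]* pattern is a substitution made for illustration, not the pattern from the answer, and the `full_match` and `first_number` variable names are hypothetical.

```shell
#!/bin/bash
text='here is some example text I want to match word1 and this number 3.01 GiB here is some extra text and another number 1.89 GiB'

# [^0-9]* cannot cross a digit, so it cannot run past the first number.
# Assumption: no digits appear between word1/word2 and "number".
regex='(word1|word2)[^0-9]*number[[:blank:]]([0-9.]+) GiB'

if [[ "$text" =~ $regex ]]; then
    full_match="${BASH_REMATCH[0]}"
    first_number="${BASH_REMATCH[2]}"
    echo "FULL MATCH: $full_match"
    echo "NUMBER CAPTURE: $first_number"
fi
```

This trades generality for predictability: the excluded class has to be chosen from something known to separate the wanted match from the rest of the text.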
Source: https://stackoverflow.com/questions/57620201/how-come-my-regex-isnt-working-as-expected-in-bash-greedy-instead-of-lazy