Question
I want to grep the shortest match and the pattern should be something like:
<car ... model=BMW ...>
...
...
...
</car>
Here ... stands for any characters, and the input spans multiple lines.
Answer 1:
You're looking for a non-greedy (or lazy) match. To get a non-greedy match in regular expressions, you need to put the modifier ? after the quantifier. For example, you can change .* to .*?.
By default, grep doesn't support non-greedy modifiers, but you can use grep -P to use the Perl syntax.
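A minimal sketch of greedy vs. lazy matching. It uses Python's re module as a stand-in for the Perl-style syntax that grep -P accepts; the sample string is made up for illustration.

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as it can, so a single match spans both elements.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: .*? stops at the first </b> it can reach, giving one match per element.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```

The only difference between the two patterns is the trailing ? on the quantifier.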
Answer 2:
Actually, .*? only works in Perl syntax. I am not sure what the equivalent grep extended regexp syntax would be. Fortunately, you can use Perl syntax with grep, so grep -P would work, but grep -E, which is the same as egrep, would not: the match would still be greedy.
See also: http://blog.vinceliu.com/2008/02/non-greedy-regular-expression-matching.html
Answer 3:
This is the grep that worked for me after trying out the suggestions in this thread:
echo "hi how are you " | grep -shoP ".*? "
Just make sure you append a space to the end of each line, since every match has to end in a space.
(Mine was a line-by-line search to print out individual words.)
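A rough Python equivalent of the grep -shoP ".*? " one-liner, assuming Python's re.findall is close enough to grep -o for illustration. It also shows why the trailing space matters: the last word is dropped without it.

```python
import re

line = "hi how are you "      # trailing space, as the answer suggests
no_trailing = "hi how are you"

# Lazy: each match ends at the first space it reaches, so each match is one word.
words = re.findall(r".*? ", line)

# Greedy, for comparison: .* runs to the last space, giving a single match.
whole = re.findall(r".* ", line)

# Without the trailing space, "you" is never followed by a space and is lost.
missing_last = re.findall(r".*? ", no_trailing)

for w in words:
    print(repr(w))  # prints 'hi ', 'how ', 'are ', 'you ' on separate lines
```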
Answer 4:
grep
For a non-greedy match in grep, you could use a negated character class. In other words, try to avoid wildcards.
For example, to fetch all links to JPEG files from the page content, you'd use:
grep -o '"[^" ]\+\.jpg"'
To deal with multiple lines, pipe the input through xargs first. For performance, use ripgrep.
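A small sketch of the negated-character-class idea, again using Python's re as a stand-in (grep's BRE \+ becomes plain + here). The HTML snippet is hypothetical. It contrasts a greedy wildcard with [^" ]+, and shows why the dot before jpg is escaped: an unescaped . matches any character.

```python
import re

page = '<a href="a.jpg">x</a> <img src="b.jpg">'

# Greedy wildcard: .+ runs past the first closing quote to the last .jpg".
greedy = re.findall(r'".+\.jpg"', page)

# Negated class: [^" ]+ cannot cross a quote or a space, so each link is separate.
negated = re.findall(r'"[^" ]+\.jpg"', page)

# An unescaped dot accepts any character, so "notajpg" is wrongly matched.
loose = re.findall(r'"[^" ]+.jpg"', 'src="notajpg"')
strict = re.findall(r'"[^" ]+\.jpg"', 'src="notajpg"')

print(greedy)   # ['"a.jpg">x</a> <img src="b.jpg"']
print(negated)  # ['"a.jpg"', '"b.jpg"']
```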
Answer 5:
The short answer is to use the following regular expression:
(?s)<car .*? model=BMW .*?>.*?</car>
- (?s) - makes . match newlines too, so a match can span multiple lines
- .*? - matches any character zero or more times, lazily (as few times as possible)
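A sketch of the short answer's pattern using Python's re, which accepts the same (?s) inline flag. The sample text (two BMW cars with made-up attributes) is an assumption for illustration. With GNU grep, the pattern is typically run with -P and -z so the input is read as NUL-separated records instead of line by line; check that your grep build supports both.

```python
import re

text = """<car id=1 model=BMW color=red>
engine
</car>
<car id=2 model=BMW color=blue>
wheels
</car>
"""

lazy_dotall = r"(?s)<car .*? model=BMW .*?>.*?</car>"
greedy_dotall = r"(?s)<car .* model=BMW .*>.*</car>"
lazy_no_dotall = r"<car .*? model=BMW .*?>.*?</car>"

lazy_matches = re.findall(lazy_dotall, text)          # one match per <car>...</car>
greedy_matches = re.findall(greedy_dotall, text)      # one match covering both cars
no_dotall_matches = re.findall(lazy_no_dotall, text)  # . cannot cross newlines

print(len(lazy_matches), len(greedy_matches), len(no_dotall_matches))  # 2 1 0
```

Lazy matching only keeps each match as short as possible from its starting point; the leftmost start still wins. If a <car> without model=BMW came first, the match could begin there and run into the next BMW entry.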
A slightly more complicated answer is:
(?s)<([a-z\-_0-9]+?) .*? model=BMW .*?>.*?</\1>
This makes it possible to match both car1 and car2 in the following text:
<car1 ... model=BMW ...>
...
...
...
</car1>
<car2 ... model=BMW ...>
...
...
...
</car2>
- (..) represents a capturing group
- \1 in this context matches the same text as was most recently matched by capturing group number 1
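A sketch of the backreference version run against the car1/car2 sample above, using Python's re as a stand-in for the Perl-style engine. The capturing group records the opening tag name, and \1 requires the closing tag to repeat it.

```python
import re

sample = """<car1 ... model=BMW ...>
...
...
...
</car1>
<car2 ... model=BMW ...>
...
...
...
</car2>
"""

pattern = re.compile(r"(?s)<([a-z\-_0-9]+?) .*? model=BMW .*?>.*?</\1>")

matches = list(pattern.finditer(sample))
for m in matches:
    # group(1) is the tag name; group(0) is the whole element up to its own closing tag.
    print(m.group(1), len(m.group(0).splitlines()))  # car1 5, then car2 5

tags = [m.group(1) for m in matches]
```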
Source: https://stackoverflow.com/questions/3027518/how-to-do-a-non-greedy-match-in-grep