Non-greedy matching with grep

Submitted by 瘦欲@ on 2019-12-19 17:38:09

Question


Non-greedy matching, as far as I know, is not part of Basic Regular Expressions (BRE) or Extended Regular Expressions (ERE). However, the behaviour of different versions of grep (BSD and GNU) seems to suggest otherwise.

For example, take the following string:

string="hello_my_dear_polo"

Using GNU grep:

Following are a few attempts to extract hello from the string.

BRE Attempt (fails):

$ grep -o "hel.*\?o" <<< "$string"
hello_my_dear_polo

The output is the entire string, which suggests the non-greedy quantifier does not work in BRE. Note that I have only escaped ?, since * does not lose its meaning and need not be escaped.

ERE Attempt (fails):

$ grep -oE "hel.*?o" <<< "$string"
hello_my_dear_polo

Enabling the -E option yields the same output, suggesting that non-greedy matching is not part of ERE either. Escaping was not needed here since we are using ERE.

PCRE Attempt (succeeds):

$ grep -oP "hel.*?o" <<< "$string"
hello

Enabling the -P option for PCRE suggests that the non-greedy quantifier is part of it, and hence we get the desired output of hello. Escaping was not needed here since we are using PCRE.

Using BSD grep:

Here are a few attempts to extract hello from the string.

BRE Attempt (fails):

$ grep -o "hel.*\?o" <<< "$string"

Using BRE I get no output from BSD grep.

ERE Attempt (succeeds):

$ grep -oE "hel.*?o" <<< "$string"
hello

After enabling the -E option, I am surprised that I was able to extract my desired output. My question is on the output I am getting from this attempt.

PCRE Attempt (fails):

$ grep -oP "hel.*?o" <<< "$string"
usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
    [-e pattern] [-f file] [--binary-files=value] [--color=when]
    [--context[=num]] [--directories=action] [--label] [--line-buffered]
    [--null] [pattern] [file ...]

Using the -P option gave me a usage error, which was expected since the BSD version of grep does not support PCRE.
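Since -P is available on some grep builds and not on others, a script can probe for it before relying on it. The following is only a sketch: the variable name has_pcre and the probe pattern are made up for illustration, and the check assumes that a grep without PCRE support exits with a non-zero status when given -P.

```shell
# Probe whether this grep accepts -P (PCRE); has_pcre is a hypothetical name.
# stdout and stderr are discarded so a usage error stays silent.
if printf 'probe\n' | grep -P 'pro.*?e' >/dev/null 2>&1; then
  has_pcre=yes
else
  has_pcre=no
fi
echo "grep -P supported: $has_pcre"
```

On a grep that rejects -P, as in the BSD transcript above, the probe takes the else branch.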

So my question is: why does ERE on BSD grep yield the desired output with the non-greedy quantifier, while GNU grep does not?

Is this a bug, an undocumented feature of BSD egrep, or my misunderstanding of the output?


Answer 1:


The double quantifier is simply a syntax error and could result in either an error message or undefined behavior. It would arguably be better if you got an error message.

Perl extensions to regex post-date POSIX by a large margin; at the time these tools were written, it was extremely unlikely that someone would try to use this wacky syntax for anything. Non-greedy matching was only introduced in Perl 5, in the mid-1990s.
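If the goal is simply the shortest match and -P is not an option, one common workaround (not part of the original answer) is to replace .*? with a negated bracket expression, which is well defined in both BRE and ERE. A minimal sketch using the string from the question; the variable names result and ere_result are illustrative only:

```shell
string="hello_my_dear_polo"

# [^o]* cannot run past the first "o", so the match stops there.
# This emulates the non-greedy .*? without relying on a double quantifier.
result=$(printf '%s\n' "$string" | grep -o 'hel[^o]*o')
echo "$result"

# The same pattern is also valid as an ERE.
ere_result=$(printf '%s\n' "$string" | grep -oE 'hel[^o]*o')
echo "$ere_result"
```

This only works when the terminator is a single character; for a multi-character terminator, a PCRE-capable tool is usually the simpler choice.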



Source: https://stackoverflow.com/questions/23454172/non-greedy-matching-with-grep
