Greedy vs. Reluctant vs. Possessive Quantifiers

前端 未结 7 1064
傲寒
傲寒 2020-11-21 07:19

I found this excellent tutorial on regular expressions and while I intuitively understand what \"greedy\", \"reluctant\" and \"possessive\" quantifiers do, there seems to be

7条回答
  •  灰色年华
    2020-11-21 07:47

    http://swtch.com/~rsc/regexp/regexp1.html

    I'm not sure that's the best explanation on the internet, but it's reasonably well written and appropriately detailed, and I keep coming back to it. You might want to check it out.

    If you want a higher-level (less detailed explanation), for simple regular expressions such as the one you're looking at, a regular expression engine works by backtracking. Essentially, it chooses ("eats") a section of the string and tries to match the regular expression against that section. If it matches, great. If not, the engine alters its choice of the section of the string and tries to match the regexp against that section, and so on, until it's tried every possible choice.

    This process is used recursively: in its attempt to match a string with a given regular expression, the engine will split the regular expression into pieces and apply the algorithm to each piece individually.

    The difference between greedy, reluctant, and possessive quantifiers enters when the engine is making its choices of what part of the string to try to match against, and how to modify that choice if it doesn't work the first time. The rules are as follows:

    • A greedy quantifier tells the engine to start with the entire string (or at least, all of it that hasn't already been matched by previous parts of the regular expression) and check whether it matches the regexp. If so, great; the engine can continue with the rest of the regexp. If not, it tries again, but trimming one character (the last one) off the section of the string to be checked. If that doesn't work, it trims off another character, etc. So a greedy quantifier checks possible matches in order from longest to shortest.

    • A reluctant quantifier tells the engine to start with the shortest possible piece of the string. If it matches, the engine can continue; if not, it adds one character to the section of the string being checked and tries that, and so on until it finds a match or the entire string has been used up. So a reluctant quantifier checks possible matches in order from shortest to longest.

    • A possessive quantifier is like a greedy quantifier on the first attempt: it tells the engine to start by checking the entire string. The difference is that if it doesn't work, the possessive quantifier reports that the match failed right then and there. The engine doesn't change the section of the string being looked at, and it doesn't make any more attempts.

    This is why the possessive quantifier match fails in your example: the .*+ gets checked against the entire string, which it matches, but then the engine goes on to look for additional characters foo after that - but of course it doesn't find them, because you're already at the end of the string. If it were a greedy quantifier, it would backtrack and try making the .* only match up to the next-to-last character, then up to the third to last character, then up to the fourth to last character, which succeeds because only then is there a foo left after the .* has "eaten" the earlier part of the string.

提交回复
热议问题