Regex is behaving lazy, should be greedy

Posted by 冷暖自知 on 2019-12-05 17:44:05

Question


I thought that by default my Regex would exhibit the greedy behavior that I want, but it does not in the following code:

    Regex keywords = new Regex(@"in|int|into|internal|interface");
    var targets = keywords.ToString().Split('|');
    foreach (string t in targets)
    {
        Match match = keywords.Match(t);
        Console.WriteLine("Matched {0,-9} with {1}", t, match.Value);
    }

Output:

Matched in        with in
Matched int       with in
Matched into      with in
Matched internal  with in
Matched interface with in

Now I realize that I could get it to work for this small example if I simply sorted the keywords by length descending, but

  • I want to understand why this isn't working as expected, and
  • the actual project I am working on has many more words in the Regex and it is important to keep them in alphabetical order.

So my question is: Why is this being lazy and how do I fix it?


Answer 1:


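Laziness and greediness apply to quantifiers only (?, *, +, {min,max}). Alternations are always tried in order, left to right, and the engine takes the first alternative that lets the match succeed. That is why in wins every time in your example: it is listed first and matches the start of every keyword.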
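To make the ordering rule concrete, here is a minimal sketch (the class name AlternationOrderDemo and the helper FirstMatch are made up for illustration, not from the question). It compares the original order, a longest-first order, and an anchored version in which the shorter alternatives fail at the end anchor, so the engine backtracks into the later ones.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationOrderDemo
{
    // Hypothetical helper: returns the text matched by `pattern` in `input`,
    // or an empty string when there is no match.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Alternatives are tried left to right; "in" succeeds first.
        Console.WriteLine(FirstMatch("in|int|into", "into"));        // in

        // The longest alternative is listed first, so it is tried first.
        Console.WriteLine(FirstMatch("into|int|in", "into"));        // into

        // "in" and "int" match but then fail at $, so the engine
        // backtracks and tries "into".
        Console.WriteLine(FirstMatch("^(?:in|int|into)$", "into"));  // into
    }
}
```

The third case shows that the order only decides which alternative is tried first; anything after the alternation that must also match can still force the engine on to a later alternative.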




Answer 2:


It looks like you're trying to match whole words. To do that, the expression has to say so, and your current one does not. Try this one instead:

new Regex(@"\b(in|int|into|internal|interface)\b");

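The "\b" matches a word boundary and is a zero-width match. In .NET a word boundary is a position between a word character (\w: letters, digits, underscore) and a non-word character, or the start or end of the input, so in practice words are delimited by whitespace and punctuation. With the trailing "\b", a short alternative such as in fails when more word characters follow, and the engine moves on to the next alternative. Being a zero-width match, it will not contain the character that caused the regex engine to detect the word boundary.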
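As a rough check of the word-boundary version, the sketch below runs it against the question's keywords and against a short sample sentence. The class name WordBoundaryDemo, the helper FindKeywords, and the sample text are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Same alphabetical alternation as in the question, wrapped in \b...\b.
    private static readonly Regex Keywords =
        new Regex(@"\b(in|int|into|internal|interface)\b");

    // Hypothetical helper: collects every whole-word keyword found in `text`.
    public static List<string> FindKeywords(string text)
    {
        var found = new List<string>();
        foreach (Match m in Keywords.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Each keyword now matches itself in full.
        foreach (string t in new[] { "in", "int", "into", "internal", "interface" })
        {
            Console.WriteLine("Matched {0,-9} with {1}", t, Keywords.Match(t).Value);
        }

        // "print" contains "in" and "int", but not at a word boundary.
        Console.WriteLine(string.Join(", ", FindKeywords("print into interface")));
        // into, interface
    }
}
```

The keyword list can stay in alphabetical order here, because the trailing boundary rejects any alternative that stops partway through a word.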




Answer 3:


According to RegularExpressions.info, the regex engine is eager. Therefore, when it works through your piped (alternation) expression, it stops at the first alternative that produces a match.

My recommendation would be to store all of your keywords in an array or list, then generate the piped expression, sorted by length with the longest keyword first, when you need it. You only have to do this once, as long as your keyword list doesn't change. Just store the generated expression in a singleton of some sort and return that on regex executions.
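One possible way to sketch this suggestion is shown below. The names KeywordRegex, BuildPattern, and Instance are hypothetical, and Lazy<Regex> is only one of several ways to get the "build once, reuse" behavior; the source list stays alphabetical and only the generated pattern is reordered.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordRegex
{
    // Maintained in alphabetical order, as the question requires.
    private static readonly string[] Keywords =
        { "in", "int", "interface", "internal", "into" };

    // Built on first use and reused afterwards.
    private static readonly Lazy<Regex> Cached =
        new Lazy<Regex>(() => new Regex(BuildPattern(Keywords)));

    public static Regex Instance => Cached.Value;

    // Orders keywords longest first (ties broken ordinally), escapes
    // them, and joins them into one alternation.
    public static string BuildPattern(string[] keywords)
    {
        var ordered = keywords
            .OrderByDescending(k => k.Length)
            .ThenBy(k => k, StringComparer.Ordinal)
            .Select(k => Regex.Escape(k));
        return string.Join("|", ordered);
    }

    public static void Main()
    {
        Console.WriteLine(BuildPattern(Keywords));
        // interface|internal|into|int|in

        foreach (string k in Keywords)
        {
            Console.WriteLine("Matched {0,-9} with {1}", k, Instance.Match(k).Value);
        }
    }
}
```

Regex.Escape is included so that keywords containing regex metacharacters would not change the meaning of the generated pattern; the question's keywords do not need it.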



Source: https://stackoverflow.com/questions/2394931/regex-is-behaving-lazy-should-be-greedy
