Find all matches with an extra qualifying criterion

我寻月下人不归 2021-01-20 07:14

Given sentences such as:

Boy has a dog and a cat.
Boy microwaves a gerbil.
Sally owns a cat.

For each sentence I want a list of animals (de

1 Answer
  • 2021-01-20 07:36

    You may use a positive lookbehind:

    (?<=^Boy.*?)(?:dog|cat|gerbil)
    

    Or, a variation with word boundaries to match the animals as whole words:

    (?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b
    

    See the regex demo

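    The (?<=^Boy.*?) positive lookbehind requires Boy at the start of the string for the consuming pattern to match. This works in .NET because its regex engine supports variable-length lookbehind, so .*? is allowed inside it; many other engines, such as Python's re module, only accept fixed-width lookbehind.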
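    To illustrate what the word boundaries change, here is a small sketch (not part of the original answer) that runs the pattern with and without \b on two made-up sentences: one where cat only occurs inside category, and one that starts with Boyd instead of Boy. The helper FindAnimals is a hypothetical name, and the sketch assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern without word boundaries, and the \b variant from the answer.
const string Plain = @"(?<=^Boy.*?)(?:dog|cat|gerbil)";
const string Bounded = @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b";

// Made-up inputs: "category" contains "cat", and "Boyd" starts with "Boy".
string[] samples = { "Boy has a category and a dog.", "Boyd has a cat." };

foreach (var s in samples)
{
    Console.WriteLine(s);
    Console.WriteLine($"  without \\b: [{string.Join(", ", FindAnimals(s, Plain))}]");
    Console.WriteLine($"  with \\b:    [{string.Join(", ", FindAnimals(s, Bounded))}]");
}

// Hypothetical helper: returns the value of every match of `pattern` in `input`.
static string[] FindAnimals(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
```

    The version without \b reports [cat, dog] for the first sentence and [cat] for the second, while the \b version reports [dog] and nothing at all.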

    If your input contains LF (newline) characters, pass the RegexOptions.Singleline option so that . matches newlines, too.
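    A minimal sketch of that option's effect, assuming C# 9+ top-level statements and a made-up two-line input where the animal appears on the line after Boy (the helper Find is hypothetical):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string Pattern = @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b";

// Made-up input: "Boy" is on the first line, "dog" on the second.
const string Text = "Boy has\na dog.";

// Hypothetical helper: all match values of Pattern in Text for the given options.
string[] Find(RegexOptions options) =>
    Regex.Matches(Text, Pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

// By default '.' does not match '\n', so the lookbehind cannot reach back to "Boy".
var withoutSingleline = Find(RegexOptions.None);
// With Singleline, '.' also matches '\n', so "dog" on the second line is found.
var withSingleline = Find(RegexOptions.Singleline);

Console.WriteLine($"None:       [{string.Join(", ", withoutSingleline)}]");
Console.WriteLine($"Singleline: [{string.Join(", ", withSingleline)}]");
```

    Without the option nothing matches, because the lookbehind's .*? cannot cross the \n; with it, dog is found.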

    C# usage:

    var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    

    C# demo:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    var strs = new List<string>() { "Boy has a dog and a cat.",
            "Boy something a gerbil.",
            "Sally owns a cat." };
    foreach (var s in strs)
    {
        // Collect every whole-word animal that follows a leading "Boy"
        var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
                .Cast<Match>()
                .Select(m => m.Value)
                .ToList();
        if (results.Count > 0)
        {
            Console.WriteLine("{0}:\n[{1}]\n------", s, string.Join(", ", results));
        }
        else
        {
            Console.WriteLine("{0}:\nNO MATCH!\n------", s);
        }
    }
    

    Output:

    Boy has a dog and a cat.:
    [dog, cat]
    ------
    Boy something a gerbil.:
    [gerbil]
    ------
    Sally owns a cat.:
    NO MATCH!
    ------
    

    There is an alternative: match Boy at the start of the string, and then keep matching animals only right after each previous successful match:

    (?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b
    

    See this regex demo (or a regex101 link here)

    You then only need to take the Group 1 value of each match:

    var results = Regex.Matches(s, @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b")
                .Cast<Match>()
                .Select(m => m.Groups[1].Value)
                .ToList();
    

    See this C# demo.
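    The linked demo is not reproduced here, so the following is only a sketch of what such a demo could look like, reusing the question's three sentences; the helper Animals is a hypothetical name and C# 9+ top-level statements are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

const string Pattern = @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b";

// Hypothetical helper: the Group 1 value (the animal) of every match.
List<string> Animals(string input) =>
    Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Groups[1].Value).ToList();

var sentences = new[] { "Boy has a dog and a cat.", "Boy microwaves a gerbil.", "Sally owns a cat." };
foreach (var sentence in sentences)
{
    var animals = Animals(sentence);
    Console.WriteLine(animals.Count > 0
        ? $"{sentence} -> [{string.Join(", ", animals)}]"
        : $"{sentence} -> NO MATCH!");
}
```

    It prints [dog, cat] for the first sentence, [gerbil] for the second and NO MATCH! for the third, the same animals the lookbehind version finds.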

    Here,

    • (?:\G(?!\A)|^Boy\b) - either the end of the previous match (\G(?!\A), where the (?!\A) lookahead keeps \G from matching at the very start of the string) or the start of the string followed by the whole word Boy
    • .*? - zero or more characters other than a newline (unless RegexOptions.Singleline is passed to Regex.Matches or the Regex constructor), as few as possible
    • \b(dog|cat|gerbil)\b - a whole word dog, cat or gerbil
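    To see why the (?!\A) guard is part of the first alternative: on the very first match attempt \G also succeeds at index 0, so without the guard that alternative would accept a sentence that does not start with Boy. A sketch comparing both variants on the question's third sentence (the helper Group1 and the Unguarded variant are made up for illustration; C# 9+ assumed):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The answer's pattern, and a variant with the (?!\A) guard removed.
const string Guarded = @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b";
const string Unguarded = @"(?:\G|^Boy\b).*?\b(dog|cat|gerbil)\b";

// Hypothetical helper: the Group 1 value of every match.
string[] Group1(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

const string Sally = "Sally owns a cat.";
// On the first attempt \G holds at index 0, so the unguarded variant matches here.
Console.WriteLine($"guarded:   [{string.Join(", ", Group1(Sally, Guarded))}]");
Console.WriteLine($"unguarded: [{string.Join(", ", Group1(Sally, Unguarded))}]");
```

    The guarded pattern finds nothing, while the unguarded one returns [cat].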

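    Basically, these regexes are similar, although the \G-based one may turn out a bit faster, probably because it resumes right where the previous match ended instead of evaluating, at each candidate position, a lookbehind that scans back toward the start of the string.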
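    One way to spot-check that the two patterns agree is to run both over the question's sentences plus a couple of made-up edge cases. This is a sketch, not a proof of equivalence, and it does not measure speed; the helper names are hypothetical and C# 9+ top-level statements are assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string Lookbehind = @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b";
const string GAnchored = @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b";

// The lookbehind version returns the animal as the whole match...
string[] ViaLookbehind(string input) =>
    Regex.Matches(input, Lookbehind).Cast<Match>().Select(m => m.Value).ToArray();

// ...while the \G version returns it in Group 1.
string[] ViaG(string input) =>
    Regex.Matches(input, GAnchored).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

// The question's sentences plus two made-up edge cases.
string[] inputs =
{
    "Boy has a dog and a cat.",
    "Boy microwaves a gerbil.",
    "Sally owns a cat.",
    "Boyd has a cat.",
    "Boy has a category and a dog.",
};

foreach (var sentence in inputs)
{
    bool same = ViaLookbehind(sentence).SequenceEqual(ViaG(sentence));
    Console.WriteLine($"{sentence,-32} same result: {same}");
}
```

    Each line ends with "same result: True" for these inputs.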
