Given sentences such as:
Boy has a dog and a cat.
Boy microwaves a gerbil.
Sally owns a cat.
For each sentence I want a list of animals (de
You may use a positive lookbehind:
(?<=^Boy.*?)(?:dog|cat|gerbil)
Or, a variation with word boundaries to match the animals as whole words:
(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b
See the regex demo
The (?<=^Boy.*?) positive lookbehind requires Boy at the start of the string for the consuming pattern to match.
If your input contains LF (newline) chars, pass the RegexOptions.Singleline option so that . matches newlines, too.
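To illustrate the effect of RegexOptions.Singleline, here is a minimal sketch. The class name SinglelineSketch, the helper FindAnimals and the two-line sample string are made up for illustration; only the pattern comes from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglelineSketch
{
    // Hypothetical helper: runs the lookbehind pattern with the given options.
    public static string[] FindAnimals(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b", options)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        // Assumed sample: the sentence is split over two lines with an LF.
        var text = "Boy has a dog\nand a cat.";

        // Without Singleline, .*? in the lookbehind cannot cross the LF,
        // so only the animal on the first line is found.
        Console.WriteLine(string.Join(", ", FindAnimals(text, RegexOptions.None)));       // dog

        // With Singleline, . also matches LF, so both animals are found.
        Console.WriteLine(string.Join(", ", FindAnimals(text, RegexOptions.Singleline))); // dog, cat
    }
}
```

Note that ^ still means the start of the whole string here, since RegexOptions.Multiline is not set.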
C# usage:
var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
C# demo:
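// Requires: using System; using System.Collections.Generic;
//           using System.Linq; using System.Text.RegularExpressions;
var strs = new List<string>() { "Boy has a dog and a cat.",
    "Boy something a gerbil.",
    "Sally owns a cat." };
foreach (var s in strs)
{
    // Collect every animal that is preceded by "Boy" at the start of the string
    var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();
    if (results.Count > 0)
    {
        Console.WriteLine("{0}:\n[{1}]\n------", s, string.Join(", ", results));
    }
    else
    {
        Console.WriteLine("{0}:\nNO MATCH!\n------", s);
    }
}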
Output:
Boy has a dog and a cat.:
[dog, cat]
------
Boy something a gerbil.:
[gerbil]
------
Sally owns a cat.:
NO MATCH!
------
There is an alternative: start matching only if the string begins with Boy, and then continue matching only from the end of each successful match:
(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b
See this regex demo (or a regex101 link here)
You would just need to grab Group 1 contents:
var results = Regex.Matches(s, @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
See this C# demo.
Here,
(?:\G(?!\A)|^Boy\b) - either the end of the previous match (\G(?!\A)) or the start of the string followed by the whole word Boy
.*? - any 0+ chars other than a newline (unless RegexOptions.Singleline is passed to the Regex constructor), as few as possible
\b(dog|cat|gerbil)\b - a whole word dog, cat or gerbil
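The way \G chains matches can be seen by printing where each match starts and what it consumes. This is a minimal sketch: the class name AnchorSketch and the helper Trace are hypothetical, and the index|match|group output format is chosen only for readability.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchorSketch
{
    private const string Pattern = @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b";

    // Hypothetical helper: one "index|whole match|group 1" entry per match.
    public static List<string> Trace(string input)
    {
        return Regex.Matches(input, Pattern)
            .Cast<Match>()
            .Select(m => $"{m.Index}|{m.Value}|{m.Groups[1].Value}")
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Trace("Boy has a dog and a cat."))
            Console.WriteLine(line);
        // 0|Boy has a dog|dog     <- entered through ^Boy\b
        // 13| and a cat|cat       <- entered through \G(?!\A), right after "dog"

        // No match can start here: ^Boy\b fails, and \G(?!\A) is rejected at position 0.
        Console.WriteLine(Trace("Sally owns a cat.").Count); // 0
    }
}
```

Each whole match includes the text skipped by .*?, which is why the animal has to be read from Group 1 rather than from the match value.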
Basically, these regexps are similar, although the \G-based regex might turn out a bit faster.
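If you want to check both claims on your own data, a rough sketch like the following may help. The class name CompareSketch, the method names and the iteration count are assumptions made for illustration, and the Stopwatch loop is only a crude timing, not a proper benchmark, so treat any numbers it prints as indicative at best.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class CompareSketch
{
    private static readonly Regex Lookbehind =
        new Regex(@"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b");
    private static readonly Regex Continued =
        new Regex(@"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b");

    // Lookbehind variant: the animal is the whole match.
    public static List<string> ViaLookbehind(string s) =>
        Lookbehind.Matches(s).Cast<Match>().Select(m => m.Value).ToList();

    // \G variant: the animal is in Group 1.
    public static List<string> ViaContinuation(string s) =>
        Continued.Matches(s).Cast<Match>().Select(m => m.Groups[1].Value).ToList();

    // Crude wall-clock timing; results depend on runtime, machine and input.
    public static long TimeMs(Func<string, List<string>> extract, string[] inputs, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            foreach (var s in inputs)
                extract(s);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        var inputs = new[] { "Boy has a dog and a cat.", "Boy microwaves a gerbil.", "Sally owns a cat." };

        // Both patterns should yield the same list for each sample sentence.
        foreach (var s in inputs)
            Console.WriteLine("{0} same results: {1}", s, ViaLookbehind(s).SequenceEqual(ViaContinuation(s)));

        Console.WriteLine("lookbehind: {0} ms", TimeMs(ViaLookbehind, inputs, 100000));
        Console.WriteLine("\\G-based:   {0} ms", TimeMs(ViaContinuation, inputs, 100000));
    }
}
```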