Question
I'm using C# and regex, trying to capture outer paren groups while ignoring inner paren groups. I have legacy-generated text files containing thousands of string constructions like the following:
([txtData] of COMPOSITE
(dirty FALSE)
(composite [txtModel])
(view [star3])
(creationIndex 0)
(creationProps )
(instanceNameSpecified FALSE)
(containsObject nil)
(sName txtData)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )
)
([txtUI] of COMPOSITE
(dirty FALSE)
(composite [txtModel])
(view [star2])
(creationIndex 0)
(creationProps )
(instanceNameSpecified FALSE)
(containsObject nil)
(sName ApplicationWindow)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )
)
([star38] of COMPOSITE
(dirty FALSE)
(composite [txtUI])
(view [star39])
(creationIndex 26)
(creationProps composite [txtUI] sName Bestellblatt)
(instanceNameSpecified TRUE)
(containsObject COMPOSITE)
(sName Bestellblatt)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )
)
I am looking for a regex that will capture the 3 groupings in the example above, and here is what I have tried so far:
Regex regex = new Regex(@"\((.*?)\)");
return regex.Matches(str);
The problem with the regex above is that it finds inner paren groupings such as dirty FALSE and composite [txtModel]. But what I want it to match is each of the outer groupings, such as the 3 shown above. The definition of an outer grouping is simple:
1. Opening paren is either the first character in the file, or it follows a line feed and/or carriage return.
2. Closing paren is either the last character in the file, or it is followed by a line feed or carriage return.
I want the regex pattern to ignore all paren-groupings that don't obey numbers 1 and 2 above. By "ignore" I mean that they shouldn't be seen as a match - but they should be returned as part of the outer grouping match.
So, for my objective to be met, when my C# regex runs against the example above, I should get back a regex MatchCollection with exactly 3 matches, just as shown above.
How is it done? (Thanks in advance.)
Answer 1:
You can achieve this with .NET balancing groups.
Here is a demo that matches the outer parentheses:
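using System;
using System.Text.RegularExpressions;

string sentence = @"([txtData] of COM ..."; // your text
// Balancing group: (?<c>) pushes on each nested "(", (?<-c>) pops on each ")",
// and (?(c)(?!)) fails the match if any nested "(" is left unclosed.
string pattern = @"\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)";
Regex rgx = new Regex(pattern);
foreach (Match match in rgx.Matches(sentence))
{
    Console.WriteLine(match.Value);
    Console.WriteLine("--------");
}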
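To make the pattern easier to read, here is a sketch that spreads the same expression over several lines using RegexOptions.IgnorePatternWhitespace. The class name OuterParenDemo, the method FindOuterGroups, and the shortened sample text are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OuterParenDemo
{
    // Same pattern as in the answer, laid out with comments.
    // IgnorePatternWhitespace ignores unescaped whitespace in the pattern
    // and treats '#' as the start of a comment running to the end of the line.
    private static readonly Regex OuterGroup = new Regex(@"
        \(                # opening paren of the outer group
        (?>               # atomic group: do not backtrack into consumed text
            \( (?<c>)     # nested open paren: push an empty capture onto stack c
          | [^()]+        # a run of non-paren characters, newlines included
          | \) (?<-c>)    # nested close paren: pop one capture from stack c
        )*
        (?(c)(?!))        # if stack c is not empty, a nested paren is unclosed: fail
        \)                # closing paren of the outer group
        ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns one match per top-level paren group.
    public static MatchCollection FindOuterGroups(string text)
    {
        return OuterGroup.Matches(text);
    }

    public static void Main()
    {
        // Shortened, made-up sample in the same shape as the question's data.
        string sample =
            "([txtData] of COMPOSITE\n(dirty FALSE)\n(creationProps )\n)\n" +
            "([txtUI] of COMPOSITE\n(view [star2])\n)\n";

        foreach (Match m in FindOuterGroups(sample))
        {
            Console.WriteLine(m.Value);
            Console.WriteLine("--------");
        }
    }
}
```

Note that this pattern decides what is "outer" purely by nesting depth, not by whether the paren sits at the start or end of a line, so it assumes the parentheses in the file are balanced.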
Source: https://stackoverflow.com/questions/63024714/capture-outer-paren-groups-while-ignoring-inner-paren-groups