This regular expression
/\(.*\)/
won't stop at the matching closing parenthesis; it runs to the last closing parenthesis in the string. Is there a regular expression
Given a string containing nested matching parentheses, you can either match the innermost sets with this (non-recursive JavaScript) regex:
var re = /\([^()]*\)/g;
Or you can match the outermost sets with this (recursive PHP) regex:
$re = '/\((?:[^()]++|(?R))*\)/';
But you cannot easily match sets of matching parentheses that lie between the innermost and outermost levels.
Note also that the (naive and frequently encountered) expression: /\(.*?\)/
matches neither the innermost nor the outermost set on nested input: it runs from the first opening parenthesis to the nearest closing one.
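To see the difference, both JavaScript-compatible patterns can be run on a nested string. This is only a small sketch: the sample text is shortened from the example used later in this thread, and the variable names are made up for illustration.

```javascript
// Sample input with three nesting levels (illustrative).
const text = "there are (many (things (on) the)) box";

// Innermost sets: "(" followed by no parentheses at all, then ")".
const innermost = text.match(/\([^()]*\)/g);

// Lazy version: from the first "(" to the nearest ")".
const lazy = text.match(/\(.*?\)/g);

console.log(innermost); // [ '(on)' ]
console.log(lazy);      // [ '(many (things (on)' ]
```

The lazy match starts at the outermost opening parenthesis but ends at the innermost closing one, so it is unbalanced.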
If you only have one level of parentheses, then there are two possibilities.
Option 1: use ungreedy repetition:
/\(.*?\)/
This will stop when it encounters the first ).
Option 2: use a negative character class
/\([^)]*\)/
This can only repeat characters that are not ), so it can never go past the first closing parenthesis. This option is usually preferred for performance reasons. In addition, it is more easily extended to allow for escaped parentheses (so that you could match the complete string (some\)thing) instead of throwing away thing)). But this is probably rarely necessary.
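One possible way to extend the character-class version to escaped parentheses is sketched below. It assumes that a backslash escapes whatever character follows it; that convention, and the names `plain` and `escapeAware`, are assumptions made for this illustration and not part of the original answer.

```javascript
// Option 2 as given: stops at the first ")" even if it is escaped.
const plain = /\([^)]*\)/g;

// Assumed extension: either a character that is neither ")" nor a backslash,
// or a backslash followed by any single character.
const escapeAware = /\((?:[^)\\]|\\.)*\)/g;

// String.raw keeps the backslash literally in the input.
const input = String.raw`a (some\)thing) b`;

console.log(input.match(plain)[0]);       // (some\)
console.log(input.match(escapeAware)[0]); // (some\)thing)
```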
However, if you want nested structures, this is generally too complicated for regex (although some flavors like PCRE support recursive patterns). In this case, you should just go through the string yourself and count parentheses to keep track of your current nesting level.
Just as a side note about those recursive patterns: in PCRE, (?R) simply represents the whole pattern, so inserting it somewhere makes the whole thing recursive. But then the content of every pair of parentheses must have the same structure as the whole match. Also, it is not really possible to do meaningful one-step replacements with this, or to use capturing groups on multiple nested levels. All in all, you are best off not using regular expressions for nested structures.
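The "count parentheses yourself" approach mentioned above could look roughly like the following JavaScript sketch. The function name `topLevelGroups` is hypothetical, and ignoring stray closing parentheses is a design choice made here, not something the answer prescribes.

```javascript
// Hypothetical helper: returns every outermost parenthesized group in `str`.
function topLevelGroups(str) {
  const groups = [];
  let depth = 0;   // current nesting level
  let start = -1;  // index of the "(" that opened the current top-level group

  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === "(") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === ")" && depth > 0) {
      depth--;
      if (depth === 0) groups.push(str.slice(start, i + 1));
    }
    // A ")" seen at depth 0 is unbalanced and is simply ignored here.
  }
  return groups;
}

const nested = "there are (many (things (on) the)) box (except (carrots (and apples)))";
console.log(topLevelGroups(nested));
// [ '(many (things (on) the))', '(except (carrots (and apples)))' ]
```

Because the nesting level is available at every step, the same loop can be adapted to collect groups at any particular depth, which the recursive regex cannot do.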
Update: Since you seem eager to find a regex solution, here is how you would match your example using PCRE (example implementation in PHP):
$str = 'there are (many (things (on) the)) box (except (carrots (and apples)))';
preg_match_all('/\([^()]*(?:(?R)[^()]*)*\)/', $str, $matches);
print_r($matches);
results in
Array
(
    [0] => Array
        (
            [0] => (many (things (on) the))
            [1] => (except (carrots (and apples)))
        )
)
What the pattern does:
\( # opening bracket
[^()]* # arbitrarily many non-bracket characters
(?: # start a non-capturing group for later repetition
(?R) # recursion! (match any nested brackets)
[^()]* # arbitrarily many non-bracket characters
)* # close the group and repeat it arbitrarily many times
\) # closing bracket
This allows for arbitrarily deep nesting and also for arbitrarily many groups side by side.
Note that it is not possible to get all nested levels as separate captured groups. You will always just get the inner-most or outer-most group. Also, doing a recursive replacement is not possible like this.
Regular expressions in the strict sense (without extensions such as recursion) are not powerful enough to find matching parentheses, because parentheses can nest to arbitrary depth. There exists a simple algorithm to find matching parentheses, though, which is described in this answer.
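The linked answer is not reproduced here. A common stack-based way to pair up parentheses, which may or may not be the algorithm it describes, can be sketched in JavaScript as follows; the name `matchParens` and the choice to throw on unbalanced input are assumptions made for this illustration.

```javascript
// Hypothetical helper: returns [openIndex, closeIndex] for every matched pair
// of parentheses, or throws if the input is unbalanced.
function matchParens(str) {
  const stack = [];  // indices of "(" still waiting for their ")"
  const pairs = [];

  for (let i = 0; i < str.length; i++) {
    if (str[i] === "(") {
      stack.push(i);
    } else if (str[i] === ")") {
      if (stack.length === 0) throw new Error(`Unmatched ")" at index ${i}`);
      pairs.push([stack.pop(), i]);
    }
  }
  if (stack.length > 0) throw new Error(`Unmatched "(" at index ${stack.pop()}`);
  return pairs;
}

const sample = "a (b (c) d) e";
for (const [open, close] of matchParens(sample)) {
  console.log(sample.slice(open, close + 1));
}
// prints "(c)" and then "(b (c) d)"
```

Pairs come out in the order their closing parentheses appear, so inner groups are reported before the groups that contain them.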
If you are just trying to find the first right parenthesis in an expression, you should use a non-greedy matcher in your regex. In this case, the non-greedy version of your regex is the following:
/\(.*?\)/