This regular expression
/\(.*\)/
won't stop at the matching parenthesis; it runs on to the last closing parenthesis in the string. Is there a regular expression that stops at the matching one instead?
If you only have one level of parentheses, then there are two possibilities.
Option 1: use ungreedy repetition:
/\(.*?\)/
This will stop when it encounters the first ).
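To make the difference concrete, here is a small sketch comparing the greedy pattern from the question with the ungreedy one. The sample string is made up for illustration.

```php
<?php
// Hypothetical sample input with two separate parenthesised groups.
$str = 'a (b) c (d) e';

// Greedy: .* runs to the end of the string and backtracks to the LAST ")".
preg_match('/\(.*\)/', $str, $greedy);

// Ungreedy: .*? grows one character at a time and stops at the FIRST ")".
preg_match('/\(.*?\)/', $str, $lazy);

echo $greedy[0], "\n"; // (b) c (d)
echo $lazy[0], "\n";   // (b)
```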
Option 2: use a negative character class
/\([^)]*\)/
This can only repeat characters that are not ), so it can never go past the first closing parenthesis. This option is usually preferred for performance reasons, since the engine does not have to try the closing parenthesis after every single character. In addition, it is more easily extended to allow for escaped parentheses (so that you could match this complete string: (some\)thing) instead of throwing away thing)). But this is probably rarely necessary.
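As a rough sketch of that extension, assuming a backslash escapes the character after it (that convention and the sample string are my assumptions, not something from the question):

```php
<?php
// Hypothetical input: the first group contains an escaped ")".
$str = '(some\)thing) and (more)';

// Plain negated class: stops at the first ")", even an escaped one.
preg_match('/\([^)]*\)/', $str, $plain);

// Escape-aware variant: consume either a character that is neither ")"
// nor a backslash, or a backslash together with the character after it.
// In a single-quoted PHP string, \\\\ yields the regex \\ (one literal backslash).
preg_match('/\((?:[^)\\\\]|\\\\.)*\)/', $str, $escaped);

echo $plain[0], "\n";   // (some\)
echo $escaped[0], "\n"; // (some\)thing)
```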
However, if you want nested structures, this is generally too complicated for regex (although some flavors like PCRE support recursive patterns). In this case, you should just go through the string yourself and count parentheses to keep track of your current nesting level.
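Counting by hand could look roughly like this. The function name outermostGroups is made up for this sketch, and it simply skips unbalanced parentheses, which may or may not be what you want.

```php
<?php
/**
 * Returns every outermost balanced "(...)" group in $str.
 * Hypothetical helper; unbalanced parentheses are silently skipped.
 */
function outermostGroups(string $str): array
{
    $groups = [];
    $depth = 0; // current nesting level
    $start = 0; // offset of the "(" that opened the current outermost group

    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        if ($str[$i] === '(') {
            if ($depth === 0) {
                $start = $i; // a new outermost group begins here
            }
            $depth++;
        } elseif ($str[$i] === ')' && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                // back on the top level: the outermost group is complete
                $groups[] = substr($str, $start, $i - $start + 1);
            }
        }
    }
    return $groups;
}

print_r(outermostGroups('there are (many (things (on) the)) box (except (carrots (and apples)))'));
```

Because you have the nesting level in hand at every position, it is easy to adapt this to collect groups at a particular depth, which is exactly what the regex approach makes awkward.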
Just as a side note about those recursive patterns: in PCRE, (?R) simply represents the whole pattern, so inserting it somewhere makes the whole thing recursive. But then the content of every pair of parentheses must have the same structure as the whole match. Also, it is not really possible to do meaningful one-step replacements with this, nor to use capturing groups on multiple nested levels. All in all, you are best off not using regular expressions for nested structures.
Update: Since you seem eager to find a regex solution, here is how you would match your example using PCRE (example implementation in PHP):
$str = 'there are (many (things (on) the)) box (except (carrots (and apples)))';
preg_match_all('/\([^()]*(?:(?R)[^()]*)*\)/', $str, $matches);
print_r($matches);
results in
Array
(
    [0] => Array
        (
            [0] => (many (things (on) the))
            [1] => (except (carrots (and apples)))
        )

)
What the pattern does:
\(          # opening bracket
[^()]*      # arbitrarily many non-bracket characters
(?:         # start a non-capturing group for later repetition
  (?R)      # recursion! (match any nested brackets)
  [^()]*    # arbitrarily many non-bracket characters
)*          # close the group and repeat it arbitrarily many times
\)          # closing bracket
This allows for arbitrarily deep nesting and also for arbitrarily many groups side by side on the same level.
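If you want to keep that breakdown in your code, the x modifier lets you write the pattern with whitespace and # comments. This is only a sketch of the same pattern in a different layout; note that the comments must not contain the / delimiter.

```php
<?php
// The same recursive pattern, written with the x modifier so that
// whitespace and "#" comments inside the pattern are ignored by PCRE.
$pattern = '/
    \(          # opening bracket
    [^()]*      # arbitrarily many non-bracket characters
    (?:         # start a non-capturing group for later repetition
        (?R)    # recursion: match any nested brackets
        [^()]*  # arbitrarily many non-bracket characters
    )*          # close the group and repeat it arbitrarily many times
    \)          # closing bracket
/x';

$str = 'there are (many (things (on) the)) box (except (carrots (and apples)))';
preg_match_all($pattern, $str, $matches);
print_r($matches[0]);
```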
Note that it is not possible to get all nested levels as separate captured groups. You will always just get the inner-most or outer-most group. Also, doing a recursive replacement is not possible like this.
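To illustrate the two ends you can get, here is a sketch that runs the recursive pattern for the outermost groups and a plain negated character class for the innermost ones, on the same example string:

```php
<?php
$str = 'there are (many (things (on) the)) box (except (carrots (and apples)))';

// Outermost groups: the recursive pattern from the update.
preg_match_all('/\([^()]*(?:(?R)[^()]*)*\)/', $str, $outer);

// Innermost groups: a pair of brackets with no brackets in between.
preg_match_all('/\([^()]*\)/', $str, $inner);

print_r($outer[0]); // (many (things (on) the)) and (except (carrots (and apples)))
print_r($inner[0]); // (on) and (and apples)
```

The levels in between, such as (things (on) the), are not returned by either call.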