Part of me worries that this question will get closed, but I'm genuinely baffled by something.
In every language's regex that I've used, the capturing groups are indexed at one, even when the rest of the language is indexed at zero.
I guess that by "the rest of the language" you mean arrays and other container types. Well, in regex, capture groups do start at 0, but that is not obvious at first.
Capture group 0 contains the complete match, and the capture groups from 1 onward are the ones you create explicitly with parentheses, ().
So, for the regex below, applied to the string "ab123cd":
ab(\d+)cd
There are really two groups:
Group 0 (the complete match): ab123cd
Group 1 (the parenthesized (\d+)): 123
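To make this concrete, here is a minimal Java sketch that lists every group for a match. The class name GroupZeroDemo and the helper groups are made up for illustration; only Pattern, Matcher, group(int) and groupCount() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class GroupZeroDemo {
    // Returns group 0 followed by every numbered group,
    // or an empty list if the whole input does not match.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.matches()) {
            // groupCount() does not count group 0, hence the <= in the loop.
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("ab(\\d+)cd", "ab123cd")); // prints [ab123cd, 123]
    }
}
```

Index 0 of the returned list is the complete match and index 1 is the text captured by the parentheses.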
From there on, the groups are numbered in the order in which their opening parentheses, (, occur.
So, for the regex below (whitespace added for readability):
ab( x (\d+))cd
  ^   ^
  |   |
  |   group 2
  group 1
When you apply the above regex to the string "abx123cd", you get the following groups:
Group 0: abx123cd
Group 1: x123
Group 2: 123
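The nested case can be checked the same way. The sketch below keeps the spaced-out form of the regex by compiling it with Pattern.COMMENTS, which makes the engine ignore whitespace inside the pattern. The class name NestedGroupsDemo and the helper numberedGroups are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class NestedGroupsDemo {
    // Pattern.COMMENTS ignores whitespace in the pattern, so the regex
    // can keep the spacing that was added for readability.
    static final Pattern SPACED = Pattern.compile("ab( x (\\d+))cd", Pattern.COMMENTS);

    // Returns one "group N = text" line per group, outermost first.
    static List<String> numberedGroups(String input) {
        Matcher m = SPACED.matcher(input);
        List<String> lines = new ArrayList<>();
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                lines.add("group " + i + " = " + m.group(i));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints:
        // group 0 = abx123cd
        // group 1 = x123
        // group 2 = 123
        numberedGroups("abx123cd").forEach(System.out::println);
    }
}
```

The outer parenthesis opens first, so it gets number 1 even though it closes last.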
When you match those regexes in Java, you can get all those groups using the following methods:
Matcher.group() - for group 0 (no parameter)
Matcher.group(int) - for the other groups (int parameter, taking the value of the respective group)
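As a quick check of how Matcher.group() and Matcher.group(int) relate, the sketch below confirms that group() returns the same text as group(0), and that groupCount() reports only the parenthesized groups. The class name MatcherGroupMethods and both helper methods are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class MatcherGroupMethods {
    // True when a match is found and group() equals group(0).
    static boolean zeroIsWholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() && m.group().equals(m.group(0));
    }

    // Number of capturing groups in the pattern; group 0 is not counted.
    static int countCapturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(zeroIsWholeMatch("ab(x(\\d+))cd", "abx123cd")); // prints true
        System.out.println(countCapturingGroups("ab(x(\\d+))cd"));         // prints 2
    }
}
```

Because groupCount() excludes group 0, a loop over all groups has to run from 0 up to and including groupCount().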