Why are regex capturing groups indexed at one?

挽巷 2021-01-21 15:07

Part of me is worried that this question will get closed, but I'm genuinely baffled by something. In every language's regex that I've used, the capturing groups are indexed at one, even when the rest of the language is indexed at zero. …

1 Answer
  • 2021-01-21 15:48

    In every language's regex that I've used, the capturing groups are indexed at one, even when the rest of the language is indexed at zero.

    I guess that by "the rest of the language" you mean arrays and other container types. In fact, regex capture groups do start at 0, although that is not obvious at first.

    Capture group 0 contains the complete match, and the groups from 1 onward are the ones you create explicitly with parentheses, ().

    So, for the regex below, applied to the string "ab123cd":

    ab(\d+)cd
    

    There are really two groups:

    • Group 0 - The complete match - ab123cd
    • Group 1 - The group you captured using () - 123
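
    As a minimal sketch of the two groups listed above, the Java snippet below applies ab(\d+)cd to "ab123cd" and reads groups 0 and 1. The class name SingleGroupDemo and the helper groups are hypothetical names chosen for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class SingleGroupDemo {
    // Returns {group 0, group 1} for the pattern ab(\d+)cd,
    // or null if the whole input does not match.
    static String[] groups(String input) {
        Matcher m = Pattern.compile("ab(\\d+)cd").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("ab123cd");
        System.out.println("group 0 = " + g[0]); // the complete match
        System.out.println("group 1 = " + g[1]); // the text captured by (\d+)
    }
}
```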

    From there on, the groups are numbered in the order in which their opening parentheses, (, occur.

    So, for the regex below (whitespace added for readability):

    ab(    x   (\d+))cd
      ^        ^
      |        |
     group 1  group 2
    

    When you apply the above regex to the string "abx123cd", you get the following groups:

    • Group 0 - Complete match - abx123cd
    • Group 1 - Pattern starting at the first opening parenthesis - x123
    • Group 2 - Pattern starting at the second opening parenthesis - 123
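
    The numbering of nested groups can be checked with a short Java sketch. It assumes the Pattern.COMMENTS flag so that the readability whitespace in the pattern is ignored, and it uses groupCount(), which does not count group 0. NestedGroupDemo and allGroups are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class NestedGroupDemo {
    // Pattern.COMMENTS tells the engine to ignore whitespace inside the pattern,
    // so the spaced-out form from the text can be used as written.
    static final Pattern NESTED =
            Pattern.compile("ab(    x   (\\d+))cd", Pattern.COMMENTS);

    // Returns groups 0..groupCount() in order, or an empty list if there is no match.
    static List<String> allGroups(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = NESTED.matcher(input);
        if (m.matches()) {
            // groupCount() excludes group 0, hence the <= in the loop condition.
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> groups = allGroups("abx123cd");
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("group " + i + " = " + groups.get(i));
        }
    }
}
```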

    When you use such a regex in Java, you can retrieve these groups with the following methods:

    • Matcher.group() to get group 0 (note that it takes no parameters), and
    • Matcher.group(int) to get any group by its number (note the int parameter; group(0) is equivalent to group())
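
    A small sketch of the two accessors side by side, here searching inside a longer string with find(). The class GroupAccessDemo and the method firstOccurrence are hypothetical names for illustration. Note that either form of group throws IllegalStateException if no successful match has been made yet, which is why the result of find() is checked first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class GroupAccessDemo {
    static final Pattern P = Pattern.compile("ab(\\d+)cd");

    // Returns {group(), group(0), group(1)} for the first occurrence in text,
    // or null if the pattern does not occur at all.
    static String[] firstOccurrence(String text) {
        Matcher m = P.matcher(text);
        if (!m.find()) {
            return null; // calling group() here would throw IllegalStateException
        }
        return new String[] { m.group(), m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = firstOccurrence("xx ab123cd yy");
        System.out.println(g[0].equals(g[1])); // group() and group(0) agree
        System.out.println(g[2]);              // the captured digits
    }
}
```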