Regular Expression named capturing groups support in Java 7

谁说胖子不能爱, posted on 2019-12-18 12:49:45

Question


Since Java 7, the regular expressions API has supported named capturing groups. The method java.util.regex.Matcher.group(String) returns the input subsequence captured by the given named capturing group, but the API documentation gives no example.

What is the right syntax to specify and retrieve a named capturing group in Java 7?


Answer 1:


Specifying named capturing group

Take the following regex with a single capturing group as an example: ([Pp]attern).

Below are 4 examples of how to specify a named capturing group for the regex above:

(?<Name>[Pp]attern)
(?<group1>[Pp]attern)
(?<name>[Pp]attern)
(?<NAME>[Pp]attern)

Note that the name of the capturing group must strictly match the following pattern:

[A-Za-z][A-Za-z0-9]*

The group name is case-sensitive, so you must use the exact group name when you refer to the group (see below).
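The naming rules above can be checked with a small sketch. The class and method names here (NamedGroupBasics, capture, and so on) are made up for illustration; only Pattern, Matcher and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NamedGroupBasics {
    // Returns the text captured by the group called "name", or null if nothing matches.
    static String capture(String input) {
        Matcher m = Pattern.compile("(?<name>[Pp]attern)").matcher(input);
        return m.find() ? m.group("name") : null;
    }

    // Group names are case-sensitive: asking for "NAME" when the group is "name" fails.
    static boolean lookupWithWrongCaseFails(String input) {
        Matcher m = Pattern.compile("(?<name>[Pp]attern)").matcher(input);
        if (!m.find()) {
            return false;
        }
        try {
            m.group("NAME");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // True if the regex is rejected at compile time.
    static boolean isRejected(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(capture("a regex pattern"));
        System.out.println(lookupWithWrongCaseFails("a regex pattern"));
        // An underscore falls outside [A-Za-z][A-Za-z0-9]*, so this name is invalid.
        System.out.println(isRejected("(?<my_name>[Pp]attern)"));
    }
}
```

Looking up a name that does not exist throws IllegalArgumentException, while an invalid name in the regex itself surfaces earlier, as a PatternSyntaxException from Pattern.compile.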

Backreference the named capturing group in regex

To back-reference the content matched by a named capturing group in the regex (corresponding to the 4 examples above):

\k<Name>
\k<group1>
\k<name>
\k<NAME>

The named capturing group is still numbered, so in all 4 examples, it can be back-referenced with \1 as per normal.
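As a minimal sketch of both forms of back-reference, the following finds a doubled word. The group name word and the class NamedBackreference are hypothetical, chosen only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedBackreference {
    // Returns the first word that the given regex finds repeated, or null if none.
    static String firstRepeatedWord(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group("word") : null;
    }

    public static void main(String[] args) {
        // Back-reference by name.
        String byName = "\\b(?<word>\\w+) \\k<word>\\b";
        // The same group is also group 1, so a numbered back-reference works too.
        String byNumber = "\\b(?<word>\\w+) \\1\\b";

        System.out.println(firstRepeatedWord(byName, "it was the the cat"));
        System.out.println(firstRepeatedWord(byNumber, "it was the the cat"));
    }
}
```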

Refer to named capturing group in replacement string

To refer to the capturing group in the replacement string (corresponding to the 4 examples above):

${Name}
${group1}
${name}
${NAME}

As above, in all 4 examples the content of the capturing group can also be referred to with $1 in the replacement string.
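A short sketch of both replacement styles, using String.replaceAll. The group names first and last and the class NamedReplacement are assumptions made for the example.

```java
class NamedReplacement {
    private static final String FULL_NAME = "(?<first>\\w+) (?<last>\\w+)";

    // Swaps the two words using ${name} references in the replacement string.
    static String swapByName(String input) {
        return input.replaceAll(FULL_NAME, "${last}, ${first}");
    }

    // Does the same with numbered references; named groups keep their numbers.
    static String swapByNumber(String input) {
        return input.replaceAll(FULL_NAME, "$2, $1");
    }

    public static void main(String[] args) {
        System.out.println(swapByName("John Smith"));
        System.out.println(swapByNumber("John Smith"));
    }
}
```

Both calls produce the same string, since ${last} and $2 refer to the same group.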

Named capturing group in COMMENT mode

This section uses (?<name>[Pp]attern) as the example.

Oracle's implementation of the COMMENT mode (embedded flag (?x)) parses the following examples as identical to the regex above:

(?x)  (  ?<name>             [Pp] attern  )
(?x)  (  ?<  name  >         [Pp] attern  )
(?x)  (  ?<  n  a m    e  >  [Pp] attern  )

Except for ?<, which must not be split, it allows arbitrary spacing, even inside the name of the capturing group.
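The sketch below tries the spaced variants from this section. Because the behaviour is described above as specific to Oracle's implementation, the program only reports whether each variant compiles on the JDK it runs on and assumes no particular outcome for them. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CommentModeNamedGroup {
    // Returns the text captured by group "name", or null if the regex does not match.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group("name") : null;
    }

    // True if this JDK accepts the regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Whitespace around the group body is plain COMMENT-mode behaviour.
        System.out.println(capture("(?x) (?<name> [Pp] attern )", "Pattern"));

        // Variants from the text above; acceptance depends on the implementation.
        String[] variants = {
            "(?x)  (  ?<name>             [Pp] attern  )",
            "(?x)  (  ?<  name  >         [Pp] attern  )",
            "(?x)  (  ?<  n  a m    e  >  [Pp] attern  )"
        };
        for (String regex : variants) {
            System.out.println(compiles(regex) + "  " + regex);
        }
    }
}
```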

Same name for different capturing groups?

While it is possible in .NET, Perl and PCRE to define the same name for different capturing groups, it is currently not supported in Java (Java 8). You can't use the same name for different capturing groups.
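A quick way to see this restriction is to compile a regex that reuses a name and catch the resulting PatternSyntaxException. The helper name compileError is made up for this sketch, and the exact wording of the error description is left to the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DuplicateGroupName {
    // Returns the compile error description, or null if the regex compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The same name on both alternatives is rejected.
        System.out.println(compileError("(?<x>a)|(?<x>b)"));
        // Distinct names compile, so this prints null.
        System.out.println(compileError("(?<x>a)|(?<y>b)"));
    }
}
```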

Named capturing group related APIs

New methods in the Matcher class that support access by group name:

  • group(String name) (from Java 7)
  • start(String name) (from Java 8)
  • end(String name) (from Java 8)

The corresponding methods are missing from the MatchResult class as of Java 8. There is an ongoing enhancement request, JDK-8065554, for this issue.
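The Matcher methods listed above can be exercised as follows. This is a minimal sketch that needs Java 8 or later for start(String) and end(String); the group names key and value and the class NamedGroupOffsets are assumptions for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupOffsets {
    // Returns {start, end} of the group called "value", or null if nothing matches.
    static int[] valueSpan(String input) {
        Matcher m = Pattern.compile("(?<key>\\w+)=(?<value>\\d+)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start("value"), m.end("value") };
    }

    public static void main(String[] args) {
        String input = "id=42";
        int[] span = valueSpan(input);
        // The offsets delimit the same text that group("value") would return.
        System.out.println(span[0] + ".." + span[1] + " -> " + input.substring(span[0], span[1]));
    }
}
```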

As of Java 8, there is no API to get the list of named capturing groups in a regex, so we have to jump through extra hoops to get it. That said, such a list is of little use for most purposes other than writing a regex tester.
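One possible hoop, offered only as a rough heuristic and not as the approach this answer had in mind, is to scan the regex source text for (?<name> openers. It assumes no built-in API is available on the JDK in use. It reports false positives when that character sequence appears as literal text, for example after a backslash-escaped parenthesis, inside \Q...\E, or inside a character class. GroupNameScanner and groupNames are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNameScanner {
    // Matches the opener of a named group. Lookbehinds such as (?<= and (?<!
    // are skipped because a group name must start with a letter.
    private static final Pattern NAMED_GROUP_OPENER =
            Pattern.compile("\\(\\?<(?<groupName>[A-Za-z][A-Za-z0-9]*)>");

    // Heuristic: lists group names in order of appearance in the regex source.
    static List<String> groupNames(String regex) {
        Pattern.compile(regex); // fail fast if the regex itself is invalid
        List<String> names = new ArrayList<>();
        Matcher m = NAMED_GROUP_OPENER.matcher(regex);
        while (m.find()) {
            names.add(m.group("groupName"));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(groupNames("(?<year>\\d{4})-(?<month>\\d{2})(?<=\\d)"));
    }
}
```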




Answer 2:


The new syntax for a named capturing group is (?<name>X), which matches X and names the group "name". The following code captures the regex (\w+) (any run of word characters: letters, digits, or underscores). To name this capturing group, add the expression ?<name> inside the parentheses, just before the regex to be captured (here the name is teste).

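import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern compile = Pattern.compile("(?<teste>\\w+)");
Matcher matcher = compile.matcher("The first word is a match");
// find() must succeed before group() is called; otherwise group() throws IllegalStateException
matcher.find();
String myNamedGroup = matcher.group("teste");
System.out.printf("This is your named group: %s", myNamedGroup);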

This code prints the following output:

This is your named group: The



Source: https://stackoverflow.com/questions/27498106/regular-expression-named-capturing-groups-support-in-java-7
