Question
Consider the following regex:
(([^\|])*\|)*([^\|]*)
This matches repetitive string patterns of the type
("whatever except |" |) {0 to any times} ("whatever except |") {1 time}
So it should match the following string, which has 17 substrings (16 repeated, plus " z" as the last one).
"abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z"
Indeed, RegexPal verifies that the given regex does match the above string.
Now, I want to get each of the substrings (i.e., "abcd |", "e |", "fg |", etc.), whose number and lengths are not known in advance.
According to a similarly-titled previous StackOverflow post and the documentation of the Matcher class's find() method, I just need to do something like
Pattern pattern = Pattern.compile(regex); // regex is the above regex
Matcher matcher = pattern.matcher(input); // input is the above string
while (matcher.find())
{
System.out.println(matcher.group(1));
}
However, when I do this I just get 2 strings printed out: the last repeated substring ("x y|") and a null value; definitely not the 16 substrings I expect.
It would also be nice to check that a match has actually happened before running the find() loop, but I am not sure whether matches(), groupCount() > 0, or some other condition should be used without doing the matching work twice, given that find() also does matching.
So, questions:
- How can I get all the 16 repeated substrings?
- How can I get the last substring?
- How do I check that the string matched?
Answer 1:
If you must use the regular expression...
1) How can I get all the 16 repeated substrings?
See below. When looping over matches with find(), the pattern does not need to match the whole input, only the section you want on each iteration. (I get 17 matches; is this correct?)
2) How can I get the last substring?
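Move the delimiter to the start of the regex and also allow '^' there, so that the first substring (which has no '|' before it) is matched as well; the last substring is then simply the last match.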
3) How do I check that the string matched?
What would qualify as a non-match? Any string will match the original regex.
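To illustrate that point, here is a minimal sketch (the class name AlwaysMatches and the helper matchesWhole are made up for this example) that runs the regex from the question through matches() on a few arbitrary inputs. Since both the repeated group and the trailing group may match nothing, there seems to be no input that is rejected.

```java
import java.util.regex.Pattern;

public class AlwaysMatches {
    // The regex from the question, written as a Java string literal.
    static final Pattern QUESTION_PATTERN =
            Pattern.compile("(([^\\|])*\\|)*([^\\|]*)");

    // Hypothetical helper: true if the regex matches the entire input.
    static boolean matchesWhole(String input) {
        return QUESTION_PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "", "no delimiter at all", "a|b", "|||", "trailing|" };
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" -> " + matchesWhole(sample));
        }
    }
}
```

Every sample prints true, so a separate "did it match" check does not add much for this particular regex.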
Here is a solution using regular expressions:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// assertEquals is assumed to come from a JUnit-style test class

String input = "abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z";
int expectedSize = 17;
List<String> expected = new ArrayList<String>(Arrays.asList("abcd ", " e ", " fg ", " hijk ", " lmnop ", " ", " ", " qrs ", " t", " uv", " w ", "",
        "", "", "", "x y", " z"));
List<String> matches = new ArrayList<String>();
// Pattern pattern = Pattern.compile("(?:\\||^)([^\\|]*)");
Pattern pattern = Pattern.compile("(?:_?\\||^)([^\\|]*?)(?=_?\\||$)"); // Edit: allows _| or | as delim
// Each successful find() yields one substring in group(1)
for (Matcher matcher = pattern.matcher(input); matcher.find();)
{
    matches.add(matcher.group(1));
}
for (int idx = 0, len = matches.size(); idx < len; idx++)
{
    System.out.format("[%-2d] \"%s\"%n", idx + 1, matches.get(idx));
}
assertEquals(expectedSize, matches.size()); // compare values, not boxed object identity
assertEquals(expected, matches);
Output
[1 ] "abcd "
[2 ] " e "
[3 ] " fg "
[4 ] " hijk "
[5 ] " lmnop "
[6 ] " "
[7 ] " "
[8 ] " qrs "
[9 ] " t"
[10] " uv"
[11] " w "
[12] ""
[13] ""
[14] ""
[15] ""
[16] "x y"
[17] " z"
Answer 2:
I'm afraid you're confusing things. When a capturing group sits inside a repetition ('*', '+', etc.), you can't get every instance it matched; only the last iteration is kept. Using something like ((xxx)*) you can get the whole string matched as group(1) and the last part matched as group(2), nothing else.
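A small sketch of that behaviour, in case it helps (the class name LastIterationOnly, the helper lastRepeatedGroup, and the toy pattern ((\w\|)*) are chosen only for this illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastIterationOnly {
    // Toy pattern of the shape ((xxx)*): group 2 is repeated inside group 1.
    static final Pattern NESTED = Pattern.compile("((\\w\\|)*)");

    // Hypothetical helper: returns what the repeated inner group holds
    // after a full match, or null if the input does not match.
    static String lastRepeatedGroup(String input) {
        Matcher m = NESTED.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        Matcher m = NESTED.matcher("a|b|c|");
        if (m.matches()) {
            System.out.println(m.group(1)); // prints a|b|c| (the whole repeated run)
            System.out.println(m.group(2)); // prints c|     (the last iteration only)
        }
    }
}
```

The earlier iterations "a|" and "b|" are matched but are not retrievable through the group API afterwards.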
Consider using String.split or, better, Guava's Splitter.
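For the String.split route, a minimal sketch could look like this (the class name SplitOnPipe and the helper splitOnPipe are made up here; only the JDK is used, so Guava is not shown). The limit argument -1 is passed so that trailing empty strings are kept, which the default split would drop.

```java
import java.util.Arrays;
import java.util.List;

public class SplitOnPipe {
    // Hypothetical helper: splits on a literal '|' and keeps empty pieces,
    // including trailing ones (that is what the -1 limit is for).
    static List<String> splitOnPipe(String input) {
        return Arrays.asList(input.split("\\|", -1));
    }

    public static void main(String[] args) {
        String input = "abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z";
        List<String> parts = splitOnPipe(input);
        System.out.println(parts.size()); // prints 17
        for (String part : parts) {
            System.out.println("\"" + part + "\"");
        }
    }
}
```

Unlike the regex in the question, the pieces returned here do not include the trailing '|' delimiter.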
Ad 1. You can't with a single match. Use a simple pattern like
\G([^\|]*)(\||$)
together with find() to get all the matches in sequence. Note the \G, which anchors each match to the end of the previous one.
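A sketch of such a loop, under a few assumptions: the class name AnchoredTokens and the helper tokens are invented for this example, the '*' is placed inside the first group so that group(1) holds the whole substring rather than a single character, and the loop stops as soon as the end-of-input alternative has matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredTokens {
    // \G ties each match to the end of the previous one;
    // group(1) is the substring, group(2) is the delimiter or empty at the end.
    static final Pattern TOKEN = Pattern.compile("\\G([^\\|]*)(\\||$)");

    // Hypothetical helper: collects the substrings between '|' delimiters.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
            // An empty group(2) means the '$' alternative was taken, so the
            // input is exhausted; stopping here avoids a further empty match
            // that find() could otherwise report at the very end.
            if (m.group(2).isEmpty()) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z";
        List<String> parts = tokens(input);
        System.out.println(parts.size());                // prints 17
        System.out.println(parts.get(parts.size() - 1)); // prints " z" (without the quotes)
    }
}
```

With this loop the last substring is simply the last element collected.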
Ad 2. How can I get the last substring?
It is the last result that find() returns.
Ad 3. How do I check that the string matched?
After your last successful find(), check whether matcher.end() == input.length(). But with this pattern you don't need to check anything, as it always matches.
Source: https://stackoverflow.com/questions/7698499/java-repetitive-pattern-matching-2