Capture group multiple times

Posted by 纵饮孤独 on 2019-12-10 17:03:49

Question


Lately I have been playing around with regex in Java, and I have run into a problem that is (theoretically) easy to solve, but I was wondering whether there is an easier way to do it (yes, yes, I am lazy). The problem is capturing a group multiple times, that is:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupTest { // class name is a placeholder so the snippet compiles
    public static void main(String[] args) {
        Pattern p = Pattern.compile("A (IvI(.*?)IvI)*? A");
        Matcher m = p.matcher("A IvI asd IvI IvI qwe IvI A"); // ANY NUMBER of IvI x IvI
        //Matcher m = p.matcher("A  A");
        int loi = 0; // last occurrence index
        String storage;
        // First attempt: restart the search from the end of group 1 each time
        while (loi >= 0 && m.find(loi)) {
            System.out.println(m.group(1));
            if ((storage = m.group(2)) != null) {
                System.out.println(storage);
            }
            //System.out.println(m.group(1));
            loi = m.end(1);
        }
        m.find(); // find the full match again before reading group(1)
        System.out.println("2 opt");
        // Second option: run a separate pattern over the text captured by group 1
        Pattern p2 = Pattern.compile("IvI(.*?)IvI");
        Matcher m2 = p2.matcher(m.group(1)); // m.group(1) = "IvI asd IvI IvI qwe IvI"
        loi = 0;
        while (loi >= 0 && m2.find(loi)) {
            if ((storage = m2.group(1)) != null) {
                System.out.println(storage);
            }
            loi = m2.end(0);
        }
    }
}

Using ONLY Pattern p, is there any way to get what is inside the IvI's (in the test string that would be "asd" and "qwe"), considering that there could be any number of IvI sections? I am after something like what I am trying to do in the first while loop: find the first occurrence of the group, then move the index and search for the next group, and so on.

With the code I wrote in that while loop, group 2 comes back as asd IvI IvI qwe, not asd and then qwe. I suppose this could partly be because of the (.*?) part: it is not supposed to be greedy, yet it still runs up to qwe, consuming two of the IvI's. I mention this because otherwise I might be able to use the end index of those with the matcher.find(anInt) method, but that does not work either. I don't think there is anything wrong with the regex, since the next code works without consuming the IvI.

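import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyPrefixTest { // class name is a placeholder so the snippet compiles
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(.*?)IvI");
        Matcher m = p.matcher("bla bla blaIvI");
        m.find();
        System.out.println(m.group(1));
    }
}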

This prints: bla bla bla
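
A lazy quantifier only prefers the shortest expansion; it still grows whenever the rest of the pattern cannot match otherwise. The sketch below (the class name, method name, and the second pattern are illustrative assumptions, not part of the original post) suggests that the unmatched space between consecutive IvI sections is what forces (.*?) to stretch. It also shows that a variant which accounts for that space still keeps only the last repetition in group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyQuantifierDemo {
    // Returns group 2 of the first match of regex in input, or null if nothing matches.
    public static String group2(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String input = "A IvI asd IvI IvI qwe IvI A";

        // Original pattern: after the first "IvI asd IvI" neither " A" nor another
        // "IvI" follows directly (a space does), so (.*?) has to keep expanding.
        System.out.println("[" + group2("A (IvI(.*?)IvI)*? A", input) + "]");
        // prints "[ asd IvI IvI qwe ]"

        // Hypothetical variant that consumes the separating space inside the group:
        // the group now repeats twice, but only the last capture is kept.
        System.out.println("[" + group2("A (IvI(.*?)IvI )*?A", input) + "]");
        // prints "[ qwe ]"
    }
}
```

The brackets in the output are only there to make the leading and trailing spaces of the captured text visible.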

THERE IS A SOLUTION I KNOW (but I am lazy, remember)

(It is also in the first code block, below the "2 opt" message.) The solution is to divide the input into sub-groups and use another regex that processes only those sub-groups, one at a time...

BTW: I did my homework. This page mentions:

Since a capture group with a quantifier holds on to its number, what value does the engine return when you inspect the group? All engines return the last value captured. For instance, if you match the string A_B_C_D_ with ([A-Z]_)+, when you inspect the match, Group 1 will be D_. With the exception of the .NET engine, all intermediate values are lost. In essence, Group 1 gets overwritten each time its pattern is matched.
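
The overwriting described in that quote is easy to reproduce with java.util.regex. This is a minimal sketch; the class name, helper method, and the two sample inputs are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // Matches the whole input against regex and returns group 1, or null if it does not match.
    public static String lastCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The group is entered four times, but only the final capture survives.
        System.out.println(lastCapture("([A-Z])+", "ABCD"));       // prints "D"
        System.out.println(lastCapture("([A-Z]_)+", "A_B_C_D_"));  // prints "D_"
    }
}
```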

But I am still hoping you can give me good news...


Answer 1:


No, unfortunately, as your citation already mentions, the java.util.regex regular expression implementation does not support retrieving any previous values of a repeated capturing group after a single match. The only way to get those, as your code illustrates, is by find()ing multiple matches of the repeated part of your regular expression.
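
The find() approach can be sketched as follows. The class and method names are hypothetical, and two choices here are assumptions rather than part of the question: the inner pattern is applied directly to the whole input instead of to group 1 of the outer match, and the captured text is trimmed to drop the surrounding spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedSectionDemo {
    // Matches one "IvI ... IvI" section and captures the text in between.
    private static final Pattern SECTION = Pattern.compile("IvI(.*?)IvI");

    // Collects the content of every section by calling find() repeatedly.
    public static List<String> sections(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = SECTION.matcher(input);
        while (m.find()) {                 // each call resumes after the previous match
            result.add(m.group(1).trim()); // trim() is an assumption about the wanted output
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sections("A IvI asd IvI IvI qwe IvI A")); // prints "[asd, qwe]"
    }
}
```

Calling find() without an index is enough here, because each call continues from the end of the previous match, so no manual index bookkeeping is needed.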

I've also been looking at other implementations of regular expressions in Java, for example:

  • http://www.brics.dk/automaton/

but I could not find any that supports it (only the Microsoft .NET engine does). If I understood correctly, regular expression implementations based on state machines cannot easily implement this feature. java.util.regex is a backtracking engine rather than a DFA-based one, though.

If anyone knows of a Java regular expression library that supports this behaviour, please share it, because it would be a powerful feature.

P.S. It took me quite a while to understand your question. The title is good, but the body left me unsure whether I had understood you correctly.



Source: https://stackoverflow.com/questions/26773829/capture-group-multiple-times
