Python re.finditer match.groups() does not contain all groups from match

一个人的身影 2020-12-21 12:59

I am trying to use regex in Python to find and print all matching lines from a multiline search. The text that I am searching through may have the below example structure:

1 Answer
  • 2020-12-21 13:18

    Here is your regular expression:

    (AAA\r\n)(ABC[0-9]\r\n){1,}
    

    Debuggex Demo

    Your goal is to capture all ABC#s that immediately follow AAA. As you can see in this Debuggex demo, all ABC#s are indeed being matched (they're highlighted in yellow). However, since only the "what is being repeated" part

    ABC[0-9]\r\n
    

    is being captured (is inside the parentheses), and its quantifier,

    {1,}
    

    is outside the parentheses, each repetition overwrites the previous capture, so only the final ABC# is left in the group. To get all of them, put the quantifier inside a capturing group as well:

    AAA\r\n((?:ABC[0-9]\r\n){1,})
    

    Debuggex Demo

    I've placed the "what is being repeated" part (ABC[0-9]\r\n) into a non-capturing group. (I've also stopped capturing AAA, as you don't seem to need it.)
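To see the difference in Python, here is a minimal sketch that runs both patterns against the same text. The sample string is an assumption (the question's real input is not shown), as are the variable names.

```python
import re

# Hypothetical sample text; the question's actual input is not shown.
text = "AAA\r\nABC1\r\nABC2\r\nABC3\r\n"

# Original pattern: the quantifier sits outside the capturing group.
original = re.compile(r"(AAA\r\n)(ABC[0-9]\r\n){1,}")
# Revised pattern: the whole repeated run is captured as one group.
revised = re.compile(r"AAA\r\n((?:ABC[0-9]\r\n){1,})")

original_groups = original.search(text).groups()
revised_groups = revised.search(text).groups()

print(original_groups)  # ('AAA\r\n', 'ABC3\r\n') -- only the last repetition
print(revised_groups)   # ('ABC1\r\nABC2\r\nABC3\r\n',) -- the full run
```

Both patterns match the same span of text; they differ only in what ends up in `groups()`.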

    The captured text can be split on the newline, and will give you all the pieces as you wish.

    (Note that \n by itself doesn't work in Debuggex. It requires \r\n.)
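Applied to the original `re.finditer` question, the capture-then-split idea might look like the sketch below. The helper name `abc_lines_after_aaa` and the sample text are hypothetical, and the sketch assumes `\r\n` line endings as in the pattern above.

```python
import re

# The revised pattern: group 1 holds the whole run of ABC# lines.
pattern = re.compile(r"AAA\r\n((?:ABC[0-9]\r\n){1,})")


def abc_lines_after_aaa(text):
    """Return one list of ABC# lines for each AAA block found in text."""
    # splitlines() treats "\r\n" as a single line break and does not
    # produce a trailing empty string.
    return [m.group(1).splitlines() for m in pattern.finditer(text)]


# Hypothetical input with two AAA blocks separated by an unrelated line.
sample = "AAA\r\nABC1\r\nABC2\r\nXYZ\r\nAAA\r\nABC7\r\n"
result = abc_lines_after_aaa(sample)
print(result)  # [['ABC1', 'ABC2'], ['ABC7']]
```

If the input may use bare `\n` line endings, replacing `\r\n` with `\r?\n` in the pattern is one way to accept both.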


    This is a workaround. Not many regular expression flavors offer the capability of iterating through repeating captures (which ones...?). A more common approach is to loop through the matches and process each one as it is found. Here's an example from Java:

    import java.util.regex.*;

    public class RepeatingCaptureGroupsDemo {
        public static void main(String[] args) {
            String input = "I have a cat, but I like my dog better.";

            Pattern p = Pattern.compile("(mouse|cat|dog|wolf|bear|human)");
            Matcher m = p.matcher(input);

            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }
    

    Output:

    cat
    dog
    

    (From http://ocpsoft.org/opensource/guide-to-regular-expressions-in-java-part-1/, about a quarter of the way down)
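Since the question is about Python, a rough equivalent of the Java loop using `re.finditer` could look like this. It is a sketch of the same match-by-match approach, not a translation taken from the linked guide; the variable names are illustrative.

```python
import re

text = "I have a cat, but I like my dog better."
pattern = re.compile(r"mouse|cat|dog|wolf|bear|human")

# finditer yields one match object per non-overlapping match, left to right,
# which plays the role of Java's while (m.find()) loop.
found = [m.group() for m in pattern.finditer(text)]

for word in found:
    print(word)
# cat
# dog
```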


    Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference. The links in this answer come from it.
