I am trying to use regex in Python to find and print all matching lines from a multiline search. The text that I am searching through may have the below example structure:
Here is your regular expression:
(AAA\r\n)(ABC[0-9]\r\n){1,}
Debuggex Demo
Your goal is to capture all ABC#s that immediately follow AAA. As you can see in this Debuggex demo, all ABC#s are indeed being matched (they're highlighted in yellow). However, only the "what is being repeated" part, ABC[0-9]\r\n, is inside the capturing parentheses; its quantifier, {1,}, is outside them. A repeated capturing group keeps only its last iteration, so every match except the final one is discarded. To get them all, you must also capture the quantifier:
AAA\r\n((?:ABC[0-9]\r\n){1,})
I've placed the "what is being repeated" part (ABC[0-9]\r\n) into a non-capturing group. (I've also stopped capturing AAA, as you don't seem to need it.)
The captured text can then be split on the newline, which gives you each piece separately.
(Note that \n by itself doesn't work in Debuggex. It requires \r\n.)
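Since the question is about Python, here is a rough sketch of both patterns using the standard re module. The sample text and the helper name abc_lines_after_aaa are made up for illustration; adjust the line endings to whatever your real input uses.

```python
import re

# Hypothetical sample input; the \r\n endings mirror the patterns above.
text = "AAA\r\nABC1\r\nABC2\r\nABC3\r\nXYZ\r\n"

# Original pattern: the capturing group itself is repeated,
# so group 2 only holds the last repetition.
last_only = re.search(r"(AAA\r\n)(ABC[0-9]\r\n){1,}", text)
print(repr(last_only.group(2)))  # 'ABC3\r\n'


def abc_lines_after_aaa(source):
    """Return every ABC# line that directly follows an AAA line (illustrative helper)."""
    # Revised pattern: the repetition as a whole is captured in group 1.
    match = re.search(r"AAA\r\n((?:ABC[0-9]\r\n){1,})", source)
    if match is None:
        return []
    # splitlines() treats \r\n as one line break and drops the trailing empty piece.
    return match.group(1).splitlines()


lines = abc_lines_after_aaa(text)
print(lines)  # ['ABC1', 'ABC2', 'ABC3']
```

The first search shows the problem (only ABC3 survives in the group), and the helper shows the capture-then-split workaround.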
This is a workaround. Not many regular expression flavors offer the capability of iterating through repeating captures (which ones...?). A more common approach is to loop over the matches and process each one as it is found. Here's an example from Java:
import java.util.regex.*;

public class RepeatingCaptureGroupsDemo {
    public static void main(String[] args) {
        String input = "I have a cat, but I like my dog better.";
        Pattern p = Pattern.compile("(mouse|cat|dog|wolf|bear|human)");
        Matcher m = p.matcher(input);
        // find() advances to the next match on each call
        while (m.find()) {
            // group() returns the text of the current match
            System.out.println(m.group());
        }
    }
}
Output:
cat
dog
(From http://ocpsoft.org/opensource/guide-to-regular-expressions-in-java-part-1/, about a quarter of the way down the page)
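For comparison, a Python version of the same loop might look like the sketch below. It uses re.finditer from the standard library, which yields one match object per match, playing the role of the while (m.find()) loop. The variable names are only illustrative.

```python
import re

text = "I have a cat, but I like my dog better."
pattern = re.compile(r"mouse|cat|dog|wolf|bear|human")

# finditer() yields match objects lazily, one per non-overlapping match.
matches = [m.group() for m in pattern.finditer(text)]

for word in matches:
    print(word)
# cat
# dog
```

If you only need the matched strings and not their positions, pattern.findall(text) returns the same list directly.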
Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference. The links in this answer come from it.