Question
As an introductory note, I am aware of the old saying about solving problems with regex, and I am also aware of the precautions about processing XML with regex. But please bear with me for a moment...
I am trying to do a regex search and replace on a group of characters. I don't know in advance how often this group will be matched, and I want to search only within a certain context.
An example:
If I have the following string "**ab**df**ab**sdf**ab**fdsa**ab**bb" and I want to search for "ab" and replace it with "@ab@", this works fine using the following regex:
Search regex:
(.*?)(ab)(.*?)
Replace:
$1@$2@$3
I get four matches in total, as expected. Within each match, the group IDs are the same, so the back-references ($1, $2 ...) work fine, too.
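For reference, the search and replace described above can be reproduced in Java roughly as follows. This is only a sketch: the class and method names are made up for illustration, and the sample string is used without the bold markers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlainReplaceDemo {
    // Regex and replacement taken from the question.
    static final String REGEX = "(.*?)(ab)(.*?)";
    static final String REPLACEMENT = "$1@$2@$3";

    // Hypothetical helper: wraps every "ab" in '@' characters.
    static String wrapAb(String input) {
        return input.replaceAll(REGEX, REPLACEMENT);
    }

    // Hypothetical helper: counts the separate matches found by find().
    static int countMatches(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "abdfabsdfabfdsaabbb";
        System.out.println(countMatches(s)); // 4
        System.out.println(wrapAb(s));       // @ab@df@ab@sdf@ab@fdsa@ab@bb
    }
}
```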
However, if I now add a certain context to the string, the regex above fails:
Search string:
<context>abdfabsdfabfdsaabbb</context>
Search regex:
<context>(.*?)(ab)(.*?)</context>
This will find only the first match.
But even if I add a non-capturing group to the original regex, it doesn't work ("<context>(?:(.*?)(ab)(.*?))*</context>").
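The behaviour can be checked with a small Java sketch (class and method names are hypothetical). Both context-anchored patterns can match only once, because the literal <context> occurs only once in the input, so the whole element is consumed by a single match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContextSingleMatchDemo {
    // The two context-anchored attempts from the question.
    static final String SINGLE = "<context>(.*?)(ab)(.*?)</context>";
    static final String REPEATED = "<context>(?:(.*?)(ab)(.*?))*</context>";

    // Hypothetical helper: counts the matches of a regex in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "<context>abdfabsdfabfdsaabbb</context>";
        System.out.println(countMatches(SINGLE, s));   // 1
        System.out.println(countMatches(REPEATED, s)); // 1
    }
}
```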
What I would like is a list of matches as in the first search (without the context), whereby within each match the group IDs are the same.
Any idea how this could be achieved?
Answer 1:
Solution
Your requirement is similar to the one in this question: match and capture multiple instances of a pattern between a prefix and a suffix. Using the method as described in this answer of mine:
(?s)(?:<context>|(?!^)\G)(?:(?!</context>|ab).)*ab
Add capturing groups as needed.
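As an illustration, here is one possible way to add capturing groups and use the regex for the replacement from the question. This is a sketch, not the only arrangement: group 1 (everything before "ab" within the match) and group 2 ("ab" itself) are choices made for this example, and the class and method names are hypothetical.

```java
public class ContextReplaceDemo {
    // The regex from above with two capturing groups added:
    //   group 1 = "<context>" or the continuation point, plus the skipped text
    //   group 2 = the "ab" to be wrapped
    static final String REGEX =
            "(?s)((?:<context>|(?!^)\\G)(?:(?!</context>|ab).)*)(ab)";

    // Hypothetical helper: wraps every "ab" inside <context>...</context> in '@'.
    static String wrapAbInContext(String input) {
        return input.replaceAll(REGEX, "$1@$2@");
    }

    public static void main(String[] args) {
        // The "ab" before and after the element must stay untouched.
        String s = "ab<context>abdfabsdfabfdsaabbb</context>ab";
        System.out.println(wrapAbInContext(s));
        // ab<context>@ab@df@ab@sdf@ab@fdsa@ab@bb</context>ab
    }
}
```

Every match has the same group layout, so the same back-references work for each occurrence.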
Caveat
Note that the regex only works for tags that are allowed to contain only text. If a tag contains other tags, it won't work correctly.
It also matches ab inside a <context> tag that has no closing tag </context>. If you want to prevent this, then:
(?s)(?:<context>(?=.*?</context>)|(?!^)\G)(?:(?!</context>|ab).)*ab
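The difference between the two variants can be seen with an unclosed tag. The following Java sketch (hypothetical class and method names, made-up sample input) counts the matches of each regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnclosedTagDemo {
    // Variant without the check for a closing tag.
    static final String LOOSE =
            "(?s)(?:<context>|(?!^)\\G)(?:(?!</context>|ab).)*ab";
    // Variant that requires </context> somewhere after the opening tag.
    static final String STRICT =
            "(?s)(?:<context>(?=.*?</context>)|(?!^)\\G)(?:(?!</context>|ab).)*ab";

    // Hypothetical helper: counts the matches of a regex in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String unclosed = "<context>abxab";
        String closed = "<context>abxab</context>";
        System.out.println(countMatches(LOOSE, unclosed));  // 2
        System.out.println(countMatches(STRICT, unclosed)); // 0
        System.out.println(countMatches(STRICT, closed));   // 2
    }
}
```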
Explanation
Let us break down the regex:
(?s) # Make . match any character, including line terminators
(?:
<context>
|
(?!^)\G
)
(?:(?!</context>|ab).)*
ab
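(?:<context>|(?!^)\G) makes sure that we either enter a new <context> tag, or continue from the end of the previous match and attempt to match more instances of the sub-pattern. The (?!^) is needed because \G also matches at the very start of the input, before any match has been made.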
(?:(?!</context>|ab).)* matches whatever text we don't care about (anything that is not ab) and prevents us from going past the closing tag </context>. Then we match the pattern we want, ab, at the end.
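To see the two alternatives at work, the sketch below runs the regex over an input with two <context> elements and records where each ab was found. The input string, the extra capturing group around ab, and the class and method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPositionsDemo {
    // Same regex as above, with a capturing group around the final "ab".
    static final Pattern P = Pattern.compile(
            "(?s)(?:<context>|(?!^)\\G)(?:(?!</context>|ab).)*(ab)");

    // Hypothetical helper: start index of group 1 for every match.
    static List<Integer> abPositions(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            starts.add(m.start(1));
        }
        return starts;
    }

    public static void main(String[] args) {
        // The "ab" at index 0 and at index 24 lie outside any <context> element.
        String s = "ab<context>xab</context>ab<context>abab</context>";
        System.out.println(abPositions(s)); // [12, 35, 37]
    }
}
```

The match at index 12 and the one at index 35 each start from a <context> tag, while the match at index 37 is reached through \G. The occurrences outside the elements are skipped because neither alternative can start there.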
Source: https://stackoverflow.com/questions/21428545/java-regex-how-to-back-reference-capturing-groups-in-a-certain-context-when-the