Question
Vim help says that:
\1 Matches the same string that was matched by the first sub-expression in \( and \). {not in Vi} Example: "\([a-z]\).\1" matches "ata", "ehe", "tot", etc. (help tags: */\1* *E65*)
It looks like a backreference can be used in a search pattern. I started playing with it and noticed behavior that I can't explain. This is my file:
<paper-input label="Input label"> Some text </paper-input>
<paper-input label="Input label"> Some text </paper-inputa>
<aza> Some text </az>
<az> Some text </az>
<az> Some text </aza>
I wanted to match only the lines where the opening and closing tags match, i.e.:
<paper-input label="Input label"> Some text </paper-input>
<az> Some text </az>
And my test regex is:
%s,<\([^ >]\+\).*<\/\1>,,gn
But this matches lines 1, 3 and 4. Same thing with sed:
$ sed -ne 's,<\([^ >]\+\).*<\/\1>,\0,p' file
<paper-input label="Input label"> Some text </paper-input>
<aza> Some text </az>
<az> Some text </az>
This: <\([^ >]\+\) should be greedy, and when I try the pattern without \1 at the end, all the groups are captured correctly. But when I add \1, it seems that <\([^ >]\+\) becomes non-greedy and tries to force a match on the 3rd line. Can someone explain why it matches the 3rd line:
<aza> Some text </az>
This is also a regex101 demo
NOTE: This is not about the regex itself (there is probably another way to do it) but about the behavior of that regex.
Answer 1:
To understand why your regex behaves the way it does you need to understand what a backtracking regex engine does.
The engine will greedily match and consume as many characters as it can. But if it doesn't find a match it goes back and tries to find a different match that still satisfies the pattern.
%s,<\([^ >]\+\).*<\/\1>,,gn
For line three, <aza> Some text </az>, the regex engine first tries \1 = aza and checks whether .*</aza> matches the rest of the string. It doesn't, so the engine backtracks and chooses something else for \1. The next time it chooses \1 = az and checks whether .*</az> matches the rest of the string, and it does: the .* simply consumes the leftover "a> Some text ". So the string matches.
(This is a simplified version. I skipped over the fact that .* can potentially do a lot of backtracking itself.)
Solving it is as easy as adding an anchor to the regex, which stops the engine from searching for other values that could satisfy \1. In this case, requiring a space or > right after the group is sufficient.
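The backtracking described above can be reproduced outside Vim. The following is a minimal sketch using Python's re module, which is also a backtracking engine. The patterns are Python spellings of the Vim/sed ones, and the names loose, anchored and matching_lines are made up for this illustration.

```python
import re

# The five lines from the question's file.
lines = [
    '<paper-input label="Input label"> Some text </paper-input>',
    '<paper-input label="Input label"> Some text </paper-inputa>',
    '<aza> Some text </az>',
    '<az> Some text </az>',
    '<az> Some text </aza>',
]

# Python spelling of the original pattern <\([^ >]\+\).*<\/\1>
loose = re.compile(r'<([^ >]+).*</\1>')

# Same pattern, but the group must be followed by a space or '>',
# so it can no longer shrink to a shorter prefix of the tag name.
anchored = re.compile(r'<([^ >]+)[ >].*</\1>')

def matching_lines(pattern):
    """Return the 1-based numbers of the lines that the pattern matches."""
    return [n for n, line in enumerate(lines, 1) if pattern.search(line)]

print(matching_lines(loose))             # [1, 3, 4]
print(loose.search(lines[2]).group(1))   # az  (the group gave back the final "a")
print(matching_lines(anchored))          # [1, 4]
```

Each line is tested on its own here, so the match on line 3 comes purely from the group shrinking from aza to az.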
Answer 2:
You need to add \> to indicate the end of a word. There may be other solutions with zero-width patterns, but they would complicate things.
Also, your separator is , and not /, so the / does not need to be escaped.
Which gives:
%s,<\([^ >]\+\)\>.*</\1>,,gn
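As a rough check of the end-of-word idea, here is a sketch in Python. It assumes Python's \b (any word boundary) is a close enough stand-in for Vim's \> (end of word only) on these inputs. The string extra is a hypothetical input that is not in the original file.

```python
import re

# The five lines from the question's file.
lines = [
    '<paper-input label="Input label"> Some text </paper-input>',
    '<paper-input label="Input label"> Some text </paper-inputa>',
    '<aza> Some text </az>',
    '<az> Some text </az>',
    '<az> Some text </aza>',
]

# Assumption: \b approximates Vim's \> for these inputs.
word_end = re.compile(r'<([^ >]+)\b.*</\1>')

matched = [n for n, line in enumerate(lines, 1) if word_end.search(line)]
print(matched)   # [1, 4]

# Hypothetical extra input: '-' is not a word character, so there is also
# a boundary after "paper" and the group can still shrink to that prefix.
extra = '<paper-input> Some text </paper>'
print(word_end.search(extra).group(1))   # paper
```

The last line suggests that a word boundary is a weaker anchor than requiring a space or > after the group when tag names contain a hyphen.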
Answer 3:
Currently the reason why line 3 (<aza>) is showing up as a match is that the .* term in your regex can match across multiple lines. So line 3 matches because line 5 has the closing tag. To correct this, force the regex to find a matching closing tag on the same line only:
%s,<\([^ >]\+\)[^\n]*?<\/\1>,,gn
^^^^^ use [^\n]* instead of .*
Source: https://stackoverflow.com/questions/39380964/vim-sed-regex-backreference-in-search-pattern