How does this PCRE pattern detect palindromes?

耗尽温柔 submitted on 2019-11-27 21:16:46

Let's try to understand the regex by constructing it. First, a palindrome must start and end with the same sequence of characters, in opposite order:

^(.)(.)(.) ... \3\2\1$
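To make the starting point concrete, here is a minimal Python sketch that builds this pattern for one specific half-length and runs it with Python's re (not PCRE, though plain backreferences behave the same here). The helper names are hypothetical, and the optional middle .? for odd lengths is borrowed from the final pattern. The point is that the pattern must be regenerated for every length, which is what the rest of the construction avoids.

```python
import re

def fixed_palindrome_pattern(half: int) -> str:
    """Build ^(.)(.)...(.).?\\half...\\2\\1$ for one specific half-length."""
    groups = "(.)" * half
    # Backreferences in reverse order: \half ... \2 \1
    mirrored = "".join(f"\\{i}" for i in range(half, 0, -1))
    # .? allows one free middle character for odd lengths.
    return f"^{groups}.?{mirrored}$"

def is_palindrome_fixed(s: str) -> bool:
    # A different pattern is compiled for each half-length len(s) // 2.
    return re.match(fixed_palindrome_pattern(len(s) // 2), s) is not None

print(fixed_palindrome_pattern(3))    # ^(.)(.)(.).?\3\2\1$
print(is_palindrome_fixed("abccba"))  # True
print(is_palindrome_fixed("abcab"))   # False
```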

We want to rewrite this so that the part after the ... does not grow with the number of groups; only then could the repetition be collapsed into a *. This is possible with a lookahead:

^(.)(?=.*\1$)
 (.)(?=.*\2\1$)
 (.)(?=.*\3\2\1$) ...
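The lookahead form can likewise be generated and run with Python's re for a fixed half-length. In this sketch (hypothetical helper names, single-line input assumed) half is len(s) // 2, so passing every lookahead means the first half mirrors the end of the string. Each step is now self-contained, but the tails still differ from step to step.

```python
import re

def lookahead_chain(half: int) -> str:
    """Build ^(.)(?=.*\\1$)(.)(?=.*\\2\\1$)... for `half` characters."""
    parts = ["^"]
    for k in range(1, half + 1):
        # After capturing character k, require the string to END with
        # characters k, k-1, ..., 1 (the tail grows by one each step).
        tail = "".join(f"\\{i}" for i in range(k, 0, -1))
        parts.append(f"(.)(?=.*{tail}$)")
    return "".join(parts)

def is_palindrome_lookahead(s: str) -> bool:
    # With half = len(s) // 2, passing all lookaheads means
    # the first half is the reverse of the last half.
    return re.match(lookahead_chain(len(s) // 2), s) is not None

print(lookahead_chain(2))                  # ^(.)(?=.*\1$)(.)(?=.*\2\1$)
print(is_palindrome_lookahead("racecar"))  # True
print(is_palindrome_lookahead("abcxba"))   # False
```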

But the lookaheads still differ from one another. What if we could "record" the previously captured groups? If that were possible, we could rewrite it as:

^(.)(?=.*(?<record>\1\k<record>)$)   # \1     = \1 + (empty)
 (.)(?=.*(?<record>\2\k<record>)$)   # \2\1   = \2 + \1
 (.)(?=.*(?<record>\3\k<record>)$)   # \3\2\1 = \3 + \2\1
 ...

which could then be collapsed into a single repeated group:

^(?: 
    (.)(?=.*(\1\2)$)
 )*
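Outside the regex, the "record" is just a string that gains one character on the left at each step. A minimal Python sketch of that recurrence follows; the function name records is hypothetical, and the empty starting string is an assumption of the sketch (the regex has no such empty initial value).

```python
from typing import List

def records(s: str, steps: int) -> List[str]:
    """Return the lookahead tails \\1, \\2\\1, \\3\\2\\1, ... for the first `steps` characters."""
    out: List[str] = []
    record = ""  # sketch-only stand-in for "nothing recorded yet"
    for ch in s[:steps]:
        # New capture on the left, previous record on the right,
        # mirroring the (?<record>\N\k<record>) step of the pseudo-pattern.
        record = ch + record
        out.append(record)
    return out

print(records("abccba", 3))  # ['a', 'ba', 'cba']
```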

Almost good, except that \2 (the recorded capture) is not empty initially but unset, and in PCRE a backreference to an unset group fails to match anything. We need it to match the empty string when the recorded capture doesn't exist yet. This is where the conditional expression comes in.

(?(2)\2|)   # matches \2 if it exists, empty otherwise.
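Python's re behaves like PCRE on this point: a backreference to a group that did not participate fails, and it supports the same (?(id)yes|no) conditional syntax. A small sketch with an optional group 1 (the variable names are arbitrary) shows the difference in isolation:

```python
import re

# Group 1 is optional, so it may be unset when the backreference runs.
plain = re.compile(r"(x)?\1")
# Conditional: use \1 only if group 1 participated, else match empty.
conditional = re.compile(r"(x)?(?(1)\1|)")

print(plain.fullmatch(""))                      # None: \1 is unset, so it fails
print(conditional.fullmatch("") is not None)    # True: empty branch taken
print(conditional.fullmatch("xx") is not None)  # True: \1 branch taken
```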

So our expression becomes

^(?: 
    (.)(?=.*(\1(?(2)\2|))$)
 )*

Now it matches the first half of the palindrome. What about the second half? After the first half is matched, the recorded capture \2 contains the second half (the first half reversed). So let's just put it at the end.

^(?: 
    (.)(?=.*(\1(?(2)\2|))$)
 )*\2$
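Python's re cannot run this construction directly, because it rejects \2 inside group 2 itself ("cannot refer to an open group"). The greedy loop is simple enough to emulate by hand, though. The sketch below uses a hypothetical function name and assumes PCRE-style semantics, where group 2 keeps its value from one iteration to the next, plus single-line input. It runs the loop as far as it will go and reports the position reached and the final value of group 2.

```python
from typing import Optional, Tuple

def greedy_pass(s: str) -> Tuple[int, Optional[str]]:
    """Emulate the greedy loop ^(?:(.)(?=.*(\\1(?(2)\\2|))$))* on s.

    Returns the number of characters consumed and the final value of
    group 2 (None while still unset). Backtracking is not modelled.
    Assumes single-line input: the real '.' does not match a newline.
    """
    pos: int = 0
    record: Optional[str] = None
    while pos < len(s):
        # (.) captures s[pos]; \1(?(2)\2|) builds the candidate tail.
        candidate = s[pos] + (record if record is not None else "")
        # (?=.*candidate$): the tail must be a suffix of what follows pos.
        if not s[pos + 1:].endswith(candidate):
            break  # this iteration fails, so the * stops here
        pos, record = pos + 1, candidate
    return pos, record

print(greedy_pass("abccba"))  # (3, 'cba')
print(greedy_pass("abcba"))   # (2, 'ba')
```

For "abccba" the loop stops at position 3 with group 2 holding "cba", exactly the remaining input, so the trailing \2$ completes the match. For the odd-length "abcba" it stops at position 2 with "ba", leaving the middle "c" unaccounted for.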

We want to take care of odd-length palindromes as well. These have one free character between the first and second halves.

^(?: 
    (.)(?=.*(\1(?(2)\2|))$)
 )*.?\2$

This works well except for strings shorter than two characters: a single character, or the empty string. Again, this is because \2 is unset there and so matches nothing. So we make \2 optional:

^(?: 
    (.)(?=.*(\1(?(2)\2|))$)
 )*.?\2?$
#      ^ safe to make optional, since the look-ahead already requires \2 at the end
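Putting the pieces together, the sketch below is a hand-written emulation of the final pattern (hypothetical names; PCRE-style capture semantics and single-line input assumed, as Python's re rejects the self-referencing group). It adds the .? and \2? tail and the backtracking to fewer loop iterations that the engine would try. The optional_record flag switches between the final \2? and the earlier \2, so the one-character failure can be reproduced; agrees_with_reversal compares the emulation with a plain s == s[::-1] check for every string over a small alphabet.

```python
from itertools import product
from typing import List, Optional

def emulate(s: str, optional_record: bool = True) -> bool:
    """Emulate ^(?:(.)(?=.*(\\1(?(2)\\2|))$))*.?\\2?$ by hand.

    With optional_record=False it emulates the earlier ...*.?\\2$ variant.
    Assumes single-line input: the real '.' does not match a newline.
    """
    n = len(s)
    # states[k] is group 2 after k loop iterations (None = still unset).
    states: List[Optional[str]] = [None]
    pos = 0
    while pos < n:
        candidate = s[pos] + (states[-1] or "")
        if not s[pos + 1:].endswith(candidate):
            break
        states.append(candidate)
        pos += 1
    # The * is greedy: try the most iterations first, then give them back.
    for k in range(len(states) - 1, -1, -1):
        record = states[k]
        for middle in (1, 0):  # .? tries one character, then none
            p = k + middle
            if p > n:
                continue
            # \2 then $: an unset group 2 can never match.
            if record is not None and s[p:] == record:
                return True
            # \2? may also be skipped, leaving only $.
            if optional_record and p == n:
                return True
    return False

def agrees_with_reversal(alphabet: str, max_len: int) -> bool:
    """Compare emulate() with s == s[::-1] for every string up to max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if emulate(s) != (s == s[::-1]):
                return False
    return True

print(emulate("a", optional_record=False))  # False: \2 is unset
print(emulate("a"))                         # True
print(agrees_with_reversal("ab", 8))        # True
```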