Longest palindrome in a string using suffix tree

我怕爱的太早我们不能终老 提交于 2019-11-28 02:49:11

I believe you need to proceed this way:

Let y1y2 ... yn be your string (where yi are letters).

Create the generalized suffix tree of Sf = y1y2 ... yn$ and Sr = ynyn - 1 ... y1# (reverse the letters and choose different ending characters for Sf ($) and Sr (#))... where Sf stands for "String, Forward" and Sr stands for "String, Reverse".

For every suffix i in Sf, find the lowest common ancestor with the suffix n - i + 1 in Sr.

What runs from the root till this lowest common ancestor is a palindrome, because now the lowest common ancestor represents the longest common prefix of these two suffixes. Recall that:

(1) A prefix of a suffix is a substring.

(2) A palindrome is a string identical to its reverse.

(3) So the longest contained palindrome within a string is exactly the longest common substring of this string and its reverse.

(4) Thus, the longest contained palindrome within a string is exactly the longest common prefix of all pairs of suffixes between a string and its reverse. This is what we're doing here.

EXAMPLE

Let's take the word banana.

Sf = banana$

Sr = ananab#

Below is the generalised suffix tree of Sf and Sr, where the number at the end of each path is the index of the corresponding suffix. There's a small mistake, the a common to all the 3 branches of Blue_4's parent should be on its entering edge, beside n:

The lowest interior node in the tree is the longest common substring of this string and its reverse. Looking at all the interior nodes in the tree you will therefore find the longest palindrome.

The longest palindrome is found between between Green_0 and Blue_1 (i.e., banana and anana) and is anana


EDIT

I've just found this paper that answers this question.

The Linear solution can be found in this Way ::

Prequisities:

(1).You must know how to construct the suffix array in O(N) or O(NlogN) time.

(2).You must know how to find the standard LCP Array ie. LCP between adjacent Suffixes i and i-1

ie . LCP [i]=LCP(suffix i in sorted array, suffix i-1 in sorted array) for (i>0).

Let S be the Original String and S' be the reverse of Original String. Lets take S="banana" as an example. Then its Reverse string S'=ananab.

Step 1: Concatenate S + # + S' to get String Str ,where # is an alphabet not present in original String.

    Concatenated String Str=S+#+S'
    Str="banana#ananab"

Step 2: Now construct the Suffix Array of the string Str.

In this example ,the suffix array is:

Suffix Number   Index   Sorted Suffix
0               6       #ananab
1               5       a#ananab
2               11      ab
3               3       ana#ananab
4               9       anab
5               1       anana#ananab
6               7       ananab
7               12      b
8               0       banana#ananab
9               4       na#ananab
10              10      nab
11              2       nana#ananab
12              8       nanab

Please Note that a suffix array is an array of integers giving the starting positions of suffixes of a string in lexicographical order.So the Array that holds Index of starting position is a suffix Array.

That is SuffixArray[]={6,5,11,3,9,1,7,12,0,4,10,2,8};

Step 3: As you had managed to construct the Suffix Array ,Now find the Longest Common Prefixes Between the adjacent suffixes.

LCP between #ananab        a#ananab          is :=0
LCP between a#ananab       ab                is :=1
LCP between ab             ana#ananab        is :=1
LCP between ana#ananab     anab              is :=3
LCP between anab           anana#ananab      is :=3
LCP between anana#ananab   ananab            is :=5
LCP between ananab         b                 is :=0
LCP between b              banana#ananab     is :=1
LCP between banana#ananab  na#ananab         is :=0
LCP between na#ananab      nab               is :=2
LCP between nab            nana#ananab       is :=2
LCP between nana#ananab nanab                is :=4

Thus LCP array LCP={0,0,1,1,3,3,5,0,1,0,2,2,4}.

Where LCP[i]=Length of Longest Common Prefix between Suffix i and suffix (i-1). (for i>0)

Step 4:

Now you have constructed a LCP array ,Use the following Logic.

    Let the length of the Longest Palindrome ,longestlength:=0 (Initially)
    Let Position:=0.
    for(int i=1;i<Len;++i)
    {
        //Note that Len=Length of Original String +"#"+ Reverse String
        if((LCP[i]>longestlength))
        {
            //Note Actual Len=Length of original Input string .
            if((suffixArray[i-1]<actuallen && suffixArray[i]>actuallen)||(suffixArray[i]<actuallen && suffixArray[i-1]>actuallen))
            {
                 //print :Calculating Longest Prefixes b/w suffixArray[i-1] AND  suffixArray[i]


                longestlength=LCP[i];
              //print The Longest Prefix b/w them  is ..
              //print The Length is :longestlength:=LCP[i];
                Position=suffixArray[i];
            }
        }
    }
    So the length of Longest Palindrome :=longestlength;
    and the longest palindrome is:=Str[position,position+longestlength-1];

Execution Example ::

    actuallen=Length of banana:=6
    Len=Length of "banana#ananab" :=13.

Calculating Longest Prefixes b/w a#ananab AND  ab
The Longest Prefix b/w them  is :a 
The Length is :longestlength:= 1 
Position:= 11




Calculating Longest Prefixes b/w ana#ananab AND  anab
The Longest Prefix b/w them  is :ana
The Length is :longestlength:= 3 
Position:=9



Calculating Longest Prefixes b/w anana#ananab AND  ananab
The Longest Prefix b/w them  is :anana
The Length is :longestlength:= 5 
Position:= 7

So Answer =5.
And the Longest Palindrome is :=Str[7,7+5-1]=anana

Just Make a Note ::

The if condition in Step 4 basically refers that ,in each iteration(i) ,if I take the suffixes s1(i) and s2(i-1) then ,"s1 must contains # and s2 must not contain # " OR "s2 must contains # and s1 must not contains # ".

 |(1:BANANA#ANANAB)|leaf
tree:|
     |     |      |      |(7:#ANANAB)|leaf
     |     |      |(5:NA)|
     |     |      |      |(13:B)|leaf
     |     |(3:NA)|
     |     |      |(7:#ANANAB)|leaf
     |     |      |
     |     |      |(13:B)|leaf
     |(2:A)|
     |     |(7:#ANANAB)|leaf
     |     |
     |     |(13:B)|leaf
     |
     |      |      |(7:#ANANAB)|leaf
     |      |(5:NA)|
     |      |      |(13:B)|leaf
     |(3:NA)|
     |      |(7:#ANANAB)|leaf
     |      |
     |      |(13:B)|leaf
     |
     |(7:#ANANAB)|leaf

A few years late...

Assume s is the original string, and r is s reversed. Let's also assume we've completely built a suffix tree ST using s.

Our next step is to check all the suffixes of r against ST. With each new suffix of r, we'll maintain a count of the first k characters we've matched successfully against a preexisting suffix in the tree (ie, one of s's suffixes).

As an example, say we're matching the suffix "RAT" from r, and s contained some suffixes beginning with "RA", but none that matched "RAT". k would equal 2 when we finally had to abandon hope for the final characters "T". We matched the first two characters of r's suffix with the first two characters of s's suffix. We'll call this node that we reached n.

Now, how do we know when we've found a palindrome? By checking all leaf nodes under n.

In a traditional suffix tree, the starting index of each suffix is stored at the leaf node of that suffix branch. In our above example, s may have contained a bunch of suffixes starting with "RA", each of which begins at one of the indexes present in the leaf node descendants of n.

Let's use these indices.

What does it mean if we match k characters of one of R's substrings against k characters in ST? Well, it simply means we found some string reversed. But what does it mean if the location where the substring begins in R is equal to the matched substring in S plus k? Yes, it means s[i] through s[i+k] reads the same as s[i+k] through s[i]! And so, be definition we've located a palindrome of size k.

Now, all you have to do is keep a tab on the longest palindrome found so far, and return it at the end of your function.

Simple and short explanation from Skiena - The Algorithm Design Manual

Find the longest palindrome in S [using suffix tree] – A palindrome is a string that reads the same if the order of characters is reversed, such as madam. To find the longest palindrome in a string S, build a single suffix tree containing all suffixes of S and the reversal of S, with each leaf identified by its starting position. A palindrome is defined by any node in this tree that has forward and reversed children from the same position.

DP solution:

int longestPalin(char *str)
{
    n = strlen(str);
    bool table[n][n]l
    memset(table, 0, sizeof(table));
    int start = 0;

    for(int i=0; i<n; ++i)
        table[i][i] = true;
    int maxlen = 1;

    for(int i=0; i<n-1; ++i)
    {
        if(str[i] == str[i+1])
        {
            table[i][i] = true;
            start = i;
            maxlen = 2;
        }
    }

    for(int k=3; k<=n; ++k)
    {
        for(int i=0; i<n-k+1; ++i)
        {
            int j = n+k-1;
            if(str[i] == str[j] && table[i+1][j-1])
            {
                table[i][j] = true;
                if(k > maxlen)
                {
                    start = i;
                    maxlen = k;
                }
            }
        }
    }
    print(str, start, start+maxlen-1);
    return maxlen;
}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!