I need to find the longest non-overlapping repeated substring in a string. I have the suffix tree and suffix array of the string available.
When overlapping is allowed,
Since I had a hard time finding a clear description of a working algorithm to obtain the longest non-overlapping repeated substrings using a suffix tree, I'd like to share the version I gathered from various sources.
Algorithm
Explanation
If a substring of S occurs at least twice in S, it is the common prefix P of two suffixes Si and Sj, where i and j denote their respective start positions in S. Hence, there exists an inner node v in the suffix tree for S that has two descendant leaves corresponding to i and j, such that the concatenation of all edge labels on the path from the root to v is equal to P.
The deepest such node v (in terms of the length of its corresponding prefix) marks the longest, possibly overlapping, repeated substring in S. To exclude overlapping occurrences, P must be no longer than the distance between i and j.
We therefore calculate the minimum and maximum indices imin and imax for each node, which correspond to the positions of the leftmost and rightmost suffixes of S that share the node's common prefix. The minimum and maximum indices at a node can easily be obtained from the values of its descendants. (The index calculation would be more complicated if we were looking for the longest substrings that occur at least k times, because then the distances between all descendants' indices would have to be considered, not just the two that are farthest apart.) By considering only prefixes P that satisfy imin + length(P) ≤ imax, we make sure that the occurrence of P starting at imin ends before the occurrence starting at imax begins.
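To make the imin/imax condition concrete, here is a minimal Python sketch. It is not the suffix-tree code from this post: for brevity it uses a naive, uncompressed suffix trie (quadratic time and memory), so every trie node stands for exactly one prefix P and its depth equals length(P). The names `Node` and `longest_nonoverlapping_repeat` are made up for illustration. Because suffixes are inserted from left to right, imin is fixed when a node is created and imax is simply the most recent suffix passing through it, which should give the same values as aggregating over the descendant leaves.

```python
class Node:
    """Trie node for one prefix P (hypothetical helper, not from the post)."""
    __slots__ = ("children", "imin", "imax")

    def __init__(self, start):
        self.children = {}   # next character -> child node
        self.imin = start    # leftmost suffix start sharing this prefix
        self.imax = start    # rightmost suffix start sharing this prefix


def longest_nonoverlapping_repeat(s):
    """Return one longest substring of s that occurs twice without overlap."""
    root = Node(0)
    best_start, best_len = 0, 0
    for i in range(len(s)):              # insert suffix S_i, left to right
        node = root
        for depth in range(1, len(s) - i + 1):
            ch = s[i + depth - 1]
            child = node.children.get(ch)
            if child is None:
                # First suffix with this prefix: imin = imax = i.
                child = Node(i)
                node.children[ch] = child
            else:
                # Prefix seen before: i is now the rightmost occurrence.
                child.imax = i
                # Non-overlap condition: imin + length(P) <= imax.
                if child.imin + depth <= child.imax and depth > best_len:
                    best_start, best_len = child.imin, depth
            node = child
    return s[best_start:best_start + best_len]
```

For example, `longest_nonoverlapping_repeat("banana")` returns `"an"`: the longer repeat `"ana"` starts at positions 1 and 3, and 1 + 3 > 3, so its two occurrences overlap and it is rejected. A real suffix tree would apply the same check per inner node while keeping the structure linear in size.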
Additional notes