I know this is a somewhat beaten topic, but I have reached the limit of help I can get from what\'s already been answered.
This is for the Rosalind project problem L
Nice question for an excercise in basic string-operations. I didnt remember the suffix-tree anymore ;) But as you have stated: theory-wise, you are set.
The wikipedia-stub onto this topic is a bit confusing. You only need to know, if you are the outermost non-leaf-node with n >= k
childs. If you found the substring from the root-node to this one in the whole string, the suffix-tree tells you, that there are n
possible continuitations. So there must be n
places, where this string occurs.
A simple key-concept of this and many similar problems is to do a depth-first-search: In every Node, ask the child-elements for their value and return the maximum of it to the parent. The root-node will get the final result.
How the values are calculated differs between the problems. Here you have three possiblilitys for every node:
n
childs, the concated string of every edge from the root to this node appears n
times in the whole string. If we need at least k
nodes and k > n
, the result is also invalid.Of course, you also have to return the correspondending node. Otherwise you will know, how long the longest repeated substring is but not where it is.
You should try to code this by yourself first. Constructing the tree is simple but not trivial if you want to gather all necessary informations. Nevertheless here is a simple example. Please note: every sanity-checking is dropped out and everything will fail horribly, if the input is somehow invalid. E.g. do not try to use any other root-index than one, do not refere to nodes as a parent, which weren't referenced as a childs before, etc. Much room for improvement *hint;)*.
class Node(object):
def __init__(self, idx):
self.idx = idx # not needed but nice for prints
self.parent = None # edge to parent or None
self.childs = [] # list of edges
def get_deepest(self, k = 2):
max_value = -1
max_node = None
for edge in self.childs:
r = edge.n2.get_deepest()
if r is None: continue # leaf
value, node = r
value += len(edge.s)
if value > max_value: # new best result
max_value = value
max_node = node
if max_node is None:
# we are either a leaf (no edge connected) or
# the last non-leaf.
# The number of childs have to be k to be valid.
return (0, self) if len(self.childs) == k else None
else:
return (max_value, max_node)
def get_string_to_root(self):
if self.parent is None: return ""
return self.parent.n1.get_string_to_root() + self.parent.s
class Edge(object):
# creating the edge also sets the correspondending
# values in the nodes
def __init__(self, n1, n2, s):
#print "Edge %d -> %d [ %s]" % (n1.idx, n2.idx, s)
self.n1, self.n2, self.s = n1, n2, s
n1.childs.append(self)
n2.parent = self
nodes = {1 : Node(1)} # root-node
string = sys.stdin.readline()
k = int(sys.stdin.readline())
for line in sys.stdin:
parent_idx, child_idx, start, length = [int(x) for x in line.split()]
s = string[start-1:start-1+length]
# every edge constructs a Node
nodes[child_idx] = Node(child_idx)
Edge(nodes[parent_idx], nodes[child_idx], s)
(depth, node) = nodes[1].get_deepest(k)
print node.get_string_to_root()