What are the maximum and minimum number of nodes in a suffix tree? And how can I prove it?
Assuming an input text of N
characters in length, the minimum number of nodes, including the root node and all leaf nodes, is N+1
, the maximum number of nodes, including the root and leaves, is 2N-1
.
Proof of minimum: There must be at least one leaf node for every suffix, and there are N
suffixes. There need not be any inner nodes, example: if the text is a sequence of unique symbols, abc$
, there are no branches, hence no inner nodes in the resulting suffix tree:
Hence the minimum is N
leaves, 0
inner nodes, and 1
root node, in sum N+1
nodes.
Proof of maximum: The number of leaf nodes can never be larger than N
, because a leaf node is where a suffix ends, and you can't have more than N
distinct suffixes in a string of length N
. (In fact, you always have exactly N
distinct suffixes, hence N
leaf nodes exactly.) The root node is always exactly 1
, so the question is what is the maximum number of inner nodes. Every inner node introduces a branch in the tree (because inner nodes of a suffix tree have at least 2 children). Each new branch must eventually lead to at least one extra leaf node, so if you have K
inner nodes, there must be at least K+1
leaf nodes, and the presence of the root node requires at least one additional leaf (unless the tree is empty). But the number of leaf nodes is bounded by N
, so the maximum number of inner nodes is bounded by N-2
. This yields exactly N
leaves, 1
root, and a maximum of N-2
inner nodes, 2N-1
in total.
To see that this is not only a theoretical upper bound, but some suffix trees actually reach this maximum, consider as an example a string with just one repeated character: 'aaa$'. Confirm that the suffix tree for this has 7 nodes (including root and leaves):
Summary: As evident, the only real variable is the number of inner nodes; the number of roots and leaves is constant at 1
and N
for all suffix trees, while the number of inner nodes varies between 0
and N-2
.
来源:https://stackoverflow.com/questions/13389858/maximum-and-minimum-number-of-nodes-in-a-suffix-tree