Suffix tree and Tries. What is the difference?

Submitted by 人走茶凉 on 2019-11-28 15:25:08
Ze Blob

A suffix tree can be viewed as a data structure built on top of a trie where, instead of just adding the string itself into the trie, you would also add every possible suffix of that string. As an example, if you wanted to index the string banana in a suffix tree, you would build a trie with the following strings:

banana
anana
nana
ana
na
a

Once that's done, you can search for any n-gram (substring) and check whether it is present in your indexed string. In other words, an n-gram search is a prefix search over all possible suffixes of your string.
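A minimal Python sketch of this naive approach, assuming a dict-of-dicts trie with one character per edge. The names `build_suffix_trie` and `contains_substring` are hypothetical, not from any library. It inserts every suffix, then answers a substring query by walking the pattern down from the root, which is exactly a prefix search over all suffixes:

```python
def build_suffix_trie(text):
    # Naive construction: insert every suffix text[i:], one character per node.
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root

def contains_substring(trie, pattern):
    # pattern is a substring of text iff it is a prefix of some suffix,
    # i.e. iff we can follow its characters from the root without falling off.
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_suffix_trie("banana")
for p in ["nan", "ana", "ban", "nab", "bananas"]:
    print(p, contains_substring(trie, p))
# nan True, ana True, ban True, nab False, bananas False
```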

This is the simplest and slowest way to build a suffix tree. It turns out that there are many fancier variants of this data structure that improve space usage, build time, or both. I'm not well versed enough in this domain to give an overview, but you can start by looking into suffix arrays or the lecture notes of this Advanced Data Structures class (lectures 16 and 18).
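Since suffix arrays are suggested as a starting point, here is a rough sketch of the idea, using a deliberately naive construction rather than any of the faster algorithms. A suffix array stores the start positions of all suffixes in sorted order, so the "prefix search over all suffixes" becomes a binary search. The function names are hypothetical:

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Slicing makes this O(n^2 log n) in the worst case; it only illustrates the idea.
    return sorted(range(len(text)), key=lambda i: text[i:])

def sa_contains(text, sa, pattern):
    # Binary search for the first suffix that is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes starting with pattern sort together right at this position,
    # so pattern occurs iff the suffix found there starts with it.
    return lo < len(sa) and text[sa[lo]:].startswith(pattern)

sa = build_suffix_array("banana")
print(sa)  # [5, 3, 1, 0, 4, 2]  (a, ana, anana, banana, na, nana)
print(sa_contains("banana", sa, "nan"), sa_contains("banana", sa, "nab"))
```

The array holds only n integers, which is one reason suffix arrays are popular as a more compact alternative to suffix trees.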

This answer also does a wonderful job of explaining a variant of this data structure.

If you imagine a trie into which you insert all of a word's suffixes, you would be able to query it for the word's substrings very easily. This is the main idea behind the suffix tree: it's basically a "suffix trie".

But with this naive approach, constructing the tree for a string of length n takes O(n^2) time and, in the worst case, O(n^2) memory.
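To see the quadratic growth concretely, this sketch counts the nodes of the naive suffix trie. It assumes a string of pairwise distinct characters, which is the worst case: no two suffixes share a prefix, so the trie has exactly 1 + n(n+1)/2 nodes. The helper names are hypothetical:

```python
import string

def build_suffix_trie(text):
    # Naive construction: insert every suffix, one character per node.
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root

def count_trie_nodes(node):
    # Count this node plus every node below it.
    return 1 + sum(count_trie_nodes(child) for child in node.values())

# Distinct letters: every suffix adds one new node per character.
for n in (10, 20, 40):
    text = string.ascii_letters[:n]
    print(n, count_trie_nodes(build_suffix_trie(text)))
# 10 56, 20 211, 40 821  (i.e. 1 + n(n+1)/2)
```

Doubling n roughly quadruples the node count, which matches the O(n^2) bound.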

Since all the entries of this tree are suffixes of the same string, they share a lot of information, so there are optimized algorithms that allow you to build it more efficiently. Ukkonen's algorithm, for example, builds a suffix tree online in O(n) time (for a fixed-size alphabet).

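The difference is very simple. A suffix tree has fewer "dummy" nodes than a suffix trie. These dummy nodes hold a single character and have only one child; the suffix tree collapses each such chain into a single edge labeled with a substring. In the trie they add extra nodes to store and extra hops to every lookup.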
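A small sketch of that collapsing step (path compression), assuming a dict-of-dicts trie and a unique end marker `$` appended to the text. The marker matters: without it, a suffix such as `ana`, which is a prefix of `anana`, would vanish inside a longer edge. This only reshapes an already-built trie, so it shows the structure of a suffix tree, not a linear-time way to build one. Function names are hypothetical:

```python
def build_suffix_trie(text):
    # Naive suffix trie: one character per edge.
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root

def compress(trie):
    # Merge every chain of single-child ("dummy") nodes into one edge
    # labeled with the concatenated characters.
    compact = {}
    for ch, child in trie.items():
        label = ch
        while len(child) == 1:
            next_ch, next_child = next(iter(child.items()))
            label += next_ch
            child = next_child
        compact[label] = compress(child)
    return compact

def count_nodes(tree):
    return 1 + sum(count_nodes(child) for child in tree.values())

trie = build_suffix_trie("banana$")
tree = compress(trie)
print(count_nodes(trie), count_nodes(tree))  # 23 11
print(sorted(tree))  # ['$', 'a', 'banana$', 'na']
```

Copying substrings into edge labels, as done here for readability, still costs O(n^2) characters in the worst case; practical suffix trees store each label as a pair of indices into the original string, which keeps total space O(n).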
