Short, Java implementation of a suffix tree and usage?

Posted by 末鹿安然 on 2019-12-04 09:30:59

Question


I'm looking for a short, simple algorithm for building and using a suffix tree in Java. The best I've found so far is in the Semantic Discovery Toolkit, but that implementation is several thousand lines long and spans several classes. Ideally, the implementation would be as short as possible and span no more than a few hundred lines.

Does anyone have such an implementation?


Answer 1:


I just finished a Java implementation of a suffix tree. In my blog entry you can find out more about suffix trees, see how to use my library, and download and build the library using Subversion and Maven. Yes, it's longer than just a few lines in a single class file, but it is well documented and meant for practical, real-world use. In addition, it uses Ukkonen's approach for linear-time construction. (Most of the implementations noted here have at least O(n^2) running time.)




Answer 2:


The article "Simple Linear Work Suffix Array Construction", by Kärkkäinen and Sanders, ends with a 50-line C++ implementation. You will probably also want something to produce the LCP (longest common prefix) array. Googling for "Computing the LCP array in linear time, given S and the suffix array POS." should find you that.
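
To make the suffix array / LCP array combination concrete, here is a minimal Java sketch. It rests on two assumptions that are not from the article: the suffix array is built by plain sorting (quadratic or worse, not the linear-time construction the paper describes), and the LCP array is computed with Kasai et al.'s algorithm, which is one well-known linear-time method. The class and method names are made up for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical, not from any library.
class SuffixArrayLcp {

    // Naive suffix array: sort suffix start positions by comparing the suffixes.
    // Simple but slow (this is NOT the linear-time algorithm from the paper).
    static int[] buildSuffixArray(String s) {
        Integer[] order = new Integer[s.length()];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> s.substring(a).compareTo(s.substring(b)));
        int[] sa = new int[order.length];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = order[i];
        }
        return sa;
    }

    // Kasai et al.: lcp[i] is the length of the longest common prefix of the
    // suffixes starting at sa[i - 1] and sa[i]; lcp[0] is 0. Runs in O(n).
    static int[] buildLcp(String s, int[] sa) {
        int n = s.length();
        int[] rank = new int[n];            // rank[p] = position of suffix p in sa
        for (int i = 0; i < n; i++) {
            rank[sa[i]] = i;
        }
        int[] lcp = new int[n];
        int h = 0;                          // current match length, reused across steps
        for (int i = 0; i < n; i++) {
            if (rank[i] == 0) {             // first suffix in sorted order has no predecessor
                h = 0;
                continue;
            }
            int j = sa[rank[i] - 1];        // suffix just before suffix i in sorted order
            while (i + h < n && j + h < n && s.charAt(i + h) == s.charAt(j + h)) {
                h++;
            }
            lcp[rank[i]] = h;
            if (h > 0) {
                h--;                        // the next suffix shares at least h - 1 characters
            }
        }
        return lcp;
    }

    public static void main(String[] args) {
        String s = "banana";
        int[] sa = buildSuffixArray(s);
        System.out.println(Arrays.toString(sa));              // [5, 3, 1, 0, 4, 2]
        System.out.println(Arrays.toString(buildLcp(s, sa))); // [0, 1, 3, 0, 0, 2]
    }
}
```

For long inputs, the sorting step would need to be replaced by a real linear-time or O(n log n) construction; the LCP step can stay as it is.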




Answer 3:


You can also use mine, but it is not Ukkonen's algorithm: like all the other simple approaches, it runs in quadratic time. I agree that a naive algorithm (which may work fine for shorter sequences) is easy to write in half a day at most.
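
For a sense of how small a naive version can be, here is a minimal sketch. It is not the code this answer refers to, and it makes a simplifying assumption: it builds an uncompressed suffix trie (one node per character) rather than a true suffix tree with compressed edges. Construction takes quadratic time and memory, so it only suits short inputs. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: an uncompressed suffix trie, O(n^2) time and space.
class NaiveSuffixTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        // Start positions of every suffix whose path passes through this node.
        final List<Integer> starts = new ArrayList<>();
    }

    private final Node root = new Node();

    NaiveSuffixTrie(String text) {
        // Insert each suffix text[start..] character by character.
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            node.starts.add(start);
            for (int i = start; i < text.length(); i++) {
                node = node.children.computeIfAbsent(text.charAt(i), c -> new Node());
                node.starts.add(start);
            }
        }
    }

    // True if pattern occurs anywhere in the text.
    boolean contains(String pattern) {
        return find(pattern) != null;
    }

    // All start positions of pattern in the text, in increasing order.
    List<Integer> occurrences(String pattern) {
        Node node = find(pattern);
        return node == null ? new ArrayList<>() : new ArrayList<>(node.starts);
    }

    // Walk down from the root along pattern; null if the path does not exist.
    private Node find(String pattern) {
        Node node = root;
        for (int i = 0; i < pattern.length() && node != null; i++) {
            node = node.children.get(pattern.charAt(i));
        }
        return node;
    }

    public static void main(String[] args) {
        NaiveSuffixTrie trie = new NaiveSuffixTrie("banana");
        System.out.println(trie.contains("nan"));     // true
        System.out.println(trie.occurrences("ana"));  // [1, 3]
    }
}
```

Lookups take time proportional to the pattern length, which is the main attraction of suffix structures; the cost of this naive version is entirely in the construction and the memory footprint.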



Source: https://stackoverflow.com/questions/2042825/short-java-implementation-of-a-suffix-tree-and-usage
