问题
I've implemented a basic search for a research project. I'm trying to make the search more efficient by building a suffix tree. I'm interested in a C# implementation of the Ukkonen algorith. I don't want to waste time rolling my own if such implementation exists.
回答1:
Hei, just finished implementing .NET (c#) library containing different trie implementations. Among them:
- Classical trie
- Patricia trie
- Suffix trie
- A trie using Ukkonen's algorithm
I tried to make source code easy readable. Usage is also very straight forward:
using Gma.DataStructures.StringSearch;
...
var trie = new UkkonenTrie<int>(3);
//var trie = new SuffixTrie<int>(3);
trie.Add("hello", 1);
trie.Add("world", 2);
trie.Add("hell", 3);
var result = trie.Retrieve("hel");
The library is well tested and also published as TrieNet NuGet package.
See github.com/gmamaladze/trienet
回答2:
Hard question. Here's the closest to match I could find: http://www.codeproject.com/KB/recipes/ahocorasick.aspx, which is an implementation of the Aho-Corasick string matching algorithm. Now, the algorithm uses a suffix-tree-like structure per: http://en.wikipedia.org/wiki/Aho-Corasick_algorithm
Now, if you want a prefix tree, this article claims to have an implementation for you: http://www.codeproject.com/KB/recipes/prefixtree.aspx
<HUMOR> Now that I did your homework, how about you mow my lawn. (Reference: http://flyingmoose.org/tolksarc/homework.htm) </HUMOR>
Edit: I found a C# suffix tree implementation that was a port of a C++ one posted on a blog: http://code.google.com/p/csharsuffixtree/source/browse/#svn/trunk/suffixtree
Edit: There is a new project at Codeplex that is focused on suffix trees: http://suffixtree.codeplex.com/
回答3:
Here is an implementation of a suffix tree that is reasonably efficient. I haven't studied Ukkonen's implementation, but the running time of this algorithm I believe is quite reasonable, approximately O(N Log N)
. Note the number of internal nodes in the tree created is equal to the number of letters in the parent string.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using NUnit.Framework;
namespace FunStuff
{
public class SuffixTree
{
public class Node
{
public int Index = -1;
public Dictionary<char, Node> Children = new Dictionary<char, Node>();
}
public Node Root = new Node();
public String Text;
public void InsertSuffix(string s, int from)
{
var cur = Root;
for (int i = from; i < s.Length; ++i)
{
var c = s[i];
if (!cur.Children.ContainsKey(c))
{
var n = new Node() {Index = from};
cur.Children.Add(c, n);
// Very slow assertion.
Debug.Assert(Find(s.Substring(from)).Any());
return;
}
cur = cur.Children[c];
}
Debug.Assert(false, "It should never be possible to arrive at this case");
throw new Exception("Suffix tree corruption");
}
private static IEnumerable<Node> VisitTree(Node n)
{
foreach (var n1 in n.Children.Values)
foreach (var n2 in VisitTree(n1))
yield return n2;
yield return n;
}
public IEnumerable<int> Find(string s)
{
var n = FindNode(s);
if (n == null) yield break;
foreach (var n2 in VisitTree(n))
yield return n2.Index;
}
private Node FindNode(string s)
{
var cur = Root;
for (int i = 0; i < s.Length; ++i)
{
var c = s[i];
if (!cur.Children.ContainsKey(c))
{
// We are at a leaf-node.
// What we do here is check to see if the rest of the string is at this location.
for (var j=i; j < s.Length; ++j)
if (cur.Index + j >= Text.Length || Text[cur.Index + j] != s[j])
return null;
return cur;
}
cur = cur.Children[c];
}
return cur;
}
public SuffixTree(string s)
{
Text = s;
for (var i = s.Length - 1; i >= 0; --i)
InsertSuffix(s, i);
Debug.Assert(VisitTree(Root).Count() - 1 == s.Length);
}
}
[TestFixture]
public class TestSuffixTree
{
[Test]
public void TestBasics()
{
var s = "banana";
var t = new SuffixTree(s);
var results = t.Find("an").ToArray();
Assert.AreEqual(2, results.Length);
Assert.AreEqual(1, results[0]);
Assert.AreEqual(3, results[1]);
}
}
}
来源:https://stackoverflow.com/questions/171196/looking-for-the-suffix-tree-implementation-in-c