Here's a very simple way to build a suffix array from a string in Python:
def sort_offsets(a, b):
    return cmp(content[a:], content[b:])
content = "foobar
You could use the blist extension type that I wrote. A blist works like the built-in list, but (among other things) uses copy-on-write so that taking a slice takes O(log n) time and memory.
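from blist import blist

content = "foobar baz foo"
content = blist(content)  # blist slices are copy-on-write, so content[a:] is cheap
# list(...) so that .sort() also works on Python 3, where range() is not a list
suffix_array = list(range(len(content)))
suffix_array.sort(key=lambda a: content[a:])
print(suffix_array)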
[6, 10, 4, 8, 3, 7, 11, 0, 13, 2, 12, 1, 5, 9]
I was able to create a suffix_array from a randomly generated 100,000-character string in under 5 seconds, and that includes generating the string.
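For comparison, here is a minimal sketch of the same sort using only plain Python strings, with no blist involved. The function name build_suffix_array is a hypothetical one chosen for illustration. It should produce the same ordering as the blist version above, but every key is an ordinary string slice, which copies O(n) characters, so it is likely to use far more time and memory on long inputs.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text, in sorted suffix order."""
    # Each key is a full slice text[i:], so a plain str copies O(n) characters
    # per suffix; this is the cost that copy-on-write slicing is meant to avoid.
    return sorted(range(len(text)), key=lambda i: text[i:])


suffix_array = build_suffix_array("foobar baz foo")
print(suffix_array)
```

This is handy as a reference implementation for checking results on small strings before switching to a blist for larger ones.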