strcmp for Python, or how to sort substrings efficiently (without copying) when building a suffix array

孤独总比滥情好 2021-02-06 00:24

Here's a very simple way to build a suffix array from a string in Python:

def sort_offsets(a, b):
    return cmp(content[a:], content[b:])

content = "foobar

4 Answers
  •  栀梦 (original poster)
    2021-02-06 01:09

    You could use the blist extension type that I wrote. A blist works like the built-in list, but (among other things) uses copy-on-write so that taking a slice takes O(log n) time and memory.

    from blist import blist
    
    content = "foobar baz foo"
    content = blist(content)
    suffix_array = list(range(len(content)))  # list() so that .sort() also works on Python 3
    suffix_array.sort(key=lambda a: content[a:])  # each blist slice is copy-on-write
    print(suffix_array)
    # Output: [6, 10, 4, 8, 3, 7, 11, 0, 13, 2, 12, 1, 5, 9]
    

    I was able to create a suffix_array from a randomly generated 100,000-character string in under 5 seconds, and that includes generating the string.
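
For comparison, here is a minimal sketch of a different technique, prefix doubling, which never slices the string at all: it only sorts integer offsets by pairs of integer ranks. This is not the blist approach described above, and the function name `build_suffix_array` is made up for illustration. It assumes Python 3 and a plain `str` input, and it trades the slicing cost for roughly O(n log² n) work in pure Python, so it is not necessarily faster in practice.

```python
def build_suffix_array(text):
    """Return the suffix array of `text` via prefix doubling (no substring copies)."""
    n = len(text)
    if n == 0:
        return []
    # rank[i] orders suffix i by its first k characters; start with k = 1.
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        # Order by the first 2k characters: (rank of first k, rank of next k).
        # -1 stands for "ran past the end", which sorts before any real rank.
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Re-rank: suffixes with identical keys share the same rank.
        new_rank = [0] * n
        for prev, cur in zip(sa, sa[1:]):
            new_rank[cur] = new_rank[prev] + (key(cur) != key(prev))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct, so the order is final
            return sa
        k *= 2


print(build_suffix_array("foobar baz foo"))
```

For the example string this should give the same ordering as the blist version above. The only lists copied are lists of integer offsets (`sa[1:]`), never pieces of the text itself.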
