Choosing an appropriate data structure (hash table vs. suffix tree) for indexing a very large set of similar strings
问题 I have a large set of strings, on order ~10^12 or so, and I need to choose an appropriate data structure so that, provided a string, I can retrieve and associated integer value in something like O(log(n)) or O(m) time where 'n' is the length of the list of strings and 'm' is the length of each string. We can expect that our set of strings, each of length 'm' and encoded over some alphabet of size 'q', covers nearly all possible strings of this length. For example, imagine we have 10^12 all