prefix-tree | 易学教程

Finding the single nearest neighbor using a Prefix tree in O(1)?

I'm reading a paper in which the authors mention that they were able to find the single nearest neighbor in O(1) using a prefix tree. I will describe the general problem, then the classical solution, and finally the solution proposed in the paper. Problem: given a list L of bit vectors (all of the same length) and a query bit vector q, we would like to find the nearest neighbor of q. The distance metric is the Hamming distance (the number of bit positions in which two vectors differ). The naive approach would be to go through the list and calculate the Hamming distance between each vector in the list and q, which will
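For reference, the naive approach described above can be sketched as follows. This is only an illustration of the linear scan, not the method from the paper; the function names `hammingDistance` and `nearestNeighborNaive` are made up for this example, and bit vectors are assumed to be strings of '0' and '1' characters. The scan takes time proportional to the number of vectors times their length.

```javascript
// Hamming distance between two equal-length bit strings such as "0110".
// (Representing bit vectors as strings is an assumption of this sketch.)
function hammingDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error("bit vectors must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Linear scan: compare q against every vector in L and keep the closest one.
// Ties are resolved in favor of the earliest vector in the list.
function nearestNeighborNaive(L, q) {
  let best = null;
  let bestDistance = Infinity;
  for (const v of L) {
    const d = hammingDistance(v, q);
    if (d < bestDistance) {
      best = v;
      bestDistance = d;
    }
  }
  return { vector: best, distance: bestDistance };
}
```

For example, `nearestNeighborNaive(["0000", "1111", "0110"], "0111")` compares the query against all three vectors before it can answer.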

Choosing an appropriate data structure (hash table vs. suffix tree) for indexing a very large set of similar strings

I have a large set of strings, on the order of 10^12, and I need to choose an appropriate data structure so that, given a string, I can retrieve an associated integer value in something like O(log(n)) or O(m) time, where 'n' is the number of strings in the list and 'm' is the length of each string. We can expect that our set of strings, each of length 'm' and encoded over some alphabet of size 'q', covers nearly all possible strings of this length. For example, imagine we have 10^12 all-unique binary strings of length m = 39. This implies that we've covered ~54% of the set of all possible
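Because the set is said to cover nearly all strings of length m, one option worth considering (besides the hash table and suffix tree from the title) is a direct-address table: read each string as a base-q number and use that number as an array index, which gives an O(m) lookup with no per-key overhead. The sketch below is only an illustration under assumptions: `rank` and `DenseStringTable` are hypothetical names, values are assumed to fit in 32-bit integers, and -1 is assumed to be free to mark missing entries. A table at the scale described in the question would not fit in a single in-memory typed array, so the example uses a tiny m.

```javascript
// Map a string over `alphabet` to its base-q integer rank in O(m) time.
function rank(s, alphabet) {
  const q = alphabet.length;
  let index = 0;
  for (const ch of s) {
    const digit = alphabet.indexOf(ch);
    if (digit < 0) throw new Error(`symbol not in alphabet: ${ch}`);
    index = index * q + digit;
  }
  return index;
}

// Direct-address table over all q^m strings of length m.
// Hypothetical helper for illustration; `missing` marks absent keys.
class DenseStringTable {
  constructor(m, alphabet, missing = -1) {
    this.m = m;
    this.alphabet = alphabet;
    this.missing = missing;
    this.values = new Int32Array(alphabet.length ** m).fill(missing);
  }

  indexOf(s) {
    if (s.length !== this.m) throw new Error(`expected length ${this.m}`);
    return rank(s, this.alphabet);
  }

  set(s, value) {
    this.values[this.indexOf(s)] = value;
  }

  // Returns the stored integer, or `missing` if the string was never set.
  get(s) {
    return this.values[this.indexOf(s)];
  }
}

// Usage with binary strings of length 3 (8 slots in total).
const table = new DenseStringTable(3, "01");
table.set("101", 42);
```

The trade-off is that every possible string gets a slot whether or not it is present, which only pays off when the set is dense, as described above.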

Which search is faster, binary search or using prefix tree?

Suppose I have a list of strings and a prefix tree of those strings, and I would like to locate a string given a key. Which one is faster: binary search or prefix tree search? Why, and what is the time complexity? Thanks! Both techniques have their advantages and their drawbacks. Suffix tree. Advantages: O(N) building complexity; O(M) search of a pattern of length M; they allow online construction. Drawbacks: space inefficient; really complex construction algorithms. Binary search (with suffix array). Advantages: you can sort the string array in O(N) time; space efficient (five times less memory
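To make the comparison in the question concrete, here is a minimal sketch of both lookups over a plain list of strings. The names `binarySearch`, `buildTrie`, and `trieContains` are made up for this example, and the trie nodes (a `Map` of children plus an `isWord` flag) are an assumed layout. Binary search does O(log N) string comparisons, each costing up to O(M) characters, while the trie lookup walks at most M nodes regardless of N.

```javascript
// Binary search over an array sorted with the default (code unit) order.
// Returns the index of `key`, or -1 if it is not present.
function binarySearch(sorted, key) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] === key) return mid;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

// Build a prefix tree: one node per character, flagged where a word ends.
function buildTrie(words) {
  const root = { children: new Map(), isWord: false };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isWord: false });
      }
      node = node.children.get(ch);
    }
    node.isWord = true;
  }
  return root;
}

// Exact-match lookup: follow one edge per character of the key.
function trieContains(root, key) {
  let node = root;
  for (const ch of key) {
    node = node.children.get(ch);
    if (!node) return false;
  }
  return node.isWord;
}

const words = ["dog", "cat", "car"];
const sortedWords = [...words].sort();
const trie = buildTrie(words);
```

Which one is faster in practice also depends on memory layout and cache behavior, which this sketch does not measure.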

Javascript: Find exactly 10 words in a prefix tree that start with a given prefix

I have a trie (also called a prefix tree). Given a prefix, I want to get a list of ten words that start with the prefix. What's unique about this problem is that I only want 10 of the words that start with the given prefix, not all of them, and there are optimizations that can be made given this. I know that my code below works fine. Each node in the trie has a children property and a this_is_the_end_of_a_word property. For instance, when you insert "hi", this is what the trie looks like: [...] The problem: Given a prefix, I want to get a list of ten words that start with the prefix. My
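One way to exploit the "only 10 words" constraint is to stop the traversal as soon as enough words have been collected. The following is a minimal sketch, not the asker's code: `makeNode`, `insert`, and `firstWordsWithPrefix` are hypothetical names, and `children` is assumed to be a plain object keyed by single characters. Only the `children` and `this_is_the_end_of_a_word` property names come from the question. It returns up to `limit` words, so fewer than 10 come back when the trie does not hold that many matches.

```javascript
// Node layout follows the question: `children` and `this_is_the_end_of_a_word`.
// Using a prototype-less object for `children` is an assumption of this sketch.
function makeNode() {
  return { children: Object.create(null), this_is_the_end_of_a_word: false };
}

function insert(root, word) {
  let node = root;
  for (const ch of word) {
    if (!node.children[ch]) node.children[ch] = makeNode();
    node = node.children[ch];
  }
  node.this_is_the_end_of_a_word = true;
}

// Collect at most `limit` words starting with `prefix`.
function firstWordsWithPrefix(root, prefix, limit = 10) {
  // Walk down to the node that represents the prefix.
  let node = root;
  for (const ch of prefix) {
    node = node.children[ch];
    if (!node) return [];
  }

  const results = [];
  // Depth-first walk that stops as soon as `limit` words are collected,
  // so the rest of the subtree is never visited.
  function visit(current, word) {
    if (results.length >= limit) return;
    if (current.this_is_the_end_of_a_word) results.push(word);
    for (const ch of Object.keys(current.children)) {
      if (results.length >= limit) return;
      visit(current.children[ch], word + ch);
    }
  }
  visit(node, prefix);
  return results;
}

const root = makeNode();
for (const w of ["hi", "hit", "him", "hello", "he"]) insert(root, w);
const twoWords = firstWordsWithPrefix(root, "hi", 2);
```

Which words are returned depends on the order in which children are visited; if a particular order (alphabetical, most frequent first) matters, the child keys would need to be sorted or ranked before descending.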