How Do I Choose Between a Hash Table and a Trie (Prefix Tree)?

终归单人心 2020-11-27 09:13

If I have to choose between a hash table and a prefix tree, what are the discriminating factors that would lead me to choose one over the other? From my own naive point of

8 answers
  • 2020-11-27 09:29

    Insertion and lookup on a trie are linear in the length of the input string: O(s).

    A hash table gives you O(1) lookup and insertion, but first you have to compute the hash of the input string, which is again O(s).

    Conclusion: the asymptotic time complexity is linear in the key length in both cases.
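
    To make the O(s) hashing cost concrete, here is a minimal sketch of a toy polynomial string hash. It is not the hash any particular runtime uses; `poly_hash` and `num_buckets` are hypothetical names chosen for illustration.

```python
def poly_hash(key, num_buckets=1024):
    """Toy polynomial hash: returns (bucket index, characters read)."""
    h = 0
    chars_read = 0
    for ch in key:                            # every character feeds the hash
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF   # keep the value within 32 bits
        chars_read += 1
    return h % num_buckets, chars_read


short_bucket, short_cost = poly_hash("cat")
long_bucket, long_cost = poly_hash("category")
# short_cost == 3 and long_cost == 8: the work grows with the key length,
# just as a trie lookup follows one child pointer per character.
```

    Some runtimes cache a string's hash after computing it once, which amortizes this cost when the same string object is looked up repeatedly.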

    The trie has somewhat more memory overhead, but you can choose a compressed trie, which puts you more or less back on a tie with the hash table.

    To break the tie, ask yourself this question: do I need to look up full words only, or do I need to return all words matching a prefix (as in a predictive text input system)? For the first case, go for a hash table. The code is simpler and cleaner, and easier to test and maintain. For a more elaborate use case where prefixes or suffixes matter, go for a trie.
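
    The two cases can be sketched side by side. This is a minimal illustration, not a tuned implementation; the `Trie` class and its method names (`insert`, `contains`, `with_prefix`) are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, s):
        """Follow s from the root; return the node reached, or None."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def with_prefix(self, prefix):
        """Return all stored words starting with prefix, in sorted order."""
        out = []
        start = self._walk(prefix)
        if start is not None:
            self._collect(start, prefix, out)
        return out

    def _collect(self, node, path, out):
        if node.is_word:
            out.append(path)
        for ch in sorted(node.children):
            self._collect(node.children[ch], path + ch, out)


words = ["car", "card", "care", "cat", "dog"]

# Case 1: full-word lookup only. A hash-based set is enough.
word_set = set(words)
full_word_hit = "card" in word_set            # True

# Case 2: prefix queries. The trie walks the prefix once, then
# enumerates only the matching subtree.
trie = Trie(words)
suggestions = trie.with_prefix("car")         # ['car', 'card', 'care']

# With only the set, the same query has to scan every stored word.
scanned = sorted(w for w in word_set if w.startswith("car"))
```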

    And if you are doing it just for fun, implementing a trie would put a Sunday afternoon to good use.

  • 2020-11-27 09:31

    Some (usually embedded, real-time) applications require that the processing time be independent of the data. In that case, a hash table can guarantee a known execution time, while a trie varies based on the data.

  • 2020-11-27 09:33

    Advantages of tries:

    The basics:

    • Predictable O(k) lookup time where k is the size of the key
    • Lookup can take less than k time if it's not there
    • Supports ordered traversal
    • No need for a hash function
    • Deletion is straightforward
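
    A few of these points (early exit on a miss, ordered traversal, deletion) can be sketched with a nested-dict trie. This is an illustrative sketch only: the function names and the `END` sentinel are assumptions, and keys are assumed to be strings.

```python
END = ""  # sentinel key marking "a stored key ends here"; never equals a character


def insert(root, key):
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node[END] = True


def lookup(root, key):
    """Return (found, characters examined)."""
    node = root
    for i, ch in enumerate(key):
        if ch not in node:
            return False, i + 1   # miss detected before reading the whole key
        node = node[ch]
    return END in node, len(key)


def in_order(node, prefix=""):
    """Yield stored keys in lexicographic order."""
    if END in node:
        yield prefix
    for ch in sorted(k for k in node if k != END):
        yield from in_order(node[ch], prefix + ch)


def delete(node, key, depth=0):
    """Remove key and prune emptied nodes. Returns True if node is now empty."""
    if depth == len(key):
        node.pop(END, None)
    else:
        child = node.get(key[depth])
        if child is not None and delete(child, key, depth + 1):
            del node[key[depth]]
    return not node


root = {}
for w in ["tea", "ten", "to", "inn"]:
    insert(root, w)

ordered = list(in_order(root))       # ['inn', 'tea', 'ten', 'to']
miss = lookup(root, "xylophone")     # (False, 1): stops at the first character
delete(root, "tea")
after_delete = list(in_order(root))  # ['inn', 'ten', 'to']
```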

    New operations:

    • You can quickly look up prefixes of keys, enumerate all entries with a given prefix, etc.

    Advantages of linked structure:

    • If there are many common prefixes, the space they require is shared.
    • Immutable tries can share structure. Instead of updating a trie in place, you can build a new one that's different only along one branch, elsewhere pointing into the old trie. This can be useful for concurrency, multiple simultaneous versions of a table, etc.
    • An immutable trie is compressible. That is, it can share structure on the suffixes as well, by hash-consing.
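
    Structure sharing between versions can be sketched with path copying on a nested-dict trie. Only the nodes along the inserted key are copied; everything else is shared with the old version. The names `insert_persistent`, `contains`, and `END` are hypothetical, and suffix sharing via hash-consing is not shown.

```python
END = ""  # sentinel key marking the end of a stored key


def insert_persistent(node, key, depth=0):
    """Return a new trie root that contains key; the old trie is not modified."""
    new_node = dict(node)  # shallow copy: child subtrees are shared, not copied
    if depth == len(key):
        new_node[END] = True
    else:
        ch = key[depth]
        new_node[ch] = insert_persistent(node.get(ch, {}), key, depth + 1)
    return new_node


def contains(node, key):
    for ch in key:
        node = node.get(ch)
        if node is None:
            return False
    return END in node


v1 = insert_persistent(insert_persistent({}, "cat"), "dog")
v2 = insert_persistent(v1, "cab")

# v1 is unchanged and still usable, e.g. by a concurrent reader.
old_has_cab = contains(v1, "cab")     # False
new_has_cab = contains(v2, "cab")     # True

# The untouched "dog" branch is the very same object in both versions.
shares_dog_branch = v2["d"] is v1["d"]  # True
```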

    Advantages of hashtables:

    • Everyone knows hashtables, right? Your system will already have a nice well-optimized implementation, faster than tries for most purposes.
    • Your keys need not have any special structure.
    • More space-efficient than the obvious linked trie structure (see comments below)
  • 2020-11-27 09:34

    Everyone knows the hash table and its uses, but its lookup time is not exactly constant: it depends on how big the table is and on the computational cost of the hash function.

    Creating huge hash tables for efficient lookup is not an elegant solution in most industrial scenarios where even small latency or scalability costs matter (e.g., high-frequency trading). You also have to make sure the data structure is optimized for the space it takes up in memory, to reduce cache misses.

    A very good example where a trie suits the requirements better is messaging middleware. You have a million subscribers and publishers of messages in various categories (in JMS terms, topics or exchanges). If you want to filter messages by topic (topics are just strings), you definitely do not want to create a hash table for a million subscriptions with a million topics. A better approach is to store the topics in a trie, so that the cost of a topic match is independent of the number of topics, subscriptions, and publishers and depends only on the length of the topic string. I like it because you can be creative with this data structure to optimize space requirements and so reduce cache misses.
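
    A rough sketch of the topic-trie idea, under several assumptions that are not in the answer above: topics are dot-separated strings, the trie is keyed by segment rather than by character, and only exact topic matches are handled (no wildcards). `TopicTrie`, `subscribe`, and `match` are hypothetical names, not the API of any real middleware.

```python
class TopicNode:
    def __init__(self):
        self.children = {}        # topic segment -> TopicNode
        self.subscribers = set()  # ids subscribed to exactly this topic


class TopicTrie:
    def __init__(self):
        self.root = TopicNode()

    def subscribe(self, topic, subscriber_id):
        node = self.root
        for segment in topic.split("."):
            node = node.children.setdefault(segment, TopicNode())
        node.subscribers.add(subscriber_id)

    def match(self, topic):
        """Return the subscribers of topic; cost depends on the topic's length only."""
        node = self.root
        for segment in topic.split("."):
            node = node.children.get(segment)
            if node is None:
                return set()
        return set(node.subscribers)


topics = TopicTrie()
topics.subscribe("orders.eu.created", "s1")
topics.subscribe("orders.eu.created", "s2")
topics.subscribe("orders.us.created", "s3")

eu_subscribers = topics.match("orders.eu.created")   # {'s1', 's2'}
nobody = topics.match("orders.asia.created")         # set()
```

    Common prefixes such as `orders` are stored once, which is where the space saving over one table entry per full topic string would come from.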

  • 2020-11-27 09:38

    It all depends on what problem you're trying to solve. If all you need to do is insertions and lookups, go with a hash table. If you need to solve more complex problems such as prefix-related queries, then a trie might be the better solution.

  • 2020-11-27 09:44

    A hash table is more space-efficient than a basic trie implementation. With strings, however, ordering matters in many practical applications, and a hash table does not preserve lexicographical order. If your application performs operations based on lexicographical order (partial search, all strings with a given prefix, all words in sorted order), use a trie. For lookup only, use a hash table, which arguably gives the lowest lookup time.

    P.S.: Beyond these two, ternary search trees (TSTs) are an excellent choice. Their lookup time is higher than a hash table's, but they are time-efficient in all other operations. They are also more space-efficient than tries.
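
    For readers who have not met a ternary search tree, here is a minimal sketch. Each node holds one character and three links (smaller, equal, larger), so a node costs three pointers instead of a full child table. All names are hypothetical, and keys are assumed to be non-empty strings.

```python
class TSTNode:
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_end = False


def tst_insert(node, key, i=0):
    """Insert a non-empty key; returns the (possibly new) subtree root."""
    ch = key[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.lo = tst_insert(node.lo, key, i)
    elif ch > node.ch:
        node.hi = tst_insert(node.hi, key, i)
    elif i + 1 < len(key):
        node.eq = tst_insert(node.eq, key, i + 1)
    else:
        node.is_end = True
    return node


def tst_contains(node, key):
    i = 0
    while node is not None and key:
        ch = key[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 < len(key):
            node = node.eq
            i += 1
        else:
            return node.is_end
    return False


def tst_sorted(node, prefix="", out=None):
    """Collect all stored keys in lexicographic order."""
    out = [] if out is None else out
    if node is not None:
        tst_sorted(node.lo, prefix, out)
        if node.is_end:
            out.append(prefix + node.ch)
        tst_sorted(node.eq, prefix + node.ch, out)
        tst_sorted(node.hi, prefix, out)
    return out


tst_root = None
for word in ["cat", "car", "cab", "dog", "ca"]:
    tst_root = tst_insert(tst_root, word)

all_words = tst_sorted(tst_root)   # ['ca', 'cab', 'car', 'cat', 'dog']
```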
