Which is best regarding the time and space: Bloom filter, Hash table or Dictionary?

余生分开走 2021-02-07 20:40

I need to store 4000 strings of fixed size (8 characters each) in C#, but I do not know which is best in terms of the space and time needed to add and retrieve items: Bloom filter, Has

3 Answers
  •  温柔的废话
    2021-02-07 21:39

    A dictionary is an abstract data type that represents a mapping from keys of one type to values of another. It doesn't specify how that mapping is implemented: it could be backed by a hash table, a balanced binary search tree, a skip list, or one of many other structures. (In C#, Dictionary<TKey, TValue> happens to be implemented as a hash table.) It's probably not the right fit here, because you aren't associating each string with a value; you only want to store the strings themselves, which calls for a set rather than a map.

    A Bloom filter is a probabilistic data structure. It can tell you that an element is definitely not in a set, but a positive answer only means the element might be in the set, so false positives are possible. It's commonly used in distributed systems to avoid unnecessary network reads: each computer stores a Bloom filter of the entries that might be in a database and skips the remote query whenever the filter rules an entry out. It's not very good for what you're trying to do. It never stores the strings themselves, so you can't retrieve them, and the false positives are probably a deal-breaker.
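To make the "definitely not / possibly yes" behaviour concrete, here is a minimal Bloom filter sketch in C#. It is an illustration only, not a production implementation: the class name `BloomFilter`, the method name `MightContain`, the FNV-1a hash with two seeds, and the double-hashing scheme are all my own assumptions. The sizes come from the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, which for n = 4000 strings and a target false-positive rate of p = 1% give roughly 38,340 bits (under 5 KB) and 7 hash functions.

```csharp
using System;
using System.Collections;

// Hypothetical minimal Bloom filter; names and hashing scheme are illustrative.
public sealed class BloomFilter
{
    private readonly BitArray bits;
    private readonly int hashCount;

    public BloomFilter(int bitCount, int hashCount)
    {
        bits = new BitArray(bitCount);
        this.hashCount = hashCount;
    }

    // Sets one bit per hash function.
    public void Add(string item)
    {
        for (int i = 0; i < hashCount; i++)
            bits[IndexFor(item, i)] = true;
    }

    // false: definitely absent. true: possibly present (may be a false positive).
    public bool MightContain(string item)
    {
        for (int i = 0; i < hashCount; i++)
            if (!bits[IndexFor(item, i)]) return false;
        return true;
    }

    // Double hashing: index_i = (h1 + i * h2) mod m.
    private int IndexFor(string item, int i)
    {
        uint h1 = Fnv1a(item, 2166136261u);
        uint h2 = Fnv1a(item, 0x9747b28cu) | 1u; // odd, so the stride is never zero
        ulong combined = h1 + (ulong)i * h2;
        return (int)(combined % (ulong)bits.Length);
    }

    // FNV-1a style hash over UTF-16 code units, with a caller-supplied seed.
    private static uint Fnv1a(string s, uint seed)
    {
        uint hash = seed;
        foreach (char c in s)
        {
            hash ^= c;
            unchecked { hash *= 16777619u; }
        }
        return hash;
    }
}

public static class BloomDemo
{
    public static void Main()
    {
        var filter = new BloomFilter(38341, 7);
        filter.Add("ABCD1234");

        // An added item is always reported as possibly present.
        Console.WriteLine(filter.MightContain("ABCD1234")); // True

        // An item that was never added is usually rejected, but not always.
        Console.WriteLine(filter.MightContain("ZZZZ9999"));
    }
}
```

Note that the filter has no way to enumerate or return the strings that were added, which is why it cannot replace a real set when the items need to be retrieved.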

    The hash table, though, is a great data structure for what you want. It supports lookups and insertions in expected constant time and, with a good implementation, can be quite memory efficient. However, it doesn't store the elements in sorted order, which may be a problem depending on your application.
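In C#, the built-in hash-based set is `HashSet<T>`. A minimal sketch of how it could be used for the strings in the question follows; the class name `HashSetSketch`, the helper `BuildSet`, and the sample strings are made up for illustration, and `StringComparer.Ordinal` is my assumption that the 8-character strings should be compared character by character.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetSketch
{
    // Hypothetical helper: builds a set that compares strings ordinally.
    public static HashSet<string> BuildSet(IEnumerable<string> items)
    {
        return new HashSet<string>(items, StringComparer.Ordinal);
    }

    public static void Main()
    {
        HashSet<string> codes = BuildSet(new[] { "ABCD1234", "WXYZ5678" });

        // Add returns false when the string is already present.
        Console.WriteLine(codes.Add("ABCD1234"));      // False
        Console.WriteLine(codes.Add("QRST0000"));      // True

        // Contains is an exact membership test: no false positives.
        Console.WriteLine(codes.Contains("WXYZ5678")); // True
        Console.WriteLine(codes.Contains("ZZZZ9999")); // False

        Console.WriteLine(codes.Count);                // 3
    }
}
```

For 4000 eight-character strings the character data alone is about 64 KB in .NET (two bytes per character), plus per-object and table overhead, so memory is unlikely to be a concern with this approach.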

    If you do want sorted order, there are two other structures you might want to consider. The first is a balanced binary search tree, which supports lookup, insertion, and deletion in O(log n) time and keeps elements in sorted order. There are many good implementations out there, and most standard libraries ship with one. The other is the trie, which supports lookup in time proportional to the length of the string (at most 8 characters here) and also maintains sorted ordering. It can be a bit space-inefficient depending on the distribution of your strings, but might be exactly what you're looking for.
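For the sorted-order case in C#, one option is the built-in `SortedSet<T>`, which keeps its elements ordered by a comparer. The sketch below is only an illustration under assumptions: the class name `SortedSetSketch`, the helper `BuildSorted`, and the sample strings are invented, and ordinal comparison is assumed to be the ordering you want.

```csharp
using System;
using System.Collections.Generic;

public static class SortedSetSketch
{
    // Hypothetical helper: builds a sorted set with ordinal string ordering.
    public static SortedSet<string> BuildSorted(IEnumerable<string> items)
    {
        return new SortedSet<string>(items, StringComparer.Ordinal);
    }

    public static void Main()
    {
        SortedSet<string> codes = BuildSorted(
            new[] { "MMMM0001", "AAAA0002", "ZZZZ0003", "AAAA0001" });

        Console.WriteLine(codes.Contains("MMMM0001")); // True
        Console.WriteLine(codes.Min);                  // AAAA0001
        Console.WriteLine(codes.Max);                  // ZZZZ0003

        // Enumeration follows the comparer's order.
        Console.WriteLine(string.Join(",", codes));
        // AAAA0001,AAAA0002,MMMM0001,ZZZZ0003

        // Range query (bounds inclusive): every stored string starting with "AAAA".
        SortedSet<string> range = codes.GetViewBetween("AAAA0000", "AAAA9999");
        Console.WriteLine(string.Join(",", range));    // AAAA0001,AAAA0002
    }
}
```

Range queries like `GetViewBetween` are the main thing a hash-based set cannot do, so this trade of slower lookups for ordering only pays off if the application actually needs it.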

    Hope this helps!
