I need to store 4000 strings of fixed size (8 characters) in C#, but I do not know which is best in terms of space and of the time needed to add and retrieve items: Bloom filter, Has
You really only have two data structures to choose between here, since Dictionary in C# is implemented using a hash table. So I'll refer to Dictionary and Hashtable as both being hash tables. If you use one of them, then you probably want Dictionary, for its type safety and performance, as covered here: Why is Dictionary preferred over hashtable? But as a Dictionary is itself implemented using a hash table, it's not a huge difference either way.
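To make the type-safety point concrete, here is a minimal sketch of the same membership check done both ways. The key "ABCDEFGH" and the capacity of 4096 are made-up values for illustration only.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Non-generic Hashtable: keys and values are typed as object,
        // so a wrong key or value type is only noticed at run time.
        var table = new Hashtable();
        table["ABCDEFGH"] = true;
        bool inTable = table.ContainsKey("ABCDEFGH");

        // Generic Dictionary: key and value types are checked by the compiler.
        // 4096 is an initial capacity hint (illustrative), not a hard limit.
        var dict = new Dictionary<string, bool>(4096);
        dict["ABCDEFGH"] = true;
        bool inDict = dict.ContainsKey("ABCDEFGH");

        Console.WriteLine($"{inTable} {inDict}");
    }
}
```

Both lookups behave the same; the difference is that the Dictionary version cannot accidentally be given, say, an int key without the compiler complaining.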
But the real question is hash table (Dictionary) versus Bloom filter. Someone has previously asked the related question, What is the advantage to using bloom filters? They also link to the Wikipedia page on Bloom filters, which is quite informative: https://en.wikipedia.org/wiki/Bloom_filter The short version of the answer is that Bloom filters are smaller and faster. They do, however, have a cost associated with this: they are not completely accurate.
In a hash table, the original string is always stored for exact comparison. First you hash the value, and this tells you where in the table to look. Once you've looked in the table, you then check the value located there against the value you're searching for. In a Bloom filter, you use multiple hashes to calculate a set of bit positions. To insert a string, you set the bits at all of those positions to 1; to look one up, you check them, and if all of them are 1 you consider the string to be found. This means that sometimes strings will be "found" which were never inserted. In fact, if the table is too small, you can reach a saturation point where almost any string you try appears to be in the Bloom filter. Because you know how many strings you are going to insert, you can size the table appropriately to avoid this.
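As a rough sketch of the mechanism just described (not a production implementation), a Bloom filter can be built on top of BitArray. Everything specific here is my own assumption for illustration: the class name BloomFilter, the seeded FNV-1a hash, deriving the positions from two base hashes, and the sizes in Main (39322 bits and 7 hash functions, which is roughly what the standard formulas give for about 4096 strings at a 1% false-positive rate).

```csharp
using System;
using System.Collections;

// Minimal Bloom filter sketch for short strings (illustrative only).
class BloomFilter
{
    private readonly BitArray bits;
    private readonly int hashCount;

    public BloomFilter(int bitCount, int hashCount)
    {
        bits = new BitArray(bitCount);
        this.hashCount = hashCount;
    }

    // Seeded FNV-1a style hash; picked only because it is short and deterministic.
    private static uint Fnv1a(string s, uint seed)
    {
        unchecked
        {
            uint hash = seed;
            foreach (char c in s)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int Position(string s, int i)
    {
        uint h1 = Fnv1a(s, 2166136261);
        uint h2 = Fnv1a(s, 0x9747b28c) | 1; // force odd so the step is never zero
        ulong combined = h1 + (ulong)i * h2;
        return (int)(combined % (ulong)bits.Length);
    }

    public void Add(string s)
    {
        for (int i = 0; i < hashCount; i++)
            bits[Position(s, i)] = true;
    }

    public bool MightContain(string s)
    {
        for (int i = 0; i < hashCount; i++)
            if (!bits[Position(s, i)])
                return false; // at least one bit is 0: definitely never added
        return true; // all bits are 1: probably added, could be a false positive
    }
}

class Program
{
    static void Main()
    {
        var filter = new BloomFilter(bitCount: 39322, hashCount: 7);
        filter.Add("ABCDEFGH");
        Console.WriteLine(filter.MightContain("ABCDEFGH")); // inserted strings are always reported as present
        Console.WriteLine(filter.MightContain("ZZZZZZZZ")); // usually False, but a false positive is possible
    }
}
```

Note that the filter never stores the strings themselves, which is where both the space saving and the possibility of false positives come from.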
Let's look at the sizes involved. To make the numbers come out cleanly, I'm going to pretend that you have exactly 4096 strings. To have a relatively low-collision hash table, you would want your table to have at least as many slots as there are strings. So, realistically (assuming 32-bit, i.e. 4-byte, pointers), you'd be looking at 4096*4 bytes = 16K for the table, plus 4096*(4+4+8) bytes = 64K for the list nodes (next pointer + string pointer) and the strings themselves. That counts 8 bytes per string, i.e. one byte per character; .NET strings are UTF-16 and carry some per-object overhead, so the real figure would be somewhat higher. In total, then, probably about 80K or a bit more, which isn't very much memory in most situations where you would be using C#.
For Bloom filters, we have to decide the error rate we want to aim for in our size calculations. A 1% error rate means that out of every 100 strings which were not inserted into the Bloom filter, about 1 would be falsely indicated as being present. Strings which were inserted will always be correctly indicated as having been inserted. Using the equation m = -n*ln(p)/(ln(2)^2), we can calculate the minimum size that gives us a certain error rate. In that equation, m is the number of slots (bits) in the table, p is the error rate, and n is the number of strings to be inserted. So, if we set p to 0.01 (1% error), we get approximately 9.6*4096 bits = 9.6*512 bytes = 4.8K, which is obviously quite a bit smaller. But, really, 1% is kind of high for an error rate. So, more realistically, we should probably go for something more like 0.0001%, which comes out to 28.8*4096 bits = 28.8*512 bytes = 14.4K. Obviously, either of those is substantially smaller than the 80K we calculated for the hash table. However, the hash table has an error rate of 0, which is clearly less than either 1% or 0.0001%.
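If you want to try other error rates, the formula is easy to evaluate in code. This is just a sketch of the arithmetic above; the helper name BitsNeeded is made up for the example.

```csharp
using System;

class Program
{
    // m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
    static long BitsNeeded(int n, double p) =>
        (long)Math.Ceiling(-n * Math.Log(p) / (Math.Log(2) * Math.Log(2)));

    static void Main()
    {
        const int n = 4096; // number of strings to insert

        // 0.01 is a 1% error rate, 0.000001 is 0.0001%.
        foreach (double p in new[] { 0.01, 0.000001 })
        {
            long bits = BitsNeeded(n, p);
            Console.WriteLine($"p = {p}: {bits} bits, about {bits / 8.0 / 1024:F1} KiB");
        }
    }
}
```

For the two rates used above this works out to roughly 4.8K and 14.4K, matching the hand calculation.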
So, really, it's up to you whether, in your situation, the trade-off of losing some accuracy to gain a little space and a little speed is worthwhile. Realistically, either option is likely to be small enough and fast enough for the vast majority of real-world situations.