I am in need of a performance-oriented hash function implementation in C++ for a hash table that I will be coding. I looked around already and only found questions asking wh
Now, assuming you want a hash and want something blazing fast that works in your case: because your strings are just 6 chars long, you could use this magic:
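#include <cstddef>
#include <cstring>
std::size_t precision = 2; // change the precision with this
std::size_t hash(const char* str)
{
    std::size_t v = 0;
    // Copy at most 6 chars; memcpy avoids the unaligned, out-of-bounds read a raw pointer cast can cause.
    std::memcpy(&v, str, sizeof v < 6 ? sizeof v : 6);
    return v >> precision;
}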
CRC is for slowpokes ;)
Explanation: This works by reinterpreting the first bytes of the string as a size_t (32 or 64 bits, depending on your platform). The contents of the string are treated as a raw number, with no worries about characters anymore, and you then shift that number right by precision bits (tweak this value for the best performance; I've found 2 works well for hashing strings in sets of a few thousand). Note that a 64-bit size_t is 8 bytes, while a 6-character string plus its terminator is only 7, so the read must not go past the end of the buffer.
Also, the really neat part is that any decent compiler on modern hardware will hash a string like this in a couple of instructions (a load and a shift). Hard to beat that ;)
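To make the "tweak this number" advice concrete, here is a minimal sketch of how you might compare shift values on your own key set. The names `shift_hash` and `count_collisions` are made up for illustration, and the sketch assumes every key is at least 6 chars long and that buckets are picked with a plain modulo, which may not match your table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Same idea as the hash above: reinterpret up to 6 chars as an integer, then shift.
std::size_t shift_hash(const char* str, std::size_t precision)
{
    std::size_t v = 0;
    std::memcpy(&v, str, sizeof v < 6 ? sizeof v : 6);
    return v >> precision;
}

// Hypothetical helper: counts how many keys land in a bucket that is already taken.
// Assumes bucket_count > 0 and keys of at least 6 chars.
std::size_t count_collisions(const std::vector<std::string>& keys,
                             std::size_t bucket_count,
                             std::size_t precision)
{
    assert(bucket_count > 0);
    std::vector<bool> used(bucket_count, false);
    std::size_t collisions = 0;
    for (const std::string& key : keys) {
        assert(key.size() >= 6);
        const std::size_t bucket = shift_hash(key.c_str(), precision) % bucket_count;
        if (used[bucket]) {
            ++collisions;
        } else {
            used[bucket] = true;
        }
    }
    return collisions;
}
```

Run `count_collisions` over your real keys for a few precision values and keep the one with the fewest collisions.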
If you need to search short strings and insertion is not an issue, you could use a B-tree or a 2-3 tree; you don't gain much by hashing in your case.
The way you would do this is by placing a letter in each node (this letter-per-node layout is usually called a trie), so you first check for the node "a", then you check "a"'s children for "p", and its children for "p", and then "l" and then "e". In situations where you have "apple" and "apply", you need to seek to the last node (since the only difference is in the final "e" and "y").
But in most cases you'll be able to get the word after just a few steps ("xylophone" => "x"->"ylophone"), so you can optimize like this. This can be faster than hashing.
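The letter-per-node lookup described above could be sketched roughly like this. `TrieNode` and `Trie` are illustrative names, `std::map` is used for the children only to keep the example short, and the "x"->"ylophone" shortcut (collapsing a unique tail into one node) is left out.

```cpp
#include <map>
#include <memory>
#include <string>

// One node per letter; is_word marks the end of a stored string.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool is_word = false;
};

class Trie {
public:
    // Walks down one letter at a time, creating missing nodes on the way.
    void insert(const std::string& word)
    {
        TrieNode* node = &root_;
        for (char c : word) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) {
                child = std::make_unique<TrieNode>();
            }
            node = child.get();
        }
        node->is_word = true;
    }

    // Follows the letters of word; fails as soon as a letter has no node.
    bool contains(const std::string& word) const
    {
        const TrieNode* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) {
                return false;
            }
            node = it->second.get();
        }
        return node->is_word;
    }

private:
    TrieNode root_;
};
```

With "apple" and "apply" both inserted, a lookup shares the path "a", "p", "p", "l" and only branches at the last letter, as described above.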
The size of your table will dictate what size hash you should use. You would like to minimize collisions, of course. I'm not sure what you mean by max items and capacity (they seem like the same thing to me). In any case, either of those numbers suggests that a 32-bit hash would be sufficient. You might get away with CRC16 (65,536 possible values), but you would probably have a lot of collisions to deal with. On the other hand, a collision may be quicker to deal with than computing a CRC32 hash.
I would say go with CRC32. You'll find no shortage of documentation and sample code. Since you have your maximums figured out and speed is a priority, go with an array of pointers. Use the hash to generate an index. On collision, increment the index until you hit an empty bucket, wrapping around at the end of the array. Quick and simple.
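A rough sketch of that scheme, under a few assumptions: the table has a fixed capacity greater than zero, it stores pointers to strings owned elsewhere, and there is no deletion. `StringTable` is a made-up name, and the `crc32` below is the plain bitwise CRC-32 (reflected polynomial 0xEDB88320), chosen for brevity; a table-driven version would be faster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320); short, but not the fastest variant.
std::uint32_t crc32(const char* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= static_cast<unsigned char>(data[i]);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

// Hypothetical fixed-capacity table: an array of pointers with linear probing.
class StringTable {
public:
    explicit StringTable(std::size_t capacity) : slots_(capacity, nullptr)
    {
        assert(capacity > 0);
    }

    // Stores the pointer (not a copy). Returns false only when the table is full.
    bool insert(const std::string* key)
    {
        std::size_t i = index_of(*key);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            if (slots_[i] == nullptr) { slots_[i] = key; return true; }
            if (*slots_[i] == *key) return true;  // already present
            i = (i + 1) % slots_.size();          // collision: try the next bucket
        }
        return false;
    }

    bool contains(const std::string& key) const
    {
        std::size_t i = index_of(key);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            if (slots_[i] == nullptr) return false;  // empty bucket ends the probe run
            if (*slots_[i] == key) return true;
            i = (i + 1) % slots_.size();
        }
        return false;
    }

private:
    std::size_t index_of(const std::string& key) const
    {
        return crc32(key.data(), key.size()) % slots_.size();
    }

    std::vector<const std::string*> slots_;
};
```

Keeping the capacity comfortably above your maximum item count keeps probe runs short; a nearly full table degrades toward a linear scan.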