I am in need of a performance-oriented hash function implementation in C++ for a hash table that I will be coding. I looked around already and only found questions asking wh
Since C++11, the standard library has provided std::hash<std::string>, a function object that you call as std::hash<std::string>()(s). It is likely to be an efficient hashing function that provides a good distribution of hash codes for most strings.
Furthermore, if you are thinking of implementing a hash table, you should consider using std::unordered_map instead of writing your own.
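To make that concrete, here is a minimal sketch of both suggestions. The helper names hash_word and count_words are made up for illustration; only std::hash and std::unordered_map come from the standard library.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: hash one string with the library's own hasher.
std::size_t hash_word(const std::string& word)
{
    return std::hash<std::string>{}(word);
}

// Hypothetical helper: count occurrences of each word.
// std::unordered_map uses std::hash<std::string> by default.
std::unordered_map<std::string, std::size_t>
count_words(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, std::size_t> counts;
    for (const std::string& w : words)
        ++counts[w];   // operator[] inserts a zero count on first sight
    return counts;
}
```

Calling count_words({"apple", "pear", "apple"}) should give a map with two keys, where "apple" maps to 2. The actual values returned by std::hash are implementation-defined, so don't store them or compare them across programs.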
This simple polynomial works surprisingly well. I got it from Paul Larson of Microsoft Research who studied a wide variety of hash functions and hash multipliers.
unsigned hash(const char* s, unsigned salt)
{
    unsigned h = salt;
    while (*s)
        h = h * 101 + (unsigned) *s++;
    return h;
}
salt should be initialized to some randomly chosen value before the hash table is created, to defend against hash table attacks. If this isn't an issue for you, just use 0.
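As a rough sketch of how the salt could be wired in: the function below is the same polynomial hash, renamed larson_hash here to avoid clashing with other names, and make_salt and bucket_index are hypothetical helpers, not part of any library. It assumes std::random_device is an acceptable source for the salt on your platform.

```cpp
#include <cstddef>
#include <random>

// Same multiply-by-101 hash as above; the name larson_hash is ours.
unsigned larson_hash(const char* s, unsigned salt)
{
    unsigned h = salt;
    while (*s)
        h = h * 101 + (unsigned) *s++;
    return h;
}

// Hypothetical helper: draw one salt when the table is created.
unsigned make_salt()
{
    std::random_device rd;
    return rd();
}

// Hypothetical helper: reduce the hash to a bucket index.
// bucket_count must be non-zero.
std::size_t bucket_index(const char* key, unsigned salt,
                         std::size_t bucket_count)
{
    return larson_hash(key, salt) % bucket_count;
}
```

Store the result of make_salt() in the table and pass the same value on every insert and lookup. A different salt per lookup would send the same key to different buckets.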
The size of the table is important too, to minimize collisions. Sounds like yours is fine.
Boost.Functional/Hash might be of use to you. I've not tried it, so I can't vouch for its performance.
Boost also has a CRC library.
I would look at Boost.Unordered first (i.e. boost::unordered_map<>). Its containers are hash tables rather than binary trees.
I believe some STL implementations have a hash_map<> container in the stdext namespace.
How about something simple:
#include <cassert>
// Initialize hashLookup so that it maps the characters
// in your string to integers between 0 and 31.
int hashLookup[256];
// Hash function for six-character strings.
int hash(const char *str)
{
    int ret = 0, mult = 1;
    for (const char *p = str; *p; ++p, mult *= 32) {
        // Index with unsigned char: plain char may be signed.
        unsigned char c = static_cast<unsigned char>(*p);
        assert(hashLookup[c] >= 0 && hashLookup[c] < 32);
        ret += mult * hashLookup[c];
    }
    return ret;
}
This assumes 32-bit ints. It uses 5 bits per character, so the hash value only has 30 bits in it. You could fix this, perhaps, by generating six bits for the first one or two characters. If your character set is small enough, you might not need more than 30 bits.
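The lookup table still has to be filled in. One possible way, purely as an assumption on my part: map letters to 1..26 regardless of case and every other byte to 0. The names initHashLookup and hash6 are made up, and the 'a' + i arithmetic assumes an ASCII-compatible character set.

```cpp
#include <cassert>

int hashLookup[256];

// Hypothetical initializer: letters map to 1..26 (case-insensitive),
// every other byte maps to 0. Assumes contiguous letters, as in ASCII.
void initHashLookup()
{
    for (int c = 0; c < 256; ++c)
        hashLookup[c] = 0;
    for (int i = 0; i < 26; ++i) {
        hashLookup['a' + i] = i + 1;
        hashLookup['A' + i] = i + 1;
    }
}

// Five bits per character, first character in the lowest bits.
// At most six characters, so the result fits in 30 bits.
int hash6(const char* str)
{
    int ret = 0, mult = 1;
    for (const char* p = str; *p; ++p, mult *= 32) {
        assert(p - str < 6);  // longer input would overflow the 30 bits
        ret += mult * hashLookup[static_cast<unsigned char>(*p)];
    }
    return ret;
}
```

With this particular mapping, two purely alphabetic strings of up to six letters get the same value only if they differ in case alone, because every letter contributes a non-zero base-32 digit. Non-letters all map to 0 and can collide.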
Since you store English words, most of your characters will be letters and there won't be much variation in the two most significant bits of your data. Beyond that, I would keep it very simple and just use XOR. After all, you're not looking for cryptographic strength, just a reasonably even distribution. Something along these lines:
size_t hash(const std::string &data) {
    size_t h(0);
    for (size_t i = 0; i < data.length(); i++)
        h = (h << 6) ^ (h >> 26) ^ data[i];
    return h;
}
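One caveat: the shift by 26 pairs with the shift by 6 to form a rotation only when size_t is 32 bits wide. Here is a sketch of a variant that derives the second shift from the actual width. The name rotating_xor_hash is made up, and this is just one way to write it.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Rotate the running hash left by 6 bits, then XOR in the next byte.
// The rotation width follows the real size of std::size_t.
std::size_t rotating_xor_hash(const std::string& data)
{
    const int bits = std::numeric_limits<std::size_t>::digits;
    std::size_t h = 0;
    for (unsigned char c : data)   // unsigned char avoids sign extension
        h = ((h << 6) | (h >> (bits - 6))) ^ c;
    return h;
}
```

For short ASCII strings the result matches the 32-bit formula, since no bits have wrapped around yet.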
Beyond that, have you looked at std::tr1::hash as a hashing function and/or std::tr1::unordered_map as an implementation of a hash table? Using these would probably save a lot of work compared with implementing your own classes.
The number one priority of my hash table is quick search (retrieval).
Well then you are using the right data structure, as searching in a hash table is O(1) on average! :)
CRC32 should do fine. The implementation isn't that complex; it's mainly based on XORs. Just make sure it uses a good polynomial.
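To show how little code it takes, here is a bit-at-a-time sketch. It assumes the common reflected CRC-32 polynomial 0xEDB88320 (the one used by zlib and PNG). The name crc32_bitwise is made up, and a table-driven version would be faster in practice.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32, reflected polynomial 0xEDB88320,
// initial value and final XOR both 0xFFFFFFFF.
std::uint32_t crc32_bitwise(const char* data, std::size_t length)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= static_cast<unsigned char>(data[i]);
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;  // fold in the polynomial
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this variant is 0xCBF43926 for the nine-byte input "123456789", which is handy for testing your own implementation.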