What would be the best hashing algorithm if we had the following priorities (in that order):
It doe
You can get both using the Knuth hash function described here.
It's extremely fast assuming a power-of-2 hash table size -- just one multiply, one shift, and one bit-and. More importantly (for you) it's great at minimizing collisions (see this analysis).
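To make the "one multiply, one shift, and one bit-and" claim concrete, here is a minimal sketch of a multiplicative hash for a power-of-2 table. It is not necessarily the exact variant the linked page describes: the class name `KnuthHashSketch`, the method `index`, and the choice of multiplier (2^32 divided by the golden ratio) are assumptions made for illustration.

```java
final class KnuthHashSketch {
    // Assumed multiplier: 2^32 divided by the golden ratio, truncated (an odd number).
    private static final int MULTIPLIER = 0x9E3779B9;

    /**
     * Maps a 32-bit key to a slot in a table with 2^bits entries.
     * bits is assumed to be between 1 and 31.
     */
    static int index(int key, int bits) {
        int mask = (1 << bits) - 1;               // tableSize - 1
        int product = key * MULTIPLIER;           // the multiply, wrapping modulo 2^32
        return (product >>> (32 - bits)) & mask;  // the shift keeps the top bits, then the bit-and
    }

    public static void main(String[] args) {
        int bits = 4; // a table of 16 slots
        for (int key = 0; key < 8; key++) {
            System.out.println(key + " -> " + index(key, bits));
        }
    }
}
```

The high bits of the product are used because they depend on every bit of the key, whereas the low bits depend only on the key's low bits.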
Some other good algorithms are described here.
Here is a straightforward way of implementing it yourself: http://www.devcodenote.com/2015/04/collision-free-string-hashing.html
Here is a snippet from the post:
if say we have a character set of capital English letters, then the length of the character set is 26 where A could be represented by the number 0, B by the number 1, C by the number 2 and so on till Z by the number 25. Now, whenever we want to map a string of this character set to a unique number, we perform the same conversion as we did in case of the binary format.
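The conversion in the snippet can be sketched as reading the string as a base-26 number. This is an illustration under stated assumptions, not the linked post's code: the names `Base26Sketch`, `encode`, and `MAX_LENGTH` are hypothetical, and the input is limited to 13 capital letters so that the result fits in a 64-bit `long`.

```java
final class Base26Sketch {
    // 26^13 still fits in a signed 64-bit long; 26^14 does not.
    static final int MAX_LENGTH = 13;

    /** Interprets a string of capital letters A..Z as a base-26 number (A = 0, Z = 25). */
    static long encode(String s) {
        if (s.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("at most " + MAX_LENGTH + " letters supported");
        }
        long value = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("only A..Z allowed, got: " + c);
            }
            value = value * 26 + (c - 'A'); // shift one base-26 digit left, add the new digit
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode("CAB")); // 2*26*26 + 0*26 + 1
    }
}
```

One caveat with mapping A to 0: leading A's behave like leading zeros, so "AB" and "B" both encode to 1. The number is unique only among strings of the same length, unless the length is stored alongside it or the letters are mapped to 1..26 instead.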
The simple hashCode used by Java's String class might show a suitable algorithm.
Below is the "GNU Classpath" implementation. (License: GPL)
/**
 * Computes the hashcode for this String. This is done with int arithmetic,
 * where ** represents exponentiation, by this formula:<br>
 * <code>s[0]*31**(n-1) + s[1]*31**(n-2) + ... + s[n-1]</code>.
 *
 * @return hashcode value of this String
 */
public int hashCode()
{
  if (cachedHashCode != 0)
    return cachedHashCode;
  // Compute the hash code using a local variable to be reentrant.
  int hashCode = 0;
  int limit = count + offset;
  for (int i = offset; i < limit; i++)
    hashCode = hashCode * 31 + value[i];
  return cachedHashCode = hashCode;
}
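The method above relies on private fields of the String class (cachedHashCode, count, offset, value), so it cannot be copied out and run as-is. As a rough standalone sketch of the same formula, the hypothetical helper `polyHash` below (the class name `PolyHashSketch` is likewise made up) evaluates the polynomial with Horner's rule and can be checked against the built-in `String.hashCode()`, which is documented to use the same formula.

```java
final class PolyHashSketch {
    /** Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in wrapping int arithmetic. */
    static int polyHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = hash * 31 + s.charAt(i); // Horner's rule: one multiply and one add per char
        }
        return hash;
    }

    public static void main(String[] args) {
        String sample = "abc";
        System.out.println(polyHash(sample));   // 97*961 + 98*31 + 99
        System.out.println(sample.hashCode());  // should print the same value
    }
}
```

Note that this hash is not collision-free: distinct strings such as "Aa" and "BB" produce the same value.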