Say I have an object that stores a byte array and I want to be able to efficiently generate a hashcode for it. I've used the cryptographic hash functions for this in the past.
Don't use cryptographic hashes for a hashtable; that's ridiculous overkill.
Here ya go... Modified FNV Hash in C#
http://bretm.home.comcast.net/hash/6.html
public static int ComputeHash(params byte[] data)
{
    unchecked
    {
        // FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
        const int p = 16777619;
        int hash = (int)2166136261; // FNV offset basis

        for (int i = 0; i < data.Length; i++)
            hash = (hash ^ data[i]) * p;

        // Final avalanche mixing to spread the bits further.
        hash += hash << 13;
        hash ^= hash >> 7;
        hash += hash << 3;
        hash ^= hash >> 17;
        hash += hash << 5;
        return hash;
    }
}
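If you wire this into your type, a thin GetHashCode wrapper is enough. A sketch, where _data is just a placeholder for whatever field holds your bytes:

private readonly byte[] _data; // placeholder name for your byte array field

public override int GetHashCode()
{
    return ComputeHash(_data);
}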
Is using the existing hashcode from the byte array field not good enough? Also note that in your Equals method you should check that the arrays are the same size before doing the byte-by-byte compare (see the sketch after the code below).
private int? hashCode;

public override int GetHashCode()
{
    if (!hashCode.HasValue)
    {
        // Compute once and cache; the array is assumed not to change afterwards.
        var hash = 0;
        for (var i = 0; i < bytes.Length; i++)
        {
            hash = (hash << 4) + bytes[i];
        }
        hashCode = hash;
    }
    return hashCode.Value;
}
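A matching Equals with that size check up front might look something like this (a sketch; ByteKey is a hypothetical name for the containing class, with the same bytes field):

public override bool Equals(object obj)
{
    var other = obj as ByteKey; // ByteKey is a hypothetical class name
    if (other == null)
        return false;

    // Cheap size check first: skip the element compare entirely on a length mismatch.
    if (bytes.Length != other.bytes.Length)
        return false;

    for (var i = 0; i < bytes.Length; i++)
    {
        if (bytes[i] != other.bytes[i])
            return false;
    }
    return true;
}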
If you are looking for performance, I tested a few hash keys, and I recommend Bob Jenkins' hash function. It is crazy fast to compute and gives about as few collisions as the cryptographic hash you have been using.
I don't know C# at all, and I don't know if it can link with C, but here is its implementation in C.
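For what it's worth, C# can call into C through P/Invoke. A rough sketch, assuming Jenkins' lookup3.c has been compiled into a native library named jenkins (the library name is my assumption; the hashlittle signature is the one declared in lookup3.c):

using System;
using System.Runtime.InteropServices;

static class JenkinsHash
{
    // uint32_t hashlittle(const void *key, size_t length, uint32_t initval);
    [DllImport("jenkins")] // assumed native library name
    private static extern uint hashlittle(byte[] key, UIntPtr length, uint initval);

    public static int Compute(byte[] data)
    {
        return unchecked((int)hashlittle(data, (UIntPtr)data.Length, 0));
    }
}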
Generating a good hash is easier said than done. Remember, you're basically representing n bytes of data with m bits of information. The larger your data set and the smaller m is, the more likely you'll get a collision ... two pieces of data resolving to the same hash.
The simplest hash I ever learned was simply XORing all the bytes together. It's easy, faster than most complicated hash algorithms, and a halfway decent general-purpose hash for small data sets. It's the Bubble Sort of hash algorithms, really. Since the simple implementation leaves you with 8 bits, that's only 256 possible hashes, which is not so hot. You could XOR chunks instead of individual bytes, but then the algorithm gets much more complicated.
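For reference, the byte-XOR version in C# is just (a sketch):

public static int XorHash(byte[] data)
{
    var hash = 0;
    foreach (byte b in data)
        hash ^= b; // the result never leaves the range 0..255
    return hash;
}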
So certainly, the cryptographic algorithms may be doing work you don't need, but they're also a huge step up in general-purpose hash quality. The MD5 hash you're using is 128 bits, so there are 2^128 possible hashes. The only way you're likely to get something better is to take some representative samples of the data you expect to flow through your application and try various algorithms on it to see how many collisions you get.
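If you want to run that experiment, a small harness that counts how often a candidate hash repeats across your (distinct) samples might look like this (a sketch; HashTester and CountCollisions are names I made up):

using System;
using System.Collections.Generic;

static class HashTester
{
    // Counts how many samples land on a hash value that was already seen.
    public static int CountCollisions(IEnumerable<byte[]> samples, Func<byte[], int> hash)
    {
        var seen = new HashSet<int>();
        var collisions = 0;
        foreach (var sample in samples)
        {
            if (!seen.Add(hash(sample))) // Add returns false for a repeated hash
                collisions++;
        }
        return collisions;
    }
}

Feed it the same sample set with each candidate function and compare the counts.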
So until I see some reason to not use a canned hash algorithm (performance, perhaps?), I'm going to have to recommend you stick with what you've got.
Borrowing from the code generated by JetBrains software, I have settled on this function:
public override int GetHashCode()
{
    unchecked
    {
        var result = 0;
        foreach (byte b in _key)
            result = (result * 31) ^ b; // multiply by a prime, then XOR in the next byte
        return result;
    }
}
The problem with just XORing the bytes is that three of the four bytes of the returned value carry no information: the result always fits in the low byte, so the upper three bytes are either all zeros or, with sign extension, all ones. This function spreads the bits around a little more.
Setting a breakpoint in Equals was a good suggestion. Adding about 200,000 entries of my data to a Dictionary triggered only about 10 Equals calls (roughly 1 in 20,000 inserts).
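If breaking on every call gets tedious, a throwaway static counter in Equals gives the same number in a single run (a sketch; _equalsCalls and EqualsCore are hypothetical):

private static int _equalsCalls; // hypothetical counter, inspect after the test run

public override bool Equals(object obj)
{
    _equalsCalls++;
    return EqualsCore(obj); // hypothetical stand-in for your existing comparison logic
}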