Background:
Another way that comes to mind: chain XORs of the hashes, each rotated by its index:
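int shift = 0;
int result = 1;
for (String s : strings) {
    // Rotate each hash left by its position (mod 32) so that order matters.
    // Integer.rotateLeft avoids the operator-precedence bugs of a hand-written
    // (h << shift) | (h >> (32 - shift)) & mask expression.
    result ^= Integer.rotateLeft(s.hashCode(), shift);
    shift = (shift + 1) % 32;
}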
Edit: after reading the explanation given in Effective Java, I think Geoff's code would be much more efficient.
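For comparison, here is a sketch of the multiply-by-31 recipe described in Effective Java. I'm assuming Geoff's code follows roughly this pattern (it isn't reproduced here), and the class name `OrderedHash` is hypothetical. `java.util.List.hashCode()` is specified to compute exactly this fold, so the result matches it.

```java
import java.util.List;

// Hypothetical helper: order-sensitive combination of string hashcodes
// using the multiply-by-31 recipe described in Effective Java.
class OrderedHash {
    static int combine(List<String> strings) {
        int result = 1;
        for (String s : strings) {
            // Multiplying the running result by an odd prime before adding
            // the next hash makes each element's position affect the result.
            result = 31 * result + (s == null ? 0 : s.hashCode());
        }
        return result;
    }
}
```

Because `List.hashCode()` is defined this way, calling `List.of(...).hashCode()` or `Arrays.asList(array).hashCode()` gives the same value without a hand-written loop.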
A SQL-based solution could be built on the checksum and checksum_agg functions. If I'm following you correctly, you have something like:
MyTable
    MyTableId
    HashCode
MyChildTable
    MyTableId (foreign key into MyTable)
    String
with the various strings for a given item (MyTableId) stored in MyChildTable. To calculate and store a checksum reflecting these (never-to-be-changed) strings, something like this should work:
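UPDATE MyTable
set HashCode = (
    -- aggregates are not allowed directly in an UPDATE's SET list,
    -- so compute the checksum in a correlated subquery
    select checksum_agg(checksum(ct.String))
    from MyChildTable ct
    where ct.MyTableId = MyTable.MyTableId
)
where MyTableId = @OnlyForThisOne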
I believe this is order-independent, so the rows "The", "quick", "brown" would produce the same checksum as "brown", "The", "quick".
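Outside the database, a similar order-independent combination can be sketched in Java by summing the individual hashcodes: addition is commutative, so the order of the strings cannot affect the result. This is the same rule `java.util.Set.hashCode()` specifies (the sum of the element hashcodes). It is only an analogue and does not reproduce SQL Server's checksum_agg values; the class name `UnorderedHash` is hypothetical.

```java
import java.util.List;

// Hypothetical helper: order-independent combination of string hashcodes.
class UnorderedHash {
    static int combine(List<String> strings) {
        int result = 0;
        for (String s : strings) {
            // int addition wraps on overflow and is commutative,
            // so any permutation of the same strings yields the same sum.
            result += s.hashCode();
        }
        return result;
    }
}
```

Unlike XOR, summing does not make two identical strings cancel each other out to zero.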
Using GetHashCode() is not ideal for combining multiple values. The problem is that for strings, the hashcode is essentially a checksum, which leaves little entropy to tell similar values apart. E.g. if each hashcode were a simple sum of character codes, adding the hashcodes of ("abc", "bbc") would give the same result as ("abd", "abc"), causing a collision.
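To make the collision concrete, here is a sketch assuming a naive checksum that just sums character codes. `charSum` is a hypothetical stand-in, not the real GetHashCode(). Under that assumption "abc" sums to 294, while "bbc" and "abd" both sum to 295, so adding either pair gives 589.

```java
// Hypothetical naive checksum used only to illustrate the collision.
class CharSumDemo {
    // Sum of the string's character codes.
    static int charSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Combine two strings by adding their checksums.
    static int addCombined(String a, String b) {
        return charSum(a) + charSum(b);
    }

    public static void main(String[] args) {
        System.out.println(addCombined("abc", "bbc")); // 589
        System.out.println(addCombined("abd", "abc")); // 589
    }
}
```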
In cases where you need to be absolutely sure, you'd use a real hash algorithm such as SHA-1 or MD5. The only problem is that they produce multi-byte digests (160 and 128 bits) rather than a single integer, which makes it harder to compare hashes quickly for equality. Instead, try a CRC or FNV-1 hash. 32-bit FNV-1 is super simple:
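public static class Fnv1 {
    public const uint OffsetBasis32 = 2166136261;
    public const uint FnvPrime32 = 16777619;

    // 32-bit FNV-1: for each byte, multiply by the prime, then XOR in the byte.
    public static int ComputeHash32(byte[] buffer) {
        // unchecked: overflow wrap-around is intended, even in checked builds
        unchecked {
            uint hash = OffsetBasis32;
            foreach (byte b in buffer) {
                hash *= FnvPrime32;
                hash ^= b;
            }
            // Reinterpret the 32 bits as a signed int.
            return (int)hash;
        }
    }
}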
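The same FNV-1 loop, sketched in Java and applied to a whole list of strings: each string is fed in as UTF-8 bytes, with a zero byte between strings so that ("x", "yz") and ("xy", "z") hash different byte sequences. The zero-byte separator assumes the strings never contain the NUL character; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical Java port of 32-bit FNV-1, applied to a list of strings.
class Fnv1Strings {
    static final int OFFSET_BASIS_32 = 0x811c9dc5; // 2166136261
    static final int FNV_PRIME_32 = 0x01000193;    // 16777619

    // One FNV-1 step: multiply first, then XOR (this is FNV-1, not FNV-1a).
    private static int step(int hash, byte b) {
        hash *= FNV_PRIME_32;     // int overflow wraps, like unchecked uint in C#
        return hash ^ (b & 0xff); // treat the byte as unsigned
    }

    static int hash(List<String> strings) {
        int hash = OFFSET_BASIS_32;
        boolean first = true;
        for (String s : strings) {
            if (!first) {
                // Separator between strings (assumes no NUL inside the data).
                hash = step(hash, (byte) 0);
            }
            first = false;
            for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
                hash = step(hash, b);
            }
        }
        return hash;
    }
}
```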
I hope this is unnecessary, but since you don't mention anything which sounds like you're only using the hashcodes for a first check and then later verifying that the strings are actually equal, I feel the need to warn you:
Hashcode equality != value equality
Many different sets of strings will yield the identical hashcode without being equal.
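A sketch of the usual pattern: use the combined hashcode only as a fast first check, then confirm with a real comparison. "Aa" and "BB" are a well-known pair of Java strings with the same `hashCode()` (2112), which shows why the second step is needed. The `StringGroup` class is hypothetical.

```java
import java.util.List;

// Hypothetical value type caching a combined hashcode for a fixed list of strings.
final class StringGroup {
    private final List<String> strings;
    private final int hash;

    StringGroup(List<String> strings) {
        this.strings = List.copyOf(strings); // immutable defensive copy
        this.hash = this.strings.hashCode(); // computed once; contents never change
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StringGroup)) return false;
        StringGroup other = (StringGroup) o;
        // Cheap rejection: different hashes guarantee different values...
        if (hash != other.hash) return false;
        // ...but equal hashes prove nothing, so compare the actual strings.
        return strings.equals(other.strings);
    }
}
```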
The only drawback of your first option is that (String1, String2) produces the same hashcode as (String2, String1). If that's not a problem (e.g. because you have a fixed order), it's fine.
"Concatenate all the strings together, then get the hashcode" seems the most natural and safest approach to me.
Update: As a comment points out, this has the drawback that the list ("x", "yz") and ("xy","z") would give the same hash. To avoid this, you could join the strings with a string delimiter that cannot appear inside the strings.
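A sketch of the delimiter fix using `String.join`. Choosing the NUL character as the delimiter is an assumption: it is only safe if that character can never occur in your strings. The class name `JoinedHash` is hypothetical.

```java
import java.util.List;

// Hypothetical helper comparing plain concatenation with delimited joining.
class JoinedHash {
    // Assumed delimiter (NUL): must never appear inside any of the strings.
    static final String DELIMITER = "\0";

    // Plain concatenation: ("x", "yz") and ("xy", "z") both become "xyz".
    static int concatenated(List<String> strings) {
        return String.join("", strings).hashCode();
    }

    // Delimited: the same two lists become different strings, "x", NUL, "yz"
    // versus "xy", NUL, "z", so their hashcodes differ.
    static int delimited(List<String> strings) {
        return String.join(DELIMITER, strings).hashCode();
    }
}
```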
If the strings are big, you might prefer to hash each one, cat the hashcodes and rehash the result. More CPU, less memory.
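A sketch of "hash each, concatenate the hashcodes, rehash": each string contributes only its 4-byte `hashCode()`, so the buffer holds 4 bytes per string instead of the full text. Using SHA-256 from `java.security.MessageDigest` for the rehash, truncated to its first 4 bytes, is my own assumption; any decent byte-oriented hash would do. Strings whose individual hashcodes already collide (such as "Aa" and "BB") stay indistinguishable after the rehash. The class name `RehashedHash` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical helper: hash each string, concatenate the hashcodes, rehash.
class RehashedHash {
    static int combine(List<String> strings) {
        // 4 bytes per string instead of the full concatenated text.
        ByteBuffer buf = ByteBuffer.allocate(4 * strings.size());
        for (String s : strings) {
            buf.putInt(s.hashCode()); // big-endian by default
        }
        try {
            // SHA-256 must be supported by every Java platform.
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(buf.array());
            return ByteBuffer.wrap(digest).getInt(); // first 4 bytes of the digest
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```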