Creating a hashcode for use in a database (i.e. not using GetHashCode)

迷失自我 2021-01-22 01:28

I have recently been instructed in the ways of GetHashCode() and in particular "Consumers of GetHashCode cannot rely upon it being stable over time or across appdomains" (From

3 Answers
  • 2021-01-22 02:02

    Are you using an MSSQL database? The T-SQL CHECKSUM function does exactly that.

  • 2021-01-22 02:14

    I would encourage you to consider what the others have said: let the database do what it's good at. Creating a hash code in order to optimize lookups is an indication that the indexes on your table aren't what they should be.

    That said, if you really need a hash code:

    You don't say if you want a 32-bit or 64-bit hash code. This one will create a 64-bit hash code for a string. It's reasonably collision-resistant.

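    // Sentinel that must never be returned as a real hash. The original snippet
    // references UNKNOWN_RECORD_HASH without defining it; 0 is an assumed value.
    private const long UNKNOWN_RECORD_HASH = 0;

    public static long ComputeHashCode(string url)
    {
        // 64-bit FNV-1a prime and offset basis.
        const ulong p = 1099511628211;
        ulong hash = 14695981039346656037;

        // Overflow is intentional, so keep wrap-around explicit even when the
        // project is compiled with overflow checking enabled.
        unchecked
        {
            // FNV-1a over the string's UTF-16 code units.
            for (int i = 0; i < url.Length; ++i)
            {
                hash = (hash ^ url[i]) * p;
            }

            // Wang64 bit mixer
            hash = (~hash) + (hash << 21);
            hash = hash ^ (hash >> 24);
            hash = (hash + (hash << 3)) + (hash << 8);
            hash = hash ^ (hash >> 14);
            hash = (hash + (hash << 2)) + (hash << 4);
            hash = hash ^ (hash >> 28);
            hash = hash + (hash << 31);

            // Never hand out the sentinel as a real hash code.
            if (hash == (ulong)UNKNOWN_RECORD_HASH)
            {
                ++hash;
            }
            return (long)hash;
        }
    }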
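
    If a 32-bit value is enough, the same FNV-1a idea can be sketched with the 32-bit constants. This is only an illustration: the class and method names are hypothetical and not part of the answer, and it skips the final mixing step used in the 64-bit version.

```csharp
using System;

public static class StableHash32
{
    // Hypothetical helper, not from the original answer: 32-bit FNV-1a
    // computed over the string's UTF-16 code units.
    public static uint Compute(string text)
    {
        const uint prime = 16777619;   // 32-bit FNV prime
        uint hash = 2166136261;        // 32-bit FNV offset basis

        unchecked // wrap-around on overflow is intended
        {
            foreach (char c in text)
            {
                hash = (hash ^ c) * prime;
            }
        }
        return hash;
    }
}
```

    Unlike GetHashCode(), this depends only on the input characters, so StableHash32.Compute("a") is 0xE40C292C on every run. The trade-off is the much smaller range: with only 2^32 possible values, collisions become likely after some tens of thousands of rows.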
    

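    Note that this is a hash code, not a unique key, so collisions are possible; how likely they are depends on how many records you have. Rule of thumb: the chance of at least one collision approaches 50% once the number of items reaches roughly the square root of your hash code's range. This hash code has a range of 2^64, so with 2^32 items (about 4.3 billion) the usual birthday approximation gives a chance of about 39%, and 50% is reached at about 5.1 billion items. With a hundred million items the chance is about 0.03%.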
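
    To put numbers on that rule of thumb, here is a minimal sketch of the standard birthday approximation, p ≈ 1 - exp(-n^2 / (2N)), where n is the number of items and N the size of the hash range. The class and method names are made up for illustration, and the formula assumes the hash values are uniformly distributed.

```csharp
using System;

public static class BirthdayBound
{
    // Approximate probability of at least one collision among `items` values
    // drawn uniformly from a space of 2^bits hash codes.
    public static double CollisionProbability(double items, int bits)
    {
        double range = Math.Pow(2.0, bits);
        return 1.0 - Math.Exp(-(items * items) / (2.0 * range));
    }
}
```

    For example, BirthdayBound.CollisionProbability(Math.Pow(2, 32), 64) comes out at roughly 0.39, so the square-root rule is a rounded estimate rather than an exact threshold. For very small probabilities the subtraction loses precision, so treat the result as an order-of-magnitude figure.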

    See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=792 and http://en.wikipedia.org/wiki/Birthday_paradox#Probability_table for more information.

  • 2021-01-22 02:17

    As SLaks pointed out in a comment, looking up data is what databases are good at.

    If you need fast lookups, create an index on the column. At the very least, you won't have to deal with collisions anymore.
