Reverse Engineering String.GetHashCode

Submitted by 放荡痞女 on 2019-11-30 23:11:43

Hash codes are not intended to be repeatable across platforms, or even multiple runs of the same program on the same system. You are going the wrong way. If you don't change course, your path will be difficult and one day it may end in tears.

What is the real problem you want to solve? Would it be possible to write your own hash function, either as an extension method or as the GetHashCode implementation of a wrapper class and use that one instead?
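As a rough illustration of the "write your own hash function" suggestion, here is a minimal sketch of a stable hash written as an extension method. It uses 32-bit FNV-1a over the UTF-8 bytes of the string; that algorithm is a swapped-in choice for illustration, not something the answers above prescribe, and the names StableStringHash and GetStableHashCode are hypothetical.

```csharp
using System;
using System.Text;

public static class StableStringHash
{
    // Hypothetical helper: 32-bit FNV-1a over the UTF-8 bytes of the string.
    // The result depends only on the string contents, not on the runtime,
    // bitness, or process, so it is safe to persist.
    public static int GetStableHashCode(this string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        unchecked
        {
            uint hash = 2166136261;            // FNV-1a 32-bit offset basis
            foreach (byte b in Encoding.UTF8.GetBytes(value))
            {
                hash ^= b;                     // mix in the next byte
                hash *= 16777619;              // FNV 32-bit prime
            }
            return (int)hash;
        }
    }
}
```

Calling "abc".GetStableHashCode() then gives the same value on every machine and in every run, which is the property String.GetHashCode() does not promise.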

First off, Jon is correct; this is a fool's errand. The internal debug builds of the framework that we use to "eat our own dogfood" change the hash algorithm every day precisely to prevent people from building systems -- even test systems -- that rely on unreliable implementation details that are documented as subject to change at any time.

Rather than enshrining an emulation of a system that is documented as being not suitable for emulation, my recommendation would be to take a step back and ask yourself why you're trying to do something this dangerous. Is it really a requirement?

Second, Stack Overflow is a technical question and answer site, not a "do my job for me for free" site. If you are hell-bent on doing this dangerous thing and you need someone who can rewrite unsafe code into equivalent safe code, then I recommend that you hire someone who can do that for you.

While all of the warnings given here are valid, they don't answer the question. I had a situation in which GetHashCode() was unfortunately already being used for a persisted value in production, and I had no choice but to re-implement using the default .NET 2.0 32-bit x86 (little-endian) algorithm. I re-coded without unsafe as shown below, and this appears to be working. Hope this helps someone.

// The GetStringHashCode() extension method is equivalent to the Microsoft .NET Framework 2.0
// String.GetHashCode() method executed on 32 bit systems.
public static int GetStringHashCode(this string value)
{
    int hash1 = (5381 << 16) + 5381;
    int hash2 = hash1;

    int len = value.Length;
    int intval;
    int c0, c1;
    int i = 0;
    while (len > 0)
    {
        c0 = (int)value[i];
        c1 = len > 1 ? (int)value[i + 1] : 0; // pad with 0 when only one char remains
        intval = c0 | (c1 << 16);
        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ intval;
        if (len <= 2)
        {
            break;
        }
        i += 2;
        c0 = (int)value[i];
        c1 = len > 3 ? (int)value[i + 1] : 0;
        intval = c0 | (c1 << 16);
        hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ intval;
        len -= 4;
        i += 2;
    }

    return hash1 + (hash2 * 1566083941);
}

The following exactly reproduces the default String hash codes on .NET Framework 4.7 (and probably earlier versions). This is the hash code given by:

  • Default on a String instance: "abc".GetHashCode()
  • StringComparer.Ordinal.GetHashCode("abc")
  • Various String methods that take the StringComparison.Ordinal enumeration value.
  • System.Globalization.CompareInfo.GetStringComparer(CompareOptions.Ordinal)

Testing on release builds with full JIT optimization, these versions modestly outperform the built-in .NET code, and have also been heavily unit-tested for exact equivalence with .NET behavior. Notice there are separate versions for x86 versus x64. Your program should generally include both; below the respective code listings is a calling harness which selects the appropriate version at runtime.

x86   -   (.NET running in 32-bit mode)

static unsafe int GetHashCode_x86_NET(int* p, int c)
{
    int h1, h2 = h1 = 0x15051505;

    while (c > 2)
    {
        h1 = ((h1 << 5) + h1 + (h1 >> 27)) ^ *p++;
        h2 = ((h2 << 5) + h2 + (h2 >> 27)) ^ *p++;
        c -= 4;
    }

    if (c > 0)
        h1 = ((h1 << 5) + h1 + (h1 >> 27)) ^ *p++;

    return h1 + (h2 * 0x5d588b65);
}

x64   -   (.NET running in 64-bit mode)

static unsafe int GetHashCode_x64_NET(Char* p)
{
    int h1, h2 = h1 = 5381;

    while (*p != 0)
    {
        h1 = ((h1 << 5) + h1) ^ *p++;

        if (*p == 0)
            break;

        h2 = ((h2 << 5) + h2) ^ *p++;
    }
    return h1 + (h2 * 0x5d588b65);
}
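For readers who cannot compile with unsafe code, the x64 listing above can be sketched without pointers. This is only a hand-written port under the assumption that running off the end of the string behaves like reading the terminating '\0' in the pointer version; the names LegacyStringHash and GetHashCodeX64 are hypothetical.

```csharp
using System;

public static class LegacyStringHash
{
    // Hypothetical safe port of the x64 listing: same constants, same order
    // of operations, with index checks standing in for the null terminator.
    public static int GetHashCodeX64(string s)
    {
        unchecked
        {
            int h1 = 5381, h2 = 5381;
            int i = 0;

            // Like the pointer version, stop at the end of the string or at
            // the first embedded '\0', whichever comes first.
            while (i < s.Length && s[i] != '\0')
            {
                h1 = ((h1 << 5) + h1) ^ s[i++];

                if (i >= s.Length || s[i] == '\0')
                    break;

                h2 = ((h2 << 5) + h2) ^ s[i++];
            }
            return h1 + (h2 * 0x5d588b65);
        }
    }
}
```

For the empty string this sketch returns 0x162a16fe, the same value the calling harness uses as its 64-bit constant.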

Calling harness / extension method for either platform (x86/x64):

readonly static int _hash_sz = IntPtr.Size == 4 ? 0x2d2816fe : 0x162a16fe;

public static unsafe int GetStringHashCode(this String s)
{
    // Note: the x64 string hash ignores the remainder after an embedded '\0' char (unlike x86)
    if (s.Length == 0 || (IntPtr.Size == 8 && s[0] == '\0'))
        return _hash_sz;

    fixed (char* p = s)
        return IntPtr.Size == 4 ?
            GetHashCode_x86_NET((int*)p, s.Length) :
            GetHashCode_x64_NET(p);
}
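Because the built-in algorithm differs between runtimes and bitness, it may be worth checking any re-implementation against the runtime you actually deploy on. The following is a minimal sketch of such a check; the names HashEquivalenceCheck and FindMismatches are hypothetical, and the candidate function is whatever re-implementation you want to verify.

```csharp
using System;
using System.Collections.Generic;

public static class HashEquivalenceCheck
{
    // Returns the sample strings for which `candidate` disagrees with the
    // built-in String.GetHashCode() on the current runtime and bitness.
    public static List<string> FindMismatches(
        Func<string, int> candidate, IEnumerable<string> samples)
    {
        var mismatches = new List<string>();
        foreach (string s in samples)
        {
            if (candidate(s) != s.GetHashCode())
                mismatches.Add(s);
        }
        return mismatches;
    }
}
```

A call such as HashEquivalenceCheck.FindMismatches(s => s.GetStringHashCode(), new[] { "", "a", "abc", "a\0b" }) should come back empty on a runtime the listings were written for. On runtimes that randomize string hashing per process (for example .NET Core), mismatches are expected, which is one more reason not to persist the built-in values.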