Use byte[] as key in dictionary

南旧 2020-12-28 12:11

I need to use a byte[] as a key in a Dictionary. Since byte[] doesn't override the default GetHashCode method, two sepa

7 answers
  •  被撕碎了的回忆
    2020-12-28 12:56

    So, JaredPar's answer is not bad but it could be better in a few ways. First of all, the IEqualityComparer<T> page says "We recommend that you derive from the EqualityComparer<T> class instead of implementing the IEqualityComparer<T> interface."

    Second, the implementation of GetHashCode is supposed to be fast. It's used to quickly rule out objects that are obviously different, so that no time is wasted running Equals on them. GetHashCode should therefore be much faster than actually running Equals.

    Third, returning the sum of the byte array, as JaredPar has done, is very likely to produce collisions: arrays that hold the same bytes in a different order collide, and so do arrays whose differences cancel each other out.

    So I would recommend a solution like this instead:

    using System;
    using System.Collections.Generic;
    using System.Linq; // for SequenceEqual

    public class ByteArrayComparer : EqualityComparer<byte[]>
    {
        public override bool Equals(byte[] first, byte[] second)
        {
            if (first == null || second == null) {
                // null == null returns true.
                // non-null == null returns false.
                return first == second;
            }
            if (ReferenceEquals(first, second)) {
                return true;
            }
            if (first.Length != second.Length) {
                return false;
            }
            // Linq extension method is based on IEnumerable, must evaluate every item.
            return first.SequenceEqual(second);
        }
        public override int GetHashCode(byte[] obj)
        {
            if (obj == null) {
                throw new ArgumentNullException("obj");
            }
            // quick and dirty, instantly identifies obviously different
            // arrays as being different
            return obj.Length;
        }
    }
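
    To see why the comparer matters, here is a minimal usage sketch. It repeats a condensed copy of the comparer so that it compiles on its own; the ComparerDemo class and its LookupSucceeds helper are hypothetical names introduced only for this illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Condensed copy of the comparer, repeated so this sketch is self-contained.
    public sealed class ByteArrayComparer : EqualityComparer<byte[]>
    {
        public override bool Equals(byte[] first, byte[] second)
        {
            if (first == null || second == null) return first == second;
            return ReferenceEquals(first, second) || first.SequenceEqual(second);
        }

        public override int GetHashCode(byte[] obj)
        {
            if (obj == null) throw new ArgumentNullException(nameof(obj));
            return obj.Length; // quick and dirty, as discussed
        }
    }

    // Hypothetical helper, not part of the original answer.
    public static class ComparerDemo
    {
        // Stores a value under one array, then looks it up with a
        // separate array that has the same contents.
        public static bool LookupSucceeds(IEqualityComparer<byte[]> comparer)
        {
            var map = new Dictionary<byte[], string>(comparer);
            map[new byte[] { 1, 2, 3 }] = "value";
            return map.ContainsKey(new byte[] { 1, 2, 3 });
        }
    }
    ```

    With EqualityComparer<byte[]>.Default the lookup should miss, because arrays are compared by reference; with new ByteArrayComparer() it should hit.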
    

    Returning obj.Length, as above, is really quick and dirty, but it is also prone to a lot of collisions. I think we can do better.

    If you're going to examine all the bytes, something like the code below is less collision-prone than the simple sum of bytes as in JaredPar's answer. But again, this examines all the elements, so it's not going to perform better than actually running Equals. You might as well just return 0 unconditionally, and always force the use of Equals.

    I emphasize: this is better than returning the sum as in JaredPar's answer. And always returning 0 is better than this. And returning obj.Length is better than returning 0.

    // This is not recommended. Performance is too horrible.
    public override int GetHashCode(byte[] obj)
    {
        // Inspired by fletcher checksum. Not fletcher.
        if (obj == null) {
            throw new ArgumentNullException("obj");
        }
        int sum = 0;
        int sumOfSum = 0;
        foreach (var val in obj) {
            sum += val; // by default, addition is unchecked. does not throw OverflowException.
            sumOfSum += sum;
        }
        return sum ^ sumOfSum;
    }
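
    As a rough illustration of the collision argument, the sketch below compares a plain byte sum with the sum-of-sums variant on two arrays that hold the same bytes in a different order. HashDemo, SumHash and SumOfSumsHash are hypothetical names chosen for this example; SumHash is my reading of "the sum of the byte array", not a copy of JaredPar's code.

    ```csharp
    using System;

    // Hypothetical demo class, not part of the original answer.
    public static class HashDemo
    {
        // Plain sum of all bytes: ignores element order entirely.
        public static int SumHash(byte[] data)
        {
            int sum = 0;
            unchecked
            {
                foreach (byte val in data) sum += val;
            }
            return sum;
        }

        // Sum-of-sums variant: the running total feeds a second
        // accumulator, so the position of each byte affects the result.
        public static int SumOfSumsHash(byte[] data)
        {
            int sum = 0;
            int sumOfSum = 0;
            unchecked
            {
                foreach (byte val in data)
                {
                    sum += val;
                    sumOfSum += sum;
                }
            }
            return sum ^ sumOfSum;
        }
    }
    ```

    For { 1, 2 } and { 2, 1 }, SumHash returns 3 in both cases, while SumOfSumsHash returns 7 and 6 respectively, so only the second function tells the two arrays apart.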
    

    If you happen to know that the byte[] arrays you're using as the key were themselves cryptographic hashes, then you can utilize this assumption to your benefit, and simply return the first 4 bytes converted to an int. It probably works alright too, for general-purpose byte arrays:

    // This implementation works great if you assume the byte[] arrays
    // are themselves cryptographic hashes. It probably works alright too,
    // for general-purpose byte arrays.
    public override int GetHashCode(byte[] obj)
    {
        if (obj == null) {
            throw new ArgumentNullException("obj");
        }
        if (obj.Length >= 4) {
            return BitConverter.ToInt32(obj, 0);
        }
        // Length occupies at most 2 bits. Might as well store them in the high order byte
        int value = obj.Length;
        foreach (var b in obj) {
            value <<= 8;
            value += b;
        }
        return value;
    }
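
    For the case where the keys really are cryptographic hashes, a usage sketch might look like the following. It assumes SHA-256 digests as keys; DigestComparer, DigestDemo, Sha256Of and RoundTrip are hypothetical names, and the fallback for arrays shorter than 4 bytes is simplified compared to the code above.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    // Hypothetical comparer for keys that are assumed to be digests.
    public sealed class DigestComparer : EqualityComparer<byte[]>
    {
        public override bool Equals(byte[] first, byte[] second)
        {
            if (first == null || second == null) return first == second;
            return ReferenceEquals(first, second) || first.SequenceEqual(second);
        }

        public override int GetHashCode(byte[] obj)
        {
            if (obj == null) throw new ArgumentNullException(nameof(obj));
            // Simplified fallback for short arrays (an assumption of this sketch).
            if (obj.Length < 4) return obj.Length;
            // Digest bytes are already well mixed, so the first 4 suffice.
            return BitConverter.ToInt32(obj, 0);
        }
    }

    // Hypothetical demo class, not part of the original answer.
    public static class DigestDemo
    {
        // Computes the SHA-256 digest of the UTF-8 encoding of the text.
        public static byte[] Sha256Of(string text)
        {
            using (var sha = SHA256.Create())
            {
                return sha.ComputeHash(Encoding.UTF8.GetBytes(text));
            }
        }

        // Stores a value under one digest and reads it back through a
        // second, separately computed digest of the same input.
        public static string RoundTrip()
        {
            var cache = new Dictionary<byte[], string>(new DigestComparer());
            cache[Sha256Of("hello")] = "hello";
            return cache[Sha256Of("hello")];
        }
    }
    ```

    Note that BitConverter.ToInt32 follows the machine's byte order, so the integer produced for a given array can differ between platforms; that does not matter for an in-memory dictionary.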
    
