Use byte[] as key in dictionary

南旧 2020-12-28 12:11

I need to use a byte[] as a key in a Dictionary. Since byte[] doesn't override the default GetHashCode method, two sepa

7 Answers
  • 2020-12-28 12:42

    Your thought was my first thought as well, and I don't think it would be error-prone. If you don't like that option, you could create a class that implements IEqualityComparer<byte[]> and pass an instance of it to the Dictionary's constructor.

  • 2020-12-28 12:45

    When you retrieve the item from the Dictionary, you are creating the key with the new operator. That produces a different (new) byte[] instance, and because arrays are compared by reference by default, no matching key is found in the Dictionary.

    Here is a solution that will work:

    var dict = new Dictionary<byte[], string>();

    var b = new byte[] { 1, 2, 3 };
    dict[b] = "my string";

    // Look up with the same instance that was used as the key.
    var value = dict[b];
    Console.WriteLine(value);
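
    For contrast, here is a minimal sketch of the failing case described above, where the lookup uses a second array with the same contents. The class and method names are made up for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class ReferenceKeyDemo
    {
        // Returns whether a second array with identical contents is found as a key.
        public static bool FindsEqualContentKey()
        {
            var dict = new Dictionary<byte[], string>();
            dict[new byte[] { 1, 2, 3 }] = "my string";

            // Same bytes, different instance: the default comparer uses reference equality.
            return dict.ContainsKey(new byte[] { 1, 2, 3 });
        }

        public static void Main()
        {
            Console.WriteLine(FindsEqualContentKey()); // False
        }
    }
    ```

    Keeping the original array around only helps when the code doing the lookup has access to that exact instance.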
    
  • 2020-12-28 12:49

    By default, byte[] keys are compared by reference, which is not what you want in this case. What you need to do is specify a custom IEqualityComparer<byte[]> that compares the contents of the arrays.

    For example:

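    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class ByteArrayComparer : IEqualityComparer<byte[]> {
      public bool Equals(byte[] left, byte[] right) {
        // Two nulls are equal; a null and a non-null array are not.
        if ( left == null || right == null ) {
          return left == right;
        }
        return left.SequenceEqual(right);
      }
      public int GetHashCode(byte[] key) {
        if (key == null)
          throw new ArgumentNullException("key");
        // Arrays with equal contents always produce equal sums.
        return key.Sum(b => b);
      }
    }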
    

    Then you can do

    var dict = new Dictionary<byte[], string>(new ByteArrayComparer());
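
    As a rough end-to-end sketch (the demo class name is an assumption; the comparer is the one from this answer), a lookup with a freshly allocated array now succeeds:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class ByteArrayComparer : IEqualityComparer<byte[]>
    {
        public bool Equals(byte[] left, byte[] right)
        {
            if (left == null || right == null) return left == right;
            return left.SequenceEqual(right);
        }

        public int GetHashCode(byte[] key)
        {
            if (key == null) throw new ArgumentNullException("key");
            return key.Sum(b => b);
        }
    }

    public static class ComparerUsageDemo
    {
        public static string Lookup()
        {
            var dict = new Dictionary<byte[], string>(new ByteArrayComparer());
            dict[new byte[] { 1, 2, 3 }] = "my string";

            // A different instance with the same contents finds the entry.
            return dict[new byte[] { 1, 2, 3 }];
        }

        public static void Main()
        {
            Console.WriteLine(Lookup()); // my string
        }
    }
    ```

    Note that the array should not be modified while it is in use as a key, since its hash code would then no longer match the one it was stored under.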
    

    Solution for .NET 2.0 (no LINQ):

    public class ByteArrayComparer : IEqualityComparer<byte[]> {
      public bool Equals(byte[] left, byte[] right) {
        if ( left == null || right == null ) {
          return left == right;
        }
        if ( left.Length != right.Length ) {
          return false;
        }
        for ( int i= 0; i < left.Length; i++) {
          if ( left[i] != right[i] ) {
            return false;
          }
        }
        return true;
      }
      public int GetHashCode(byte[] key) {
        if (key == null)
          throw new ArgumentNullException("key");
        int sum = 0;
        foreach ( byte cur in key ) {
          sum += cur;
        }
        return sum;
      }
    }
    
  • 2020-12-28 12:56

    JaredPar's answer is not bad, but it could be better in a few ways. First, the IEqualityComparer<T> documentation says: "We recommend that you derive from the EqualityComparer<T> class instead of implementing the IEqualityComparer<T> interface."

    Second, the implementation of GetHashCode is supposed to be fast. It is used to quickly rule out obviously different objects, for which running Equals would be a waste of time. So GetHashCode should be much faster than actually running Equals.

    Third, returning the sum of the byte array, as JaredPar has done, is very likely to produce collisions: for example, when the same bytes appear in a different order, or when the differences between elements cancel each other out.

    So I would recommend a solution like this instead:

    public class ByteArrayComparer : EqualityComparer<byte[]>
    {
        public override bool Equals(byte[] first, byte[] second)
        {
            if (first == null || second == null) {
                // null == null returns true.
                // non-null == null returns false.
                return first == second;
            }
            if (ReferenceEquals(first, second)) {
                return true;
            }
            if (first.Length != second.Length) {
                return false;
            }
            // Linq extension method is based on IEnumerable, must evaluate every item.
            return first.SequenceEqual(second);
        }
        public override int GetHashCode(byte[] obj)
        {
            if (obj == null) {
                throw new ArgumentNullException("obj");
            }
            // quick and dirty, instantly identifies obviously different
            // arrays as being different
            return obj.Length;
        }
    }
    

    Returning obj.Length, as above, is really quick and dirty, but it is also prone to a lot of collisions, since all arrays of the same length share one hash code. I think we can do better.

    If you're going to examine all the bytes, something like the following is less collision-prone than the simple sum of bytes in JaredPar's answer. But again, it examines all the elements, so it's not going to perform better than actually running Equals. You might as well just return 0 unconditionally and always force the use of Equals.

    To emphasize: this is better than returning the sum as in JaredPar's answer; always returning 0 is better than this; and returning obj.Length is better than returning 0.

    // This is not recommended. Performance is too horrible.
    public override int GetHashCode(byte[] obj)
    {
        // Inspired by fletcher checksum. Not fletcher.
        if (obj == null) {
            throw new ArgumentNullException("obj");
        }
        int sum = 0;
        int sumOfSum = 0;
        foreach (var val in obj) {
            sum += val; // by default, addition is unchecked. does not throw OverflowException.
            sumOfSum += sum;
        }
        return sum ^ sumOfSum;
    }
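
    To make the collision argument concrete, here is a small sketch that counts how many distinct hash codes each strategy yields for four sample arrays that all have length 3 and sum 6. The class name, method names, and sample keys are illustrative assumptions.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class HashCollisionDemo
    {
        // Sum of bytes, as in JaredPar's answer.
        public static int SumHash(byte[] key) => key.Sum(b => b);

        // Length only, as in the comparer above.
        public static int LengthHash(byte[] key) => key.Length;

        // The Fletcher-inspired running sum shown above.
        public static int RunningSumHash(byte[] key)
        {
            int sum = 0, sumOfSum = 0;
            foreach (var val in key) { sum += val; sumOfSum += sum; }
            return sum ^ sumOfSum;
        }

        // Number of distinct hash codes a strategy produces for the given keys.
        public static int DistinctHashes(Func<byte[], int> hash, IEnumerable<byte[]> keys)
            => keys.Select(hash).Distinct().Count();

        // Four different arrays with the same length and the same sum.
        public static readonly byte[][] SampleKeys =
        {
            new byte[] { 1, 2, 3 },
            new byte[] { 3, 2, 1 },
            new byte[] { 2, 2, 2 },
            new byte[] { 0, 3, 3 },
        };

        public static void Main()
        {
            Console.WriteLine(DistinctHashes(SumHash, SampleKeys));        // 1
            Console.WriteLine(DistinctHashes(LengthHash, SampleKeys));     // 1
            Console.WriteLine(DistinctHashes(RunningSumHash, SampleKeys)); // 4
        }
    }
    ```

    One caveat to weigh against the ranking above: a Dictionary groups keys by hash code, so a hash that collides for many keys means each lookup may have to run Equals against many entries. Whether a cheap hash or a well-distributed one wins depends on the number of keys and how similar they are.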
    

    If you happen to know that the byte[] arrays you're using as keys are themselves cryptographic hashes, you can use that assumption to your benefit and simply return the first 4 bytes converted to an int. It probably works alright for general-purpose byte arrays too:

    // This implementation works great if you assume the byte[] arrays
    // are themselves cryptographic hashes. It probably works alright too,
    // for general-purpose byte arrays.
    public override int GetHashCode(byte[] obj)
    {
        if (obj == null) {
            throw new ArgumentNullException("obj");
        }
        if (obj.Length >= 4) {
            return BitConverter.ToInt32(obj, 0);
        }
        // Fewer than 4 bytes: Length fits in 2 bits, so pack it above the data bytes.
        int value = obj.Length;
        foreach (var b in obj) {
            value <<= 8;
            value += b;
        }
        return value;
    }
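
    A sketch of how this might be used with SHA-256 digests as keys. HashPrefixComparer and DigestKeyDemo are hypothetical names; the comparer simply combines a content-based Equals with the GetHashCode above.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    // Hypothetical name: content equality plus the first-4-bytes hash from this answer.
    public class HashPrefixComparer : EqualityComparer<byte[]>
    {
        public override bool Equals(byte[] first, byte[] second)
        {
            if (first == null || second == null) return first == second;
            return first.SequenceEqual(second);
        }

        public override int GetHashCode(byte[] obj)
        {
            if (obj == null) throw new ArgumentNullException("obj");
            if (obj.Length >= 4) return BitConverter.ToInt32(obj, 0);
            int value = obj.Length;
            foreach (var b in obj) { value <<= 8; value += b; }
            return value;
        }
    }

    public static class DigestKeyDemo
    {
        // SHA-256 digest of the UTF-8 bytes of the text.
        public static byte[] Digest(string text)
        {
            using (var sha = SHA256.Create())
                return sha.ComputeHash(Encoding.UTF8.GetBytes(text));
        }

        public static string Lookup()
        {
            var dict = new Dictionary<byte[], string>(new HashPrefixComparer());
            dict[Digest("hello")] = "greeting";

            // Recomputing the digest yields a new array with the same contents.
            return dict[Digest("hello")];
        }

        public static void Main()
        {
            Console.WriteLine(Lookup()); // greeting
        }
    }
    ```

    For keys that are not digests, a shared 4-byte prefix (a common file header, for instance) would make every such key collide, so this is best reserved for data whose leading bytes are well distributed.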
    
  • 2020-12-28 12:56

    Could you convert the byte[] to a string and use that as the key?

    Something like:

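    // Requires: using System.Text;
    ASCIIEncoding enc = new ASCIIEncoding();
    byte[] input = { 72, 105 }; // example data; ASCII only round-trips byte values 0-127
    string demo = new string(enc.GetChars(input));    // use this string as the key
    byte[] decode = enc.GetBytes(demo.ToCharArray()); // back to bytes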
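
    One caveat with the ASCII approach: byte values above 127 are not valid ASCII and decode to '?', so distinct arrays can collapse onto the same string key. A minimal sketch of a lossless variant, assuming a Base64 string is acceptable as the key (this swaps in Convert.ToBase64String; the class and method names are made up for illustration):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text;

    public static class StringKeyDemo
    {
        // Base64 maps every byte sequence to a unique string and back.
        public static string ToKey(byte[] bytes) => Convert.ToBase64String(bytes);
        public static byte[] FromKey(string key) => Convert.FromBase64String(key);

        // Shows the ASCII problem: two different non-ASCII bytes decode to the same text.
        public static bool AsciiCollides()
        {
            var enc = new ASCIIEncoding();
            return enc.GetString(new byte[] { 200 }) == enc.GetString(new byte[] { 201 });
        }

        public static string Lookup()
        {
            var dict = new Dictionary<string, string>();
            dict[ToKey(new byte[] { 200, 1 })] = "my string";
            return dict[ToKey(new byte[] { 200, 1 })];
        }

        public static void Main()
        {
            Console.WriteLine(Lookup());        // my string
            Console.WriteLine(AsciiCollides()); // True
        }
    }
    ```

    The trade-off is an extra string allocation for every insert and lookup.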
    
  • 2020-12-28 12:59

    using System;
    using System.Collections;
    using System.Collections.Generic;
    
    [Serializable]
    class StructuralEqualityComparer : IEqualityComparer, IEqualityComparer<object>
    {
        public new bool Equals(object x, object y)
        {
            var s = x as IStructuralEquatable;
            return s == null ? object.Equals(x, y) : s.Equals(y, this);
        }
    
        public int GetHashCode(object obj)
        {
            var s = obj as IStructuralEquatable;
            return s == null ? EqualityComparer<object>.Default.GetHashCode(obj) : s.GetHashCode(this);
        }
    }
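
    This answer comes without an explanation, so here is a hedged sketch of how the comparer might be plugged into a Dictionary with byte[] keys. It assumes .NET 4 or later, where IEqualityComparer<object> converts to IEqualityComparer<byte[]> through contravariance; StructuralDemo is a made-up name.

    ```csharp
    using System;
    using System.Collections;
    using System.Collections.Generic;

    public class StructuralEqualityComparer : IEqualityComparer, IEqualityComparer<object>
    {
        public new bool Equals(object x, object y)
        {
            // Arrays implement IStructuralEquatable and compare element by element.
            var s = x as IStructuralEquatable;
            return s == null ? object.Equals(x, y) : s.Equals(y, this);
        }

        public int GetHashCode(object obj)
        {
            var s = obj as IStructuralEquatable;
            return s == null ? EqualityComparer<object>.Default.GetHashCode(obj) : s.GetHashCode(this);
        }
    }

    public static class StructuralDemo
    {
        public static string Lookup()
        {
            // The comparer is accepted for byte[] keys via contravariance.
            var dict = new Dictionary<byte[], string>(new StructuralEqualityComparer());
            dict[new byte[] { 1, 2, 3 }] = "my string";

            // A new array with the same elements finds the entry.
            return dict[new byte[] { 1, 2, 3 }];
        }

        public static void Main()
        {
            Console.WriteLine(Lookup()); // my string
        }
    }
    ```

    Each element is boxed during comparison and hashing, so this is likely slower than a comparer written specifically for byte[].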
    