Create Hash Value on a List?

南旧 2021-01-07 22:44

I have a List with 50 instances in it. Each of the instances has 1 or 2 unique properties, but in a way they are all unique because there is

2 Answers
  • 2021-01-07 23:05

    Does the hash have to be representative of the list's contents? In other words, will you use the hash to determine potential equality? If not, then just create a new Guid and use that.
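    If the identifier only has to tell one list instance apart from another, here is a minimal sketch of the Guid approach. The IdentifiedList<T> wrapper is a hypothetical name used for illustration; Guid.NewGuid() is the standard .NET call. The Id is created once and says nothing about the list's contents.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: Id identifies this particular list instance,
// not its contents, so it stays the same when items are added or removed.
public sealed class IdentifiedList<T>
{
    public Guid Id { get; } = Guid.NewGuid();
    public List<T> Items { get; } = new List<T>();
}
```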

    If the identifier does need to represent the contents of the list, then you can either generate a hash code based on the contents of the list (this will be inefficient, because you cannot cache the value while the list's contents may still change) or forgo the hash altogether and use Enumerable.SequenceEqual to determine equality.
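    A short sketch of the SequenceEqual route. List<T> does not override Equals, so two lists holding the same items are not equal by default; Enumerable.SequenceEqual instead compares the items pairwise, in order, using EqualityComparer<T>.Default. The ListComparison helper is a hypothetical name, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListComparison
{
    // Compares element by element, in order; no hash code is computed or cached.
    public static bool SameContents<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        return first.SequenceEqual(second);
    }
}
```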


    Here is an example of how I would implement getting a hash code for a List<T>. First of all, if you are going to get a hash code for a particular object, you really ought to make sure that object will not change. If it does change, your hash code is no longer any good.

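    The best way to work with a list that can be "frozen" (meaning no items are added or removed after a certain point) is to call AsReadOnly, which gives you a ReadOnlyCollection<T>. Bear in mind that this is a read-only wrapper around the original list rather than a copy: it blocks changes made through the wrapper, but changes made directly to the underlying List<T> still show through. The implementation below takes a ReadOnlyCollection<T> just to be safe, so keep that in mind: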

    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using System.Linq;
    
    class Example
    {
        static void Main()
        {
            var seqOne = new List<int> { 1, 2, 3, 4, 5, 6 };
            var seqTwo = new List<int> { 6, 5, 4, 3, 2, 1 };
    
            var seqOneCode = seqOne.AsReadOnly().GetSequenceHashCode();
            var seqTwoCode = seqTwo.AsReadOnly().GetSequenceHashCode();
    
            Console.WriteLine(seqOneCode == seqTwoCode);
        }
    }
    
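    static class Extensions
    {
        // Combines the hash codes of all items with XOR.
        // Seeding Aggregate with 0 avoids an InvalidOperationException on an
        // empty collection (x ^ 0 == x, so non-empty results are unchanged).
        public static int GetSequenceHashCode<T>(this ReadOnlyCollection<T> sequence)
        {
            return sequence
                .Select(item => item.GetHashCode())
                .Aggregate(0, (total, nextCode) => total ^ nextCode);
        }
    }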
    
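    To illustrate the caveat about AsReadOnly: the wrapper it returns is a live view of the original list, so a list that really must stay frozen can be copied first. The FreezeExample class and its method names are hypothetical. Neither version stops the items themselves from being mutated, which would still change their hash codes.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class FreezeExample
{
    // A read-only view over the same list: later changes to source show through.
    public static ReadOnlyCollection<T> View<T>(List<T> source) => source.AsReadOnly();

    // Copy first, then wrap: later changes to source no longer affect the result.
    public static ReadOnlyCollection<T> Snapshot<T>(List<T> source) =>
        new List<T>(source).AsReadOnly();
}
```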

    Oh, one last thing - make sure that your MyRichObject type has a good GetHashCode implementation itself otherwise your hash code for the list will potentially yield a lot of false positives upon comparison.
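    As a rough sketch of a reasonable element-level GetHashCode: the question's MyRichObject definition isn't shown, so the Name and Id properties below are assumptions. It uses System.HashCode.Combine, which requires .NET Core 2.1 or later, and keeps Equals consistent with GetHashCode so that equal objects always produce equal hash codes.

```csharp
using System;

// Hypothetical stand-in for the question's MyRichObject; the real type's
// properties are unknown, so Name and Id are assumptions for illustration.
public sealed class MyRichObject : IEquatable<MyRichObject>
{
    public MyRichObject(string name, int id)
    {
        Name = name;
        Id = id;
    }

    // Read-only properties, so the hash code cannot change after construction.
    public string Name { get; }
    public int Id { get; }

    public bool Equals(MyRichObject other) =>
        other != null && Name == other.Name && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as MyRichObject);

    // Mixes both fields; objects with equal Name and Id get equal hash codes.
    public override int GetHashCode() => HashCode.Combine(Name, Id);
}
```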

  • 2021-01-07 23:11

    TL;DR

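    using System.Collections.Generic;
    using System.Linq;
    
    static class SequenceExtensions
    {
        public static int GetSequenceHashCode<T>(this IList<T> sequence)
        {
            const int seed = 487;
            const int modifier = 31;
    
            // Overflow is expected here, so let the arithmetic wrap around
            unchecked
            {
                // Multiplying the running total before adding each item's hash
                // makes the result depend on the position of every item
                return sequence.Aggregate(seed, (current, item) =>
                    (current * modifier) + item.GetHashCode());
            }
        }
    }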
    
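    One caveat with the function above: item.GetHashCode() throws a NullReferenceException if the list contains a null item. A possible variation, sketched under that assumption, takes the item hashes from an IEqualityComparer<T> instead. EqualityComparer<T>.Default returns 0 for null, while a custom comparer such as StringComparer.Ordinal may throw on null. The method name GetNullSafeSequenceHashCode is hypothetical.

```csharp
using System.Collections.Generic;

public static class NullSafeSequenceHash
{
    // Same seed-and-multiply scheme as the TL;DR version, but item hashes come
    // from a comparer; EqualityComparer<T>.Default maps null items to 0.
    public static int GetNullSafeSequenceHashCode<T>(
        this IEnumerable<T> sequence, IEqualityComparer<T> comparer = null)
    {
        comparer ??= EqualityComparer<T>.Default;
        const int seed = 487;
        const int modifier = 31;

        unchecked
        {
            int hash = seed;
            foreach (var item in sequence)
            {
                hash = (hash * modifier) + comparer.GetHashCode(item);
            }
            return hash;
        }
    }
}
```

    Accepting IEnumerable<T> rather than IList<T> also lets it run over any sequence, at the cost of enumerating it each time.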

    Why bother with another answer?

    The accepted answer can give dangerously inaccurate results if the list contains the same items in a different order, or items that appear more than once (and therefore share a hash code). For example, consider these inputs:

    var a = new []{ "foo" };
    var b = new []{ "foo", "bar" };
    var c = new []{ "foo", "bar", "spam" };
    var d = new []{ "seenoevil", "hearnoevil", "speaknoevil" };
    

    These all produce different results, suggesting they are all unique collections. Great! Now let's try with a duplicate:

    var e = new []{ "foo", "bar", "spam" };
    

    GetSequenceHashCode should produce the same result for both c and e - and it does. So far so good. Now let's try with items out of sequence:

    var f = new []{ "spam", "bar", "foo" };
    

    Uh oh... GetSequenceHashCode indicates that f is equal to both c and e, which it is not. Why is this happening? Break it down into the actual hash code values first, using c as an example:

    int hashC = "foo".GetHashCode() ^ 
                "bar".GetHashCode() ^ 
                "spam".GetHashCode();
    

    The exact numbers aren't important here, so for the sake of a clearer demonstration let's pretend that the hash codes of the three strings are foo=8, bar=16 and spam=32. So:

    int hashC = 8 ^ 16 ^ 32;
    

    or to break it down into binary representation:

    bool result = (8 ^ 16 ^ 32) == 56; // true
    
    //  8 = 00001000
    //  ^
    // 16 = 00010000
    //  ^
    // 32 = 00100000
    //  =
    // 56 = 00111000
    
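    The binary breakdown can be reproduced with Convert.ToString(value, 2), which formats an int in base 2. This sketch (XorBinaryDemo is a hypothetical name) XORs 8, 16 and 32 in all six possible orders; every line ends in the same bit pattern, 00111000.

```csharp
using System;
using System.Collections.Generic;

public static class XorBinaryDemo
{
    // Formats an int as an 8-digit binary string, e.g. 8 -> "00001000".
    public static string ToBinary(int value) => Convert.ToString(value, 2).PadLeft(8, '0');

    // XORs the same three values in every order and describes each result.
    public static IEnumerable<string> AllOrders(int x, int y, int z)
    {
        var orders = new[]
        {
            new[] { x, y, z }, new[] { x, z, y }, new[] { y, x, z },
            new[] { y, z, x }, new[] { z, x, y }, new[] { z, y, x },
        };
        foreach (var o in orders)
        {
            int result = o[0] ^ o[1] ^ o[2];
            yield return $"{o[0]} ^ {o[1]} ^ {o[2]} = {ToBinary(result)} ({result})";
        }
    }
}
```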

    Now you should see why this implementation ignores the order of items in the list: XOR is commutative and associative, so 8^16^32 = 16^8^32 = 32^16^8, and so on.
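    To see the difference side by side, here is a sketch that runs the XOR fold from the first answer and the multiply-and-add fold from the TL;DR on reordered input. Ints stand in for the strings, because int.GetHashCode() returns the value itself, whereas string hash codes are randomized per process on .NET Core and later. HashComparison is a hypothetical class name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HashComparison
{
    // The first answer's approach: XOR of all item hashes (order-insensitive).
    public static int XorHash<T>(IEnumerable<T> sequence) =>
        sequence.Select(item => item.GetHashCode()).Aggregate(0, (total, next) => total ^ next);

    // The TL;DR approach: multiply the running total before adding each item (order-sensitive).
    public static int MultiplyAddHash<T>(IEnumerable<T> sequence)
    {
        unchecked
        {
            return sequence.Aggregate(487, (current, item) => (current * 31) + item.GetHashCode());
        }
    }
}
```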

    Secondly, there's an issue with duplicates. Even if you accept that the same contents in a different sequence count as equal (which is not an approach I would encourage), I don't think anyone will argue that the behaviour below is desirable. Let's try variations with duplicates within each list:

    var a = new []{ "foo", "bar", "spam" };
    var b = new []{ "foo", "bar", "spam", "foo" };
    var c = new []{ "foo", "bar", "spam", "foo", "foo" };
    var d = new []{ "foo", "bar", "spam", "foo", "foo", "spam", "foo", "spam", "foo" };
    

    While a and b generate different sequence hashes, GetSequenceHashCode suggests that a, c and d are all the same. Why?

    If you XOR a number with itself, you cancel it out, i.e.

    bool result = (8 ^ 8) == 0; // true
    
    //  8 = 00001000
    //  ^
    //  8 = 00001000
    //  =
    //  0 = 00000000
    

    XOR-ing by the same number again gives you back the original value, i.e.

    bool result = (8 ^ 8 ^ 8) == 8; // true
    
    //  8 = 00001000
    //  ^
    //  8 = 00001000
    //  ^
    //  8 = 00001000
    //  =
    //  8 = 00001000
    
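    These identities hold for any int, not just 8, because XOR works bit by bit and every bit XORed with itself is 0. A quick sketch to check them over a few values, including negative numbers (XorIdentities is a hypothetical name):

```csharp
public static class XorIdentities
{
    // x ^ x clears every bit, so any value XORed with itself is 0.
    public static bool CancelsOut(int x) => (x ^ x) == 0;

    // (x ^ x) ^ x == 0 ^ x == x: an odd number of copies leaves x behind.
    public static bool OddCountRestores(int x) => (x ^ x ^ x) == x;
}
```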

    So if we look at a and c again, substituting the simplified hash codes:

    var a = new []{ 8, 16, 32 };
    var c = new []{ 8, 16, 32, 8, 8 };
    

    the hash codes are calculated as:

    int hashA = 8 ^ 16 ^ 32;         // = 56
    int hashC = 8 ^ 16 ^ 32 ^ 8 ^ 8; // = 56
                           // ↑   ↑ 
                           // these two cancel each other out
    

    and likewise with d, where each pair of foos and each pair of spams cancels out.
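    The duplicate examples can be checked the same way. In this sketch, ints again stand in for the strings (foo=8, bar=16, spam=32) because an int's hash code is its own value; DuplicateDemo is a hypothetical name. The XOR fold gives 56 for a, c and d, while the multiply-and-add fold from the TL;DR gives four different values.

```csharp
using System.Linq;

public static class DuplicateDemo
{
    // XOR fold as in the first answer (seeded with 0 so empty arrays work).
    public static int Xor(int[] items) =>
        items.Aggregate(0, (total, next) => total ^ next.GetHashCode());

    // Multiply-and-add fold as in the TL;DR answer.
    public static int MultiplyAdd(int[] items)
    {
        unchecked
        {
            return items.Aggregate(487, (current, next) => (current * 31) + next.GetHashCode());
        }
    }
}
```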
