Create Hash Value on a List?

前端未结

关注

 2  1163

I have a List with 50 instances in it. Each of the instances has 1 or 2 unique properties, but in a way they are all unique because there is

相关标签:

2条回答

伪装坚强ぢ

2021-01-07 23:05
Does the hash have to be representative of the list's contents? In other words will you use the hash to determine potential equality? If not then just create a new Guid and use that.

If the identifier does need to represent the contents of the list then you can either generate a hashcode based on the contents of the list (this will be inefficient as you will be unable to cache this value as the list's contents may change) or forgo the hash altogether and use Enumerable.SequenceEquals to determine equality.

Here is an example of how I would implement getting a hash code for a List<T>. First of all, if you are going to get a hash code for a particular object your really ought to make sure that object will not change. If that object does change then your hash code is no longer any good.

The best way to work with a list that can be "frozen" (meaning no items added or removed after a certain point) is to call AsReadOnly. This will give you a ReadOnlyCollection<T>. The implementation below hinges on a ReadOnlyCollection<T> just to be safe so keep that in mind:
```
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

class Example
{
    static void Main()
    {
        var seqOne = new List<int> { 1, 2, 3, 4, 5, 6 };
        var seqTwo = new List<int> { 6, 5, 4, 3, 2, 1 };

        var seqOneCode = seqOne.AsReadOnly().GetSequenceHashCode();
        var seqTwoCode = seqTwo.AsReadOnly().GetSequenceHashCode();

        Console.WriteLine(seqOneCode == seqTwoCode);
    }
}

static class Extensions
{
    public static int GetSequenceHashCode<T>(this ReadOnlyCollection<T> sequence)
    {
        return sequence
            .Select(item => item.GetHashCode())
            .Aggregate((total, nextCode) => total ^ nextCode);
    }
}
```
Oh, one last thing - make sure that your MyRichObject type has a good GetHashCode implementation itself otherwise your hash code for the list will potentially yield a lot of false positives upon comparison.
0 讨论(0)
发布评论:

提交评论
- 加载中...
一向

2021-01-07 23:11
TL;DR
```
public static int GetSequenceHashCode<T>(this IList<T> sequence)
{
    const int seed = 487;
    const int modifier = 31;

    unchecked
    {
        return sequence.Aggregate(seed, (current, item) =>
            (current*modifier) + item.GetHashCode());
    }            
}
```
Why bother with another answer?

The accepted answer can give dangerously inaccurate results if you have multiple items in the list with the same hash code. For example consider these inputs:
```
var a = new []{ "foo" };
var b = new []{ "foo", "bar" };
var c = new []{ "foo", "bar", "spam" };
var d = new []{ "seenoevil", "hearnoevil", "speaknoevil" };
```
These all produce different results suggesting they are all unique collections. Great! Now let's try with a duplicate:
```
var e = new []{ "foo", "bar", "spam" };
```
GetSequenceHashCode should produce the same result for both c and e - and it does. So far so good. Now let's try with items out of sequence:
```
var f = new []{ "spam", "bar", "foo" };
```
Uh oh... GetSequenceHashCode indicates that f is equal to both c and e which it is not. Why is this happening? Break it down into the actual hash code values first, using c as an example:
```
int hashC = "foo".GetHashCode() ^ 
            "bar".GetHashCode() ^ 
            "spam".GetHashCode();
```
Since the exact numbers here aren't really important and for the sake of clearer demonstration let's pretend the hash codes of the three strings are foo=8, bar=16 and spam=32. So:
```
int hashC = 8 ^ 16 ^ 32;
```
or to break it down into binary representation:
```
8 ^ 16 ^ 32 == 56;

//  8 = 00001000
//  ^
// 16 = 00010000
//  ^
// 32 = 00100000
//  =
// 56   00111000
```
Now you should see why the order of items in the list is overlooked by this implementation, i.e. 8^16^32 = 16^8^32 = 32^16^8 etc.

Secondly there's an issue with duplicates. Even if you assume that having the same contents in a different sequence is OK (which is not an approach I would encourage), I don't think anyone will argue the below behaviour is desirable. Let's try variations with duplicates within each list.
```
var a = new []{ "foo", "bar", "spam" };
var b = new []{ "foo", "bar", "spam", "foo" };
var c = new []{ "foo", "bar", "spam", "foo", "foo" };
var d = new []{ "foo", "bar", "spam", "foo", "foo", "spam", "foo", "spam", "foo" };
```
While a and b generate different seqeuence hashes, GetSequenceHashCode suggests that a, c and d are all the same. Why?

If you XOR a number with itself you essentially cancel it out, i.e.
```
8 ^ 8 == 0;

//  8 = 00001000
//  ^
//  8 = 00001000
//  =
//  0 = 00000000
```
XOR by the same number again gives you the original result, i.e.
```
8 ^ 8 ^ 8 == 8;

//  8 = 00001000
//  ^
//  8 = 00001000
//  ^
//  8 = 00001000
//  =
//  8 = 00001000
```
So if we look at a and c again, substituting the simplified hash codes:
```
var a = new []{ 8, 16, 32 };
var c = new []{ 8, 16, 32, 8, 8 };
```
the hash codes are caclulated as:
```
int hashA = 8 ^ 16 ^ 32;         // = 56
int hashC = 8 ^ 16 ^ 32 ^ 8 ^ 8; // = 56
                       // ↑   ↑ 
                       // these two cancel each other out
```
and likewise with d where each pair of foo and spam cancels itself out.
0 讨论(0)
发布评论:

提交评论
- 加载中...

Create Hash Value on a List?

TL;DR

Why bother with another answer?