问题
I have a HashSet that contains multiple lists of integers - i.e. HashSet<List<int>>
In order to maintain uniqueness I am currently having to do two things:
1. Manually loop though existing lists, looking for duplicates using SequenceEquals
.
2. Sorting the individual lists so that SequenceEquals
works currently.
Is there a better way to do this? Is there an existing IEqualityComparer that I can provide to the HashSet so that HashSet.Add()
can automatically handle uniqueness?
var hashSet = new HashSet<List<int>>();
for(/* some condition */)
{
List<int> list = new List<int>();
...
/* for eliminating duplicate lists */
list.Sort();
foreach(var set in hashSet)
{
if (list.SequenceEqual(set))
{
validPartition = false;
break;
}
}
if (validPartition)
newHashSet.Add(list);
}
回答1:
Here is a possible comparer that compares an IEnumerable<T>
by its elements. You still need to sort manually before adding.
One could build the sorting into the comparer, but I don't think that's a wise choice. Adding a canonical form of the list seems wiser.
This code will only work in .net 4 since it takes advantage of generic variance. If you need earlier versions you need to either replace IEnumerable
with List
, or add a second generic parameter for the collection type.
class SequenceComparer<T>:IEqualityComparer<IEnumerable<T>>
{
public bool Equals(IEnumerable<T> seq1,IEnumerable<T> seq2)
{
return seq1.SequenceEqual(seq2);
}
public int GetHashCode(IEnumerable<T> seq)
{
int hash=1234567;
foreach(T elem in seq)
hash=hash*37+elem.GetHashCode();
return hash;
}
}
void Main()
{
var hashSet = new HashSet<List<int>>(new SequenceComparer<int>());
List<int> test=new int[]{1,3,2}.ToList();
test.Sort();
hashSet.Add(test);
List<int> test2=new int[]{3,2,1}.ToList();
test2.Sort();
hashSet.Contains(test2).Dump();
}
回答2:
This starts off wrong, it has to be a HashSet<ReadOnlyCollection<>>
because you cannot allow the lists to change and invalidate the set predicate. This then allows you to calculate a hash code in O(n) when you add the collection to the set. And an O(n) test to check if it is already in the set with a very uncommon O(n^2) worst case if all the hashes turn out to be equal. Store the computed hash with the collection.
回答3:
Is there a reason you aren't just using an array? int[]
will perform better. Also I assume the lists contain duplicates, otherwise you'd just be using sets and not have a problem.
It appears that their contents won't change (much) once they've been added to the HashSet
. At the end of the day, you are going to have to use a comparer that falls back on SequenceEqual
. But you don't have to do it every single time. Instead or doing an exponential number of sequence compares (e.g. -- as the hashset grows, doing a SequenceEqual
against each existing member) -- if you create a good hashcode up front, you may have to do very few such compares. While the overhead of generating a good hashcode is probably about the same as doing a SequenceEqual
you're only doing it a single time for each list.
So, the first time you operate on a particular List<int>
, you should generate a hash based on the ordered sequence of numbers and cache it. Then the next time the list is compared, the cached value can be used. I'm not sure how you might do this with a comparer off the top of my head (maybe a static dictionary?) -- but you could implement List
wrapper that does this easily.
Here's a basic idea. You'd need to be careful to ensure that it isn't brittle (e.g. make sure you void any cached hash code when members change) but it doesn't look like that's going to be a typical situation for the way you're using this.
public class FasterComparingList<T>: IList<T>, IList, ...
/// whatever you need to implement
{
// Implement your interfaces against InnerList
// Any methods that change members of the list need to
// set _LongHash=null to force it to be regenerated
public List<T> InnerList { ... lazy load a List }
public int GetHashCode()
{
if (_LongHash==null) {
_LongHash=GetLongHash();
}
return (int)_LongHash;
}
private int? _LongHash=null;
public bool Equals(FasterComparingList<T> list)
{
if (InnerList.Count==list.Count) {
return true;
}
// you could also cache the sorted state and skip this if a list hasn't
// changed since the last sort
// not sure if native `List` does
list.Sort();
InnerList.Sort();
return InnerList.SequenceEqual(list);
}
protected int GetLongHash()
{
return .....
// something to create a reasonably good hash code -- which depends on the
// data. Adding all the numbers is probably fine, even if it fails a couple
// percent of the time you're still orders of magnitude ahead of sequence
// compare each time
}
}
If the lists won't change once added, this should be very fast. Even in situations where the lists could change frequently, the time to create a new hash code is not likely very different (if even greater at all) than doing a sequence compare.
回答4:
If you don't specify an IEQualityComparer, then the types default will be used, so I think what you'll need to do is create your own implementation of IEQualityComparer, and pass that to the constructor of your HashSet. Here is a good example.
回答5:
When comparing hashsets of lists one option you always have is that instead of comparing each element, you sort lists and join them using a comma and compare generated strings.
So, in this case, when you create custom comparer instead of iterating over elements and calculating custom hash function, you can apply this logic.
来源:https://stackoverflow.com/questions/5517932/how-to-create-a-hashsetlistint-with-distinct-elements