Any idea on how to check whether that list is a subset of another?
Specifically, I have
List t1 = new List { 1, 3, 5 };
L
@Cameron's solution as an extension method:
public static bool IsSubsetOf<T>(this IEnumerable<T> a, IEnumerable<T> b)
{
return !a.Except(b).Any();
}
Usage:
bool isSubset = t2.IsSubsetOf(t1);
(This is similar, but not quite the same as the one posted on @Michael's blog)
Given then we have 2 arrays, array1
and array2
and we want to check that all array1
values exists in array2
:
array1.All(ar1 => array2.Any(ar2 => ar2.Equals(ar1)));
Note: ar2.Equals(ar1)
can be replaced if equality is described in other way.
If you are unit-testing you can also utilize the CollectionAssert.IsSubsetOf method :
CollectionAssert.IsSubsetOf(subset, superset);
In the above case this would mean:
CollectionAssert.IsSubsetOf(t2, t1);
Try this
static bool IsSubSet<A>(A[] set, A[] toCheck) {
return set.Length == (toCheck.Intersect(set)).Count();
}
The idea here is that Intersect will only return the values that are in both Arrays. At this point if the length of the resulting set is the same as the original set, then all elements in "set" are also in "check" and therefore "set" is a subset of "toCheck"
Note: My solution does not work if "set" has duplicates. I'm not changing it because I don't want to steal other people's votes.
Hint: I voted for Cameron's answer.
I have spent some time benchmarking 3 working solutions in this answer with BenchmarkDotNet.
!t2.Except(t1).Any()
called Except().Any()t2.IsSubsetOf(t1)
called HashSett2.All(elem => t1.Contains(elem))
called All(=>Contains)I have varied 3 params:
All supersets were random length and contained items in range [0; Variability)
. Subsets had fixed length and contained items in range [0; Variability)
.
The winner is All(=>Contains) method - it is sometimes up to 30x times faster than using HashSets and 50x faster than Except().Any().
Code can be found on github
UPDATE
As mentioned in comments by @Gert Arnold I had a bug in my benchmark. I called ToHashSet()
inside every iteration.
So i build also a collection of prebuilt hashsets and added a pure-hashset test named prebuilt HashSet
.
It is not an absolute winner and sometimes it looses All(=>Contains)
algorithm. It seems to me the point is in variability (and duplicates) in supersets. When variability in values is low - hashset wins, as it removes it from data.
The final table goes here:
| Method | SupersetsInIteration | SubsetLength | Variability | Mean | Error | StdDev | Median |
|------------------- |--------------------- |------------- |------------ |--------------:|-------------:|-------------:|--------------:|
| Except().Any() | 1000 | 10 | 100 | 1,485.89 μs | 7.363 μs | 6.887 μs | 1,488.36 μs |
| ToHashSet() | 1000 | 10 | 100 | 1,015.59 μs | 5.099 μs | 4.770 μs | 1,015.43 μs |
| prebuilt HashSet | 1000 | 10 | 100 | 38.76 μs | 0.065 μs | 0.054 μs | 38.78 μs |
| All(=>Contains) | 1000 | 10 | 100 | 105.46 μs | 0.320 μs | 0.267 μs | 105.38 μs |
| Except().Any() | 1000 | 10 | 10000 | 1,912.17 μs | 38.180 μs | 87.725 μs | 1,890.72 μs |
| ToHashSet() | 1000 | 10 | 10000 | 1,038.70 μs | 20.028 μs | 40.459 μs | 1,019.35 μs |
| prebuilt HashSet | 1000 | 10 | 10000 | 28.22 μs | 0.165 μs | 0.155 μs | 28.24 μs |
| All(=>Contains) | 1000 | 10 | 10000 | 81.47 μs | 0.117 μs | 0.109 μs | 81.45 μs |
| Except().Any() | 1000 | 50 | 100 | 4,888.22 μs | 81.268 μs | 76.019 μs | 4,854.42 μs |
| ToHashSet() | 1000 | 50 | 100 | 4,323.23 μs | 21.424 μs | 18.992 μs | 4,315.16 μs |
| prebuilt HashSet | 1000 | 50 | 100 | 186.53 μs | 1.257 μs | 1.176 μs | 186.35 μs |
| All(=>Contains) | 1000 | 50 | 100 | 1,173.37 μs | 2.667 μs | 2.227 μs | 1,173.08 μs |
| Except().Any() | 1000 | 50 | 10000 | 7,148.22 μs | 20.545 μs | 19.218 μs | 7,138.22 μs |
| ToHashSet() | 1000 | 50 | 10000 | 4,576.69 μs | 20.955 μs | 17.499 μs | 4,574.34 μs |
| prebuilt HashSet | 1000 | 50 | 10000 | 33.87 μs | 0.160 μs | 0.142 μs | 33.85 μs |
| All(=>Contains) | 1000 | 50 | 10000 | 131.34 μs | 0.569 μs | 0.475 μs | 131.24 μs |
| Except().Any() | 10000 | 10 | 100 | 14,798.42 μs | 120.423 μs | 112.643 μs | 14,775.43 μs |
| ToHashSet() | 10000 | 10 | 100 | 10,263.52 μs | 64.082 μs | 59.942 μs | 10,265.58 μs |
| prebuilt HashSet | 10000 | 10 | 100 | 1,241.19 μs | 4.248 μs | 3.973 μs | 1,241.75 μs |
| All(=>Contains) | 10000 | 10 | 100 | 1,058.41 μs | 6.766 μs | 6.329 μs | 1,059.22 μs |
| Except().Any() | 10000 | 10 | 10000 | 16,318.65 μs | 97.878 μs | 91.555 μs | 16,310.02 μs |
| ToHashSet() | 10000 | 10 | 10000 | 10,393.23 μs | 68.236 μs | 63.828 μs | 10,386.27 μs |
| prebuilt HashSet | 10000 | 10 | 10000 | 1,087.21 μs | 2.812 μs | 2.631 μs | 1,085.89 μs |
| All(=>Contains) | 10000 | 10 | 10000 | 847.88 μs | 1.536 μs | 1.436 μs | 847.34 μs |
| Except().Any() | 10000 | 50 | 100 | 48,257.76 μs | 232.573 μs | 181.578 μs | 48,236.31 μs |
| ToHashSet() | 10000 | 50 | 100 | 43,938.46 μs | 994.200 μs | 2,687.877 μs | 42,877.97 μs |
| prebuilt HashSet | 10000 | 50 | 100 | 4,634.98 μs | 16.757 μs | 15.675 μs | 4,643.17 μs |
| All(=>Contains) | 10000 | 50 | 100 | 10,256.62 μs | 26.440 μs | 24.732 μs | 10,243.34 μs |
| Except().Any() | 10000 | 50 | 10000 | 73,192.15 μs | 479.584 μs | 425.139 μs | 73,077.26 μs |
| ToHashSet() | 10000 | 50 | 10000 | 45,880.72 μs | 141.497 μs | 125.433 μs | 45,860.50 μs |
| prebuilt HashSet | 10000 | 50 | 10000 | 1,620.61 μs | 3.507 μs | 3.280 μs | 1,620.52 μs |
| All(=>Contains) | 10000 | 50 | 10000 | 1,460.01 μs | 1.819 μs | 1.702 μs | 1,459.49 μs |
| Except().Any() | 100000 | 10 | 100 | 149,047.91 μs | 1,696.388 μs | 1,586.803 μs | 149,063.20 μs |
| ToHashSet() | 100000 | 10 | 100 | 100,657.74 μs | 150.890 μs | 117.805 μs | 100,654.39 μs |
| prebuilt HashSet | 100000 | 10 | 100 | 12,753.33 μs | 17.257 μs | 15.298 μs | 12,749.85 μs |
| All(=>Contains) | 100000 | 10 | 100 | 11,238.79 μs | 54.228 μs | 50.725 μs | 11,247.03 μs |
| Except().Any() | 100000 | 10 | 10000 | 163,277.55 μs | 1,096.107 μs | 1,025.299 μs | 163,556.98 μs |
| ToHashSet() | 100000 | 10 | 10000 | 99,927.78 μs | 403.811 μs | 337.201 μs | 99,812.12 μs |
| prebuilt HashSet | 100000 | 10 | 10000 | 11,671.99 μs | 6.753 μs | 5.986 μs | 11,672.28 μs |
| All(=>Contains) | 100000 | 10 | 10000 | 8,217.51 μs | 67.959 μs | 56.749 μs | 8,225.85 μs |
| Except().Any() | 100000 | 50 | 100 | 493,925.76 μs | 2,169.048 μs | 1,922.805 μs | 493,386.70 μs |
| ToHashSet() | 100000 | 50 | 100 | 432,214.15 μs | 1,261.673 μs | 1,180.169 μs | 431,624.50 μs |
| prebuilt HashSet | 100000 | 50 | 100 | 49,593.29 μs | 75.300 μs | 66.751 μs | 49,598.45 μs |
| All(=>Contains) | 100000 | 50 | 100 | 98,662.71 μs | 119.057 μs | 111.366 μs | 98,656.00 μs |
| Except().Any() | 100000 | 50 | 10000 | 733,526.81 μs | 8,728.516 μs | 8,164.659 μs | 733,455.20 μs |
| ToHashSet() | 100000 | 50 | 10000 | 460,166.27 μs | 7,227.011 μs | 6,760.150 μs | 457,359.70 μs |
| prebuilt HashSet | 100000 | 50 | 10000 | 17,443.96 μs | 10.839 μs | 9.608 μs | 17,443.40 μs |
| All(=>Contains) | 100000 | 50 | 10000 | 14,222.31 μs | 47.090 μs | 44.048 μs | 14,217.94 μs |
This is a significantly more efficient solution than the others posted here, especially the top solution:
bool isSubset = t2.All(elem => t1.Contains(elem));
If you can find a single element in t2 that isn't in t1, then you know that t2 is not a subset of t1. The advantage of this method is that it is done all in-place, without allocating additional space, unlike the solutions using .Except or .Intersect. Furthermore, this solution is able to break as soon as it finds a single element that violates the subset condition, while the others continue searching. Below is the optimal long form of the solution, which is only marginally faster in my tests than the above shorthand solution.
bool isSubset = true;
foreach (var element in t2) {
if (!t1.Contains(element)) {
isSubset = false;
break;
}
}
I did some rudimentary performance analysis of all the solutions, and the results are drastic. These two solutions are about 100x faster than the .Except() and .Intersect() solutions, and use no additional memory.