Say a, b, c are all List and I want to create an unsorted union of them. Although performance isn't super-critical, they might have 10,000 entries in each.
Union is O(n). a.Union(b).Union(c) is less efficient in most implementations than a.Union(b.Concat(c)) because it creates a hash-set for the first union operation and then another for the second, as other answers have said. Both of these also end up with a chain of IEnumerator<T> objects in use, which increases cost as further sources are added.
a.Union(b).Union(c) is more efficient in .NET Core because the second .Union() operation produces a single object with knowledge of a, b and c, and it will create a single hash-set for the entire operation, as well as avoiding the chain of IEnumerator<T> objects.
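The two query shapes compared above can be put side by side in a small sketch. The class name UnionShapes and the sample lists are made up for illustration; the sketch only shows that both shapes yield the same elements and does not measure the allocation differences described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class UnionShapes
{
    // Chained form: two Union calls, one per additional source.
    public static List<int> Chained(List<int> a, List<int> b, List<int> c)
        => a.Union(b).Union(c).ToList();

    // Single Union whose second argument is the concatenated tail.
    public static List<int> SingleUnion(List<int> a, List<int> b, List<int> c)
        => a.Union(b.Concat(c)).ToList();

    public static void Main()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 3, 4, 5 };
        var c = new List<int> { 5, 6, 1 };

        // Union yields elements in first-seen order, so both print 1,2,3,4,5,6.
        Console.WriteLine(string.Join(",", Chained(a, b, c)));
        Console.WriteLine(string.Join(",", SingleUnion(a, b, c)));
    }
}
```

Which shape allocates less depends on the runtime, as the answer notes, so it is worth profiling on the target framework before committing to one.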
While @Tim Schmelter is right about the linear time complexity of the Enumerable.Union method, chaining multiple Union operators has the hidden overhead that every Union operator internally creates a hash set which basically duplicates the one from the previous operator (plus additional items), thus using much more memory compared to a single HashSet approach.
If we take into account the fact that Union is simply a shortcut for Concat + Distinct, the scalable LINQ solution with the same time/space complexity as the HashSet approach will be:
a.Concat(b).Concat(c)...Concat(x).Distinct()
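For an arbitrary number of sources, the Concat + Distinct pattern above could be wrapped in a helper. This is a minimal sketch; the names SequenceUnion and UnionAll are hypothetical and not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class SequenceUnion
{
    // Concatenates every source, then applies a single Distinct,
    // so only one set is needed when the result is enumerated.
    public static IEnumerable<T> UnionAll<T>(params IEnumerable<T>[] sources)
    {
        IEnumerable<T> all = Enumerable.Empty<T>();
        foreach (var source in sources)
        {
            all = all.Concat(source);
        }
        return all.Distinct();
    }

    public static void Main()
    {
        var a = new List<int> { 1, 2 };
        var b = new List<int> { 2, 3 };
        var c = new List<int> { 3, 4 };

        // Sorted before printing, since the question only asks for an unsorted union.
        Console.WriteLine(string.Join(",", UnionAll<int>(a, b, c).OrderBy(x => x)));
    }
}
```

The result stays deferred, so nothing is enumerated until the caller iterates it.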
You should use Enumerable.Union because it is as efficient as the HashSet approach. Complexity is O(n+m) because:
Enumerable.Union
When the object returned by this method is enumerated, Union<TSource> enumerates first and second in that order and yields each element that has not already been yielded.
Source-code here.
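The documented behaviour can be illustrated with a simplified iterator. This is a sketch of the idea only, not the actual framework source; SimpleUnion and UnionSketch are hypothetical names. Each element of both inputs is visited once and checked against one set, which is where the O(n+m) figure comes from.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class, for illustration only; not the real Enumerable.Union.
public static class UnionSketch
{
    public static IEnumerable<T> SimpleUnion<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var seen = new HashSet<T>();

        // HashSet<T>.Add returns false for an element that is already present,
        // so only elements not yet yielded pass the check.
        foreach (var item in first)
        {
            if (seen.Add(item)) yield return item;
        }
        foreach (var item in second)
        {
            if (seen.Add(item)) yield return item;
        }
    }

    public static void Main()
    {
        var result = SimpleUnion(new[] { 1, 2, 2, 3 }, new[] { 3, 4, 1 });
        Console.WriteLine(string.Join(",", result)); // 1,2,3,4
    }
}
```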
Ivan is right, there is an overhead if you use Enumerable.Union with multiple collections, since a new set must be created for every chained call. So it might be more efficient (in terms of memory consumption) if you use one of these approaches:
Concat + Distinct:
a.Concat(b).Concat(c)...Concat(x).Distinct()
Union + Concat:
a.Union(b.Concat(c)...Concat(x))
HashSet<T> constructor that takes IEnumerable<T> (e.g. with int):
new HashSet<int>(a.Concat(b).Concat(c)...Concat(x))
The difference between the first two might be negligible. The third approach does not use deferred execution; it creates a HashSet<> in memory. It's a good and efficient way (1) if you need this collection type or (2) if this is the final operation on the query. But if you need to do further operations on this chained query, you should prefer either Concat + Distinct or Union + Concat.
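The difference between the deferred queries and the HashSet constructor can be seen in a small sketch. The class name DeferredVsEager and the sample data are made up for illustration. The deferred query re-reads its sources each time it is enumerated, while the set is a snapshot taken when it is constructed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class, for illustration only.
public static class DeferredVsEager
{
    // Returns (count seen by the deferred query, count held by the eager set)
    // after one of the source lists is modified.
    public static (int Deferred, int Eager) CountsAfterMutation()
    {
        var a = new List<int> { 1, 2 };
        var b = new List<int> { 2, 3 };
        var c = new List<int> { 3, 4 };

        // Deferred: nothing is enumerated yet.
        IEnumerable<int> deferred = a.Concat(b).Concat(c).Distinct();

        // Eager: the constructor enumerates the sources immediately.
        var eager = new HashSet<int>(a.Concat(b).Concat(c));

        a.Add(99); // modify a source after both were created

        return (deferred.Count(), eager.Count);
    }

    public static void Main()
    {
        var (deferredCount, eagerCount) = CountsAfterMutation();
        Console.WriteLine(deferredCount); // 5, the query sees the added 99
        Console.WriteLine(eagerCount);    // 4, the set was built before the change
    }
}
```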