I am exploring the HashSet
type, but I don\'t understand where it stands in collections.
Can one use it to replace a List
Performance would be a bad reason to choose HashSet over List. Instead, what better captures your intent? If order is important, then Set (or HashSet) is out. If duplicates are permitted, likewise. But there are plenty of circumstances when we don't care about order, and we'd rather not have duplicates - and that's when you want a Set.
The important thing about HashSet<T>
is right there in the name: it's a set. The only things you can do with a single set is to establish what its members are, and to check whether an item is a member.
Asking if you can retrieve a single element (e.g. set[45]
) is misunderstanding the concept of the set. There's no such thing as the 45th element of a set. Items in a set have no ordering. The sets {1, 2, 3} and {2, 3, 1} are identical in every respect because they have the same membership, and membership is all that matters.
It's somewhat dangerous to iterate over a HashSet<T>
because doing so imposes an order on the items in the set. That order is not really a property of the set. You should not rely on it. If ordering of the items in a collection is important to you, that collection isn't a set.
Sets are really limited and with unique members. On the other hand, they're really fast.
HashSet<T>
is a data strucutre in the .NET framework that is a capable of representing a mathematical set as an object. In this case, it uses hash codes (the GetHashCode
result of each item) to compare equality of set elements.
A set differs from a list in that it only allows one occurrence of the same element contained within it. HashSet<T>
will just return false
if you try to add a second identical element. Indeed, lookup of elements is very quick (O(1)
time), since the internal data structure is simply a hashtable.
If you're wondering which to use, note that using a List<T>
where HashSet<T>
is appropiate is not the biggest mistake, though it may potentially allow problems where you have undesirable duplicate items in your collection. What is more, lookup (item retrieval) is vastly more efficient - ideally O(1)
(for perfect bucketing) instead of O(n)
time - which is quite important in many scenarios.
HashSet would be used to remove duplicate elements in an IEnumerable collection. For example,
List<string> duplicatedEnumrableStrings = new List<string> {"abc", "ghjr", "abc", "abc", "yre", "obm", "ghir", "qwrt", "abc", "vyeu"};
HashSet<string> uniqueStrings = new HashSet(duplicatedEnumrableStrings);
after those codes are run, uniqueStrings holds {"abc", "ghjr", "yre", "obm", "qwrt", "vyeu"};
List<T>
is used to store ordered sets of information. If you know the relative order of the elements of the list, you can access them in constant time. However, to determine where an element lies in the list or to check if it exists in the list, the lookup time is linear. On the other hand, HashedSet<T>
makes no guarantees of the order of the stored data and consequently provides constant access time for its elements.
As the name implies, HashedSet<T>
is a data structure that implements set semantics. The data structure is optimized to implement set operations (i.e. Union, Difference, Intersect), which can not be done as efficiently with the traditional List implementation.
So, to choose which data type to use really depends on what your are attempting to do with your application. If you don't care about how your elements are ordered in a collection, and only want to enumarate or check for existence, use HashSet<T>
. Otherwise, consider using List<T>
or another suitable data structure.