I can see that from Scala documentation scala.collection.immutable.Set is only a trait. Which one on the Set implementation is used by default ? HashSet or TreeSet (or somet
By looking at the source code, you can find that sets up to four elements have an optimized implementation provided by EmptySet
, Set1
, Set2
, Set3
and Set4
, which simply hold the single values.
For example here's Set2
declaration (as of scala 2.11.4):
class Set2[A] private[collection] (elem1: A, elem2: A) extends AbstractSet[A] with Set[A] with Serializable
And here's the contains
implementation:
def contains(elem: A): Boolean =
elem == elem1 || elem == elem2
or the find
implementation
override def find(f: A => Boolean): Option[A] = {
if (f(elem1)) Some(elem1)
else if (f(elem2)) Some(elem2)
else None
}
Very straightforward.
For sets with more than 4 elements, the underlying implementation is an HashSet
. We can easily verify this in the REPL:
scala> Set(1, 2, 3, 4).getClass
res1: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.Set$Set4
scala> Set(1, 2, 3, 4, 5, 6).getClass
res0: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.HashSet$HashTrieSet
That being said, find
must always iterate over the whole HashSet
, since it's unsorted, so it will be O(n)
.
Conversely, a lookup operation like contains
will be O(1)
instead.
Here's a more in-depth reference about performance of scala collections in general.
Speaking of Map
, pretty much the same concepts apply. There are optimized Map
implementations up to 4 elements, and then it's an HashMap
.