Scala default Set Implementation

南笙 2020-12-16 15:14

I can see from the Scala documentation that scala.collection.immutable.Set is only a trait. Which Set implementation is used by default? HashSet or TreeSet (or somet

1 Answer
  • 2020-12-16 16:17

    Looking at the source code, you can see that sets of up to four elements have optimized implementations (EmptySet, Set1, Set2, Set3 and Set4) that simply hold their elements in individual fields.

    For example, here's the declaration of Set2 (as of Scala 2.11.4):

    class Set2[A] private[collection] (elem1: A, elem2: A) extends AbstractSet[A] with Set[A] with Serializable
    

    And here's the contains implementation:

    def contains(elem: A): Boolean =
      elem == elem1 || elem == elem2
    

    And the find implementation:

    override def find(f: A => Boolean): Option[A] = {
      if (f(elem1)) Some(elem1)
      else if (f(elem2)) Some(elem2)
      else None
    }
    

    Very straightforward.

    For sets with more than 4 elements, the underlying implementation is a HashSet. We can easily verify this in the REPL:

    scala> Set(1, 2, 3, 4).getClass
    res1: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.Set$Set4
    
    scala> Set(1, 2, 3, 4, 5, 6).getClass
    res0: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.HashSet$HashTrieSet
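
    The same check can be run for every size around the threshold. This is a minimal sketch, not part of the original answer: the object name SetImplementations and the helper implName are made up for illustration, and the exact class name printed for larger sets depends on the Scala version (for example HashSet$HashTrieSet in 2.11, plain HashSet in 2.13).

```scala
object SetImplementations {
  // Runtime class name of the immutable Set holding the integers 1..n.
  // `implName` is a hypothetical helper, not a standard library method.
  def implName(n: Int): String = (1 to n).toSet.getClass.getName

  def main(args: Array[String]): Unit =
    // Sizes 0 to 4 should report the small specialized classes,
    // larger sizes a HashSet-based class.
    (0 to 6).foreach(n => println(s"$n elements -> ${implName(n)}"))
}
```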
    

    That being said, find takes an arbitrary predicate, so it cannot use the hash structure: it has to iterate over the elements until one matches, which is O(n) in the worst case. Conversely, a lookup operation like contains is effectively O(1).
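
    The traversal cost of find can be made visible by counting how often the predicate is evaluated. This is only a sketch under my own assumptions: FindVsContains and predicateCalls are hypothetical names, and the set of 1000 integers is arbitrary.

```scala
object FindVsContains {
  // Runs `find` with predicate `p` and returns how many times `p` was evaluated.
  // Hypothetical helper for illustration only.
  def predicateCalls(s: Set[Int])(p: Int => Boolean): Int = {
    var calls = 0
    s.find { x => calls += 1; p(x) }
    calls
  }

  def main(args: Array[String]): Unit = {
    val s = (1 to 1000).toSet
    println(predicateCalls(s)(_ => true)) // stops at the first element visited
    println(predicateCalls(s)(_ == -1))   // no match, so every element is visited
    println(s.contains(500))              // hash-based lookup, no traversal
  }
}
```

    When nothing matches, the predicate runs once per element, whereas contains goes straight to the hash lookup.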

    Here's a more in-depth reference about performance of scala collections in general.

    The same applies to Map: there are optimized implementations for up to 4 entries, and beyond that it's a HashMap.
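
    The Map case can be checked the same way. Again a sketch with made-up names (MapImplementations, implName); the class name reported for larger maps varies between Scala versions.

```scala
object MapImplementations {
  // Runtime class name of the immutable Map with the entries 1 -> 1, ..., n -> n.
  // `implName` is a hypothetical helper, not a standard library method.
  def implName(n: Int): String =
    (1 to n).map(i => i -> i).toMap.getClass.getName

  def main(args: Array[String]): Unit =
    // Up to 4 entries should report the small specialized classes,
    // larger sizes a HashMap-based class.
    (0 to 6).foreach(n => println(s"$n entries -> ${implName(n)}"))
}
```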
