Scala GroupBy preserving insertion order?

囚心锁ツ 2020-12-29 21:06

The groupBy method on Lists, Maps, etc. generates a Map from the results of the grouping function.

Is there a way to use the groupBy to generate a Map that preserves insertion order (Link

  • 2020-12-29 21:36

    This yields better results on ScalaMeter, although the solution is very similar to the actual Scala groupBy:

        ::Benchmark Range.GroupBy::
        cores: 8
        hostname: xxxxx-MacBook-Pro.local
        name: Java HotSpot(TM) 64-Bit Server VM
        osArch: x86_64
        osName: Mac OS X
        vendor: Oracle Corporation
        version: 25.131-b11
        Parameters(size -> 300000): 6.500884
        Parameters(size -> 600000): 13.019679
        Parameters(size -> 900000): 22.756615
        Parameters(size -> 1200000): 25.481007
        Parameters(size -> 1500000): 33.129888

    compared to the zipWithIndex approach, which yields:

        ::Benchmark Range.GroupBy::
        cores: 8
        hostname: xxxxx-MacBook-Pro.local
        name: Java HotSpot(TM) 64-Bit Server VM
        osArch: x86_64
        osName: Mac OS X
        vendor: Oracle Corporation
        version: 25.131-b11
        Parameters(size -> 300000): 9.57414
        Parameters(size -> 600000): 18.569085
        Parameters(size -> 900000): 28.233822
        Parameters(size -> 1200000): 36.975254
        Parameters(size -> 1500000): 47.447057


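    import scala.collection.{immutable, mutable}
    import scala.collection.mutable.ArrayBuffer

    implicit class GroupBy[A](val t: TraversableOnce[A]) {
      def sortedGroupBy[K](f: A => K)(implicit ordering: Ordering[K]): immutable.SortedMap[K, ArrayBuffer[A]] = {
        // Accumulate groups in a mutable SortedMap; keys end up in Ordering[K] order
        val m = mutable.SortedMap.empty[K, ArrayBuffer[A]]
        for (elem <- t) {
          val key = f(elem)
          val bldr = m.getOrElseUpdate(key, mutable.ArrayBuffer[A]())
          bldr += elem
        }
        // Copy the groups into an immutable SortedMap for the result
        val b = immutable.SortedMap.newBuilder[K, ArrayBuffer[A]]
        for ((k, v) <- m) {
          b += ((k, v))
        }
        b.result()
      }
    }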

    Example: the benchmark used val sizes = Gen.range("size")(300000, 1500000, 300000) and groupByOrdered(_ % 10).
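
    Note that a SortedMap orders its keys by Ordering[K], not by first appearance. The following is a minimal, self-contained sketch of that behavior. The helper sortedGroupBy here is a hypothetical standalone function written for illustration (it uses an immutable SortedMap and foldLeft), not the benchmarked implementation.

    ```scala
    import scala.collection.immutable.SortedMap

    object SortedGroupByDemo {
      // Hypothetical helper: groups by f and returns keys in Ordering[K] order
      def sortedGroupBy[A, K: Ordering](xs: Iterable[A])(f: A => K): SortedMap[K, Vector[A]] =
        xs.foldLeft(SortedMap.empty[K, Vector[A]]) { (acc, x) =>
          val k = f(x)
          // Append x to the group for k, creating the group if needed
          acc.updated(k, acc.getOrElse(k, Vector.empty[A]) :+ x)
        }

      def main(args: Array[String]): Unit = {
        val words = List("pear", "fig", "apple", "kiwi", "plum")
        val grouped = sortedGroupBy(words)(_.length)
        // Length 4 is seen first in the input, but keys come out sorted
        println(grouped.keys.toList) // List(3, 4, 5)
      }
    }
    ```

    If first-seen key order is what you need, a sorted map is only equivalent when the keys happen to arrive in ascending order.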

  • 2020-12-29 21:46

    Here's one without maps:

    def orderedGroupBy[T, P](seq: Traversable[T])(f: T => P): Seq[(P, Traversable[T])] = {
      def accumulator(seq: Traversable[T], f: T => P, res: List[(P, Traversable[T])]): Seq[(P, Traversable[T])] = seq.headOption match {
        case None => res.reverse
        case Some(h) => {
          val key = f(h)
          // Take the leading run of elements that share the head's key
          val subseq = seq.takeWhile(f(_) == key)
          accumulator(seq.drop(subseq.size), f, (key -> subseq) :: res)
        }
      }
      accumulator(seq, f, Nil)
    }

    It could be useful if you only need to access the results sequentially (no random access) and you want to avoid the overhead of creating and using Map objects. Note: I didn't compare the performance against the other options; it could actually be worse.

    EDIT: Just to be clear: this assumes your input is already ordered by the group key. My use case is a SELECT ... ORDER BY.
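
    To illustrate why the input must already be ordered by the group key, here is a small sketch of the same run-based idea on a List, using span. The name groupRuns is made up for this example, and it is an illustration rather than a drop-in replacement for the code above.

    ```scala
    object ConsecutiveGroupByDemo {
      // Hypothetical helper: groups consecutive runs of elements with equal keys
      def groupRuns[T, P](xs: List[T])(f: T => P): List[(P, List[T])] = {
        @scala.annotation.tailrec
        def loop(rest: List[T], acc: List[(P, List[T])]): List[(P, List[T])] = rest match {
          case Nil => acc.reverse
          case h :: _ =>
            val key = f(h)
            // Split off the leading run that shares the head's key
            val (run, tail) = rest.span(f(_) == key)
            loop(tail, (key -> run) :: acc)
        }
        loop(xs, Nil)
      }

      def main(args: Array[String]): Unit = {
        // Input ordered by key: one entry per key
        println(groupRuns(List(1, 1, 2, 3, 3))(identity))
        // Input not ordered by key: key 1 shows up in two separate entries
        println(groupRuns(List(1, 2, 1))(identity))
      }
    }
    ```

    With unordered input the same key appears more than once in the result, so this only behaves like a groupBy when equal keys are adjacent.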

  • 2020-12-29 21:55

    The following would give you a groupByOrderedUnique method that behaves as you sought. It also adds a groupByOrdered that preserves duplicates as others have asked for in the comments.

    import collection.immutable.{ListMap, ListSet}
    import collection.mutable.{LinkedHashMap => MMap, Builder}

    object GroupByOrderedImplicit {
      implicit class GroupByOrderedImplicitImpl[A](val t: Traversable[A]) extends AnyVal {
        def groupByOrderedUnique[K](f: A => K): Map[K, ListSet[A]] =
          groupByGen(ListSet.newBuilder[A])(f)

        def groupByOrdered[K](f: A => K): Map[K, List[A]] =
          groupByGen(List.newBuilder[A])(f)

        def groupByGen[K, C[_]](makeBuilder: => Builder[A, C[A]])(f: A => K): Map[K, C[A]] = {
          val map = MMap[K, Builder[A, C[A]]]()
          for (i <- t) {
            val key = f(i)
            val builder = map.get(key) match {
              case Some(existing) => existing
              case None =>
                val newBuilder = makeBuilder
                map(key) = newBuilder
                newBuilder
            }
            builder += i
          }
          // Assumed ending (the original tail is missing): ListMap keeps key insertion order
          ListMap(map.toSeq.map { case (k, b) => k -> b.result() }: _*)
        }
      }
    }

    When I use that code like:

    import GroupByOrderedImplicit._
    val range = 0.until(40)
    val in = range ++ range.reverse
    println("With dupes:")
    in.groupByOrdered(_ % 10).toList.sortBy(_._1).foreach(println)
    in.groupByOrderedUnique(_ % 10).toList.sortBy(_._1).foreach(println)

    I get the following output:

    With dupes:
    (0,List(0, 10, 20, 30, 30, 20, 10, 0))
    (1,List(1, 11, 21, 31, 31, 21, 11, 1))
    (2,List(2, 12, 22, 32, 32, 22, 12, 2))
    (3,List(3, 13, 23, 33, 33, 23, 13, 3))
    (4,List(4, 14, 24, 34, 34, 24, 14, 4))
    (5,List(5, 15, 25, 35, 35, 25, 15, 5))
    (6,List(6, 16, 26, 36, 36, 26, 16, 6))
    (7,List(7, 17, 27, 37, 37, 27, 17, 7))
    (8,List(8, 18, 28, 38, 38, 28, 18, 8))
    (9,List(9, 19, 29, 39, 39, 29, 19, 9))
    (0,ListSet(0, 10, 20, 30))
    (1,ListSet(1, 11, 21, 31))
    (2,ListSet(2, 12, 22, 32))
    (3,ListSet(3, 13, 23, 33))
    (4,ListSet(4, 14, 24, 34))
    (5,ListSet(5, 15, 25, 35))
    (6,ListSet(6, 16, 26, 36))
    (7,ListSet(7, 17, 27, 37))
    (8,ListSet(8, 18, 28, 38))
    (9,ListSet(9, 19, 29, 39))
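
    The output above is sorted by key before printing, so it does not show the key order itself. As a rough sketch of how the first-seen key order can be checked, the following self-contained example groups through a mutable LinkedHashMap and returns a ListMap. The helper name groupByOrdered and the choice of ListMap as the result type are assumptions made for this illustration.

    ```scala
    import scala.collection.immutable.ListMap
    import scala.collection.mutable

    object InsertionOrderGroupByDemo {
      // Hypothetical helper: keys appear in the order they are first seen
      def groupByOrdered[A, K](xs: Iterable[A])(f: A => K): ListMap[K, List[A]] = {
        val groups = mutable.LinkedHashMap.empty[K, mutable.ListBuffer[A]]
        for (x <- xs) groups.getOrElseUpdate(f(x), mutable.ListBuffer.empty[A]) += x
        // ListMap iterates in insertion order, so the LinkedHashMap order carries over
        ListMap(groups.toSeq.map { case (k, buf) => k -> buf.toList }: _*)
      }

      def main(args: Array[String]): Unit = {
        val grouped = groupByOrdered(List(7, 3, 17, 5, 13, 25))(_ % 10)
        println(grouped.keys.toList) // List(7, 3, 5)
        println(grouped(7))          // List(7, 17)
      }
    }
    ```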
  • 2020-12-29 21:57

    groupBy as defined on TraversableLike produces an immutable.Map, so you can't make this method produce something else.

    The order of the elements in each entry is already preserved, but not the order of the keys. The keys are the result of the function supplied, so they don't really have an order.

    If you wanted to make an order based on the first occurrence of a particular key, here's a sketch of how you might do it. Say we want to group integers by their value / 2:

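    import scala.collection.mutable.LinkedHashMap

    val m = List(4, 0, 5, 1, 2, 6, 3).zipWithIndex groupBy (_._1 / 2)
    // Sort the groups by the index of their first element, then keep that order in a LinkedHashMap
    val lhm = LinkedHashMap(m.toSeq sortBy (_._2.head._2): _*)
    lhm mapValues (_ map (_._1))
    // Map(2 -> List(4, 5), 0 -> List(0, 1), 1 -> List(2, 3), 3 -> List(6))
    // Note: the order of the keys is the same as their first occurrence in the original list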
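
    The same idea can be wrapped in a reusable function. This is only a sketch: the name groupByFirstOccurrence is hypothetical, and it builds the result strictly with map instead of the lazy mapValues.

    ```scala
    import scala.collection.mutable.LinkedHashMap

    object FirstOccurrenceGroupByDemo {
      // Hypothetical helper: groups by f, with keys ordered by first occurrence in xs
      def groupByFirstOccurrence[A, K](xs: Seq[A])(f: A => K): LinkedHashMap[K, Seq[A]] = {
        // Tag each element with its position, then group as usual
        val indexed = xs.zipWithIndex.groupBy { case (x, _) => f(x) }
        // Order the groups by the position of their first element
        val ordered = indexed.toSeq.sortBy { case (_, group) => group.head._2 }
        // Drop the positions and store the groups in insertion order
        LinkedHashMap(ordered.map { case (k, group) => k -> group.map(_._1) }: _*)
      }

      def main(args: Array[String]): Unit = {
        val grouped = groupByFirstOccurrence(List(4, 0, 5, 1, 2, 6, 3))(_ / 2)
        println(grouped.keys.toList) // List(2, 0, 1, 3)
      }
    }
    ```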