generic collection generation with a generic type

别来无恙 submitted on 2021-01-28 04:24:45

Question


Sometimes I find myself wishing Scala collections included some missing functionality. It is rather easy to "extend" a collection and provide a custom method.

This is a bit more difficult when it comes to building the collection from scratch. Consider useful methods such as .iterate. I'll demonstrate the use case with a similar, familiar function: unfold.

unfold is a method that constructs a collection from an initial state z: S and a function that returns either Some tuple of the next state and an element of type E, or None to signal the end.

The method signature, for some collection type Coll[T], should look roughly like:

def unfold[S,E](z: S)(f: S ⇒ Option[(S,E)]): Coll[E]
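
To make the contract concrete, here is a minimal sketch for List only. The names UnfoldSketch, unfoldList and digits are made up for illustration; the tuple order (next state, element) follows the signature above.

```scala
import scala.annotation.tailrec
import scala.collection.mutable.ListBuffer

object UnfoldSketch {
  // Hypothetical List-only helper, not part of the standard library.
  // f returns Some((nextState, element)) to continue, or None to stop.
  def unfoldList[S, E](z: S)(f: S => Option[(S, E)]): List[E] = {
    val buffer = ListBuffer.empty[E]

    @tailrec
    def loop(state: S): List[E] = f(state) match {
      case None => buffer.toList
      case Some((nextState, element)) =>
        buffer += element
        loop(nextState)
    }

    loop(z)
  }

  // Peel off the decimal digits of 1234, least significant digit first.
  val digits: List[Int] =
    unfoldList(1234)(n => if (n == 0) None else Some((n / 10, n % 10)))
  // digits == List(4, 3, 2, 1)
}
```

The state here is the part of the number not yet consumed, and each step emits one digit until the state reaches 0.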

Now, IMO, the most "natural" usage would be, e.g.:

val state: S = ??? // initial state
val arr: Array[E] = Array.unfold(state){ s ⇒
  // code to convert s to some Option[(S,E)]
  ???
}

This is pretty straightforward to do for a specific collection type:

import scala.reflect.ClassTag

// Adds unfold as an extension method on the Array companion object
implicit class ArrayOps(arrObj: Array.type) {
  def unfold[S,E : ClassTag](z: S)(f: S => Option[(S,E)]): Array[E] = {
    val b = Array.newBuilder[E]
    var s = f(z)
    while(s.isDefined) {
      val Some((state,element)) = s
      b += element
      s = f(state)
    }
    b.result()
  }
}

With this implicit class in scope, we can generate an array for the Fibonacci sequence like this:

val arr: Array[Int] = Array.unfold(0->1) {
  case (a,b) if a < 256 => Some((b -> (a+b)) -> a)
  case _                => None
}

But if we want to provide this functionality for all other collection types, I see no option other than to copy and paste the code and replace every occurrence of Array with List, Seq, etc.

So I tried another approach:

trait BuilderProvider[Elem,Coll] {
  def builder: mutable.Builder[Elem,Coll]
}

object BuilderProvider {
  object Implicits {
    implicit def arrayBuilderProvider[Elem : ClassTag] = new BuilderProvider[Elem,Array[Elem]] {
      def builder = Array.newBuilder[Elem]
    }
    implicit def listBuilderProvider[Elem : ClassTag] = new BuilderProvider[Elem,List[Elem]] {
      def builder = List.newBuilder[Elem]
    }
    // many more logicless implicits
  }
}

def unfold[Coll,S,E : ClassTag](z: S)(f: S => Option[(S,E)])(implicit bp: BuilderProvider[E,Coll]): Coll = {
  val b = bp.builder
  var s = f(z)
  while(s.isDefined) {
    val Some((state,element)) = s
    b += element
    s = f(state)
  }
  b.result()
}

Now, with the above in scope, all one needs is an import for the right type:

import BuilderProvider.Implicits.arrayBuilderProvider

val arr: Array[Int] = unfold(0->1) {
  case (a,b) if a < 256 => Some((b -> (a+b)) -> a)
  case _                => None
}

But this doesn't feel right either. I don't like forcing the user to import something, let alone an implicit method that creates a useless wiring class on every method call. Moreover, there is no easy way to override the default logic. Think of collections such as Stream, where it would be most appropriate to create the collection lazily, or of other collections with special implementation details to consider.

The best solution I could come up with was to use the first solution as a template and generate the sources with sbt:

sourceGenerators in Compile += Def.task {
  val file = (sourceManaged in Compile).value / "myextensions" / "util" / "collections" / "package.scala"
  val colls = Seq("Array","List","Seq","Vector","Set") //etc'...
  val prefix = s"""package myextensions.util
    |
    |package object collections {
    |
    """.stripMargin
  val all = colls.map{ coll =>
    s"""
    |implicit class ${coll}Ops[Elem](obj: ${coll}.type) {
    |  def unfold[S,E : ClassTag](z: S)(f: S => Option[(S,E)]): ${coll}[E] = {
    |    val b = ${coll}.newBuilder[E]
    |    var s = f(z)
    |    while(s.isDefined) {
    |      val Some((state,element)) = s
    |      b += element
    |      s = f(state)
    |    }
    |    b.result()
    |  }
    |}
    """.stripMargin
  }
  IO.write(file,all.mkString(prefix,"\n","\n}\n"))
  Seq(file)
}.taskValue

But this solution suffers from other issues. Just imagine if unfold is not the only function to add globally, and overriding the default implementation is still hard. Bottom line: this is hard to maintain and does not "feel" right either.

So, is there a better way to achieve this?


Answer 1:


First, let's make a basic implementation of the function that takes an explicit Builder argument. In the case of unfold, it can look like this:

import scala.language.higherKinds
import scala.annotation.tailrec
import scala.collection.GenTraversable
import scala.collection.mutable
import scala.collection.generic.{GenericCompanion, CanBuildFrom}

object UnfoldImpl {
  def unfold[CC[_], E, S](builder: mutable.Builder[E, CC[E]])(initial: S)(next: S => Option[(S, E)]): CC[E] = {
    @tailrec
    def build(state: S): CC[E] = {
      next(state) match {
        case None => builder.result()
        case Some((nextState, elem)) =>
          builder += elem
          build(nextState)
      }
    }

    build(initial)
  }
}
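
As a usage sketch, a function of this shape can be called directly by handing it a builder from the target collection's companion. To keep the snippet self-contained and free of version-specific imports, it restates the function under the hypothetical name unfoldWith, with a plain result type C instead of CC[E]. BuilderUnfoldDemo, step, squaresList and squaresArray are illustrative names as well.

```scala
import scala.annotation.tailrec
import scala.collection.mutable

object BuilderUnfoldDemo {
  // Same idea as the builder-based unfold above, restated so this snippet compiles on its own.
  def unfoldWith[E, S, C](builder: mutable.Builder[E, C])(initial: S)(next: S => Option[(S, E)]): C = {
    @tailrec
    def build(state: S): C = next(state) match {
      case None => builder.result()
      case Some((nextState, elem)) =>
        builder += elem
        build(nextState)
    }
    build(initial)
  }

  // Squares of 1 to 10; the state is the counter, the element is its square.
  private val step: Int => Option[(Int, Int)] =
    a => if (a > 10) None else Some((a + 1, a * a))

  // The caller chooses the collection by choosing the builder.
  val squaresList: List[Int] = unfoldWith(List.newBuilder[Int])(1)(step)
  val squaresArray: Array[Int] = unfoldWith(Array.newBuilder[Int])(1)(step)
}
```

Passing the builder explicitly works for any collection, but it is verbose, which is what motivates looking the builder up from the collection type.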

Now, what can be an easy way to get a builder of a collection by its type?

I can propose two possible solutions. The first is to write an implicit extension class that wraps GenericCompanion, the common superclass of the companion objects of most of Scala's built-in collections. GenericCompanion has a newBuilder method that returns a Builder for the provided element type. An implementation may look like this:

implicit class Unfolder[CC[X] <: GenTraversable[X]](obj: GenericCompanion[CC]) {
  def unfold[S, E](initial: S)(next: S => Option[(S, E)]): CC[E] =
    UnfoldImpl.unfold(obj.newBuilder[E])(initial)(next)
}

And it's very easy to use this:

scala> List.unfold(1)(a => if (a > 10) None else Some(a + 1, a * a))
res1: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

One drawback is that some collections don't have companion objects extending GenericCompanion. For example, Array, or user-defined collections.

Another possible solution is to use an implicit "builder provider", as you proposed. Scala already has such a thing in its collection library: CanBuildFrom. An implementation with CanBuildFrom may look like this:

object Unfolder2 {
  def apply[CC[_]] = new {
    def unfold[S, E](initial: S)(next: S => Option[(S, E)])(
      implicit cbf: CanBuildFrom[CC[E], E, CC[E]]
    ): CC[E] =
      UnfoldImpl.unfold(cbf())(initial)(next)
  }
}

Usage example:

scala> Unfolder2[Array].unfold(1)(a => if (a > 10) None else Some(a + 1, a * a))
res1: Array[Int] = Array(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

This works with Scala's collections and Array, and may work with user-defined collections if the user has provided a CanBuildFrom instance.


Note that neither approach works with Streams in a lazy fashion. That's mostly because the original implementation, UnfoldImpl.unfold, uses a Builder, which for a Stream is eager.

To unfold a Stream lazily, you can't use the standard Builder. You'd have to provide a separate implementation using Stream.cons (or #::). To choose an implementation automatically, depending on the collection type requested by the user, you can use the typeclass pattern. Here is a sample implementation:

trait Unfolder3[E, CC[_]] {
  def unfold[S](initial: S)(next: S => Option[(S, E)]): CC[E]
}

trait UnfolderCbfInstance {
  // provides unfolder for types that have a `CanBuildFrom`
  // this is used only if the collection is not a `Stream`
  implicit def unfolderWithCBF[E, CC[_]](
    implicit cbf: CanBuildFrom[CC[E], E, CC[E]]
  ): Unfolder3[E, CC] =
    new Unfolder3[E, CC] {
      def unfold[S](initial: S)(next: S => Option[(S, E)]): CC[E] =
        UnfoldImpl.unfold(cbf())(initial)(next)
    }
}

object Unfolder3 extends UnfolderCbfInstance {
  // lazy implementation that takes precedence over `unfolderWithCBF` for `Stream`s
  implicit def streamUnfolder[E]: Unfolder3[E, Stream] =
    new Unfolder3[E, Stream] {
      def unfold[S](initial: S)(next: S => Option[(S, E)]): Stream[E] =
        next(initial).fold(Stream.empty[E]) {
          case (state, elem) =>
            elem #:: unfold(state)(next)
        }
    }

  def apply[CC[_]] = new {
    def unfold[E, S](initial: S)(next: S => Option[(S, E)])(
      implicit impl: Unfolder3[E, CC]
    ): CC[E] = impl.unfold(initial)(next)
  }
}

Now this implementation works eagerly for normal collections (including Array and user-defined collections with an appropriate CanBuildFrom), and lazily for Streams:

scala> Unfolder3[Array].unfold(1)(a => if (a > 10) None else Some(a + 1, a * a))
res0: Array[Int] = Array(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

scala> Unfolder3[Stream].unfold(1)(a => if (a > 10) None else { println(a); Some(a + 1, a * a) })
1
res2: Stream[Int] = Stream(1, ?)

scala> res2.take(3).toList
2
3
res3: List[Int] = List(1, 4, 9)

Note that if Unfolder3.apply is moved to another object or class, the user won't have to import anything related to Unfolder3 at all, because the instances live in the Unfolder3 companion object and are found through the implicit scope.

If you don't understand how this implementation works, you can read about the typeclass pattern in Scala and the order of implicit resolution.
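
The prioritization trick can be shown in isolation. The sketch below uses a made-up typeclass Label, unrelated to the collections API. A fallback instance is placed in a parent trait and a specific instance in the companion object, the same layout as UnfolderCbfInstance and Unfolder3 above. All names are hypothetical.

```scala
object ImplicitPriorityDemo {
  // A tiny typeclass used only to illustrate implicit priority.
  trait Label[A] { def text: String }

  trait LowPriorityLabels {
    // Fallback instance: applicable to every type A.
    implicit def fallback[A]: Label[A] =
      new Label[A] { def text: String = "fallback" }
  }

  object Label extends LowPriorityLabels {
    // Defined directly in the companion, which inherits from the trait holding
    // `fallback`, so this instance is preferred whenever both apply.
    implicit val intLabel: Label[Int] =
      new Label[Int] { def text: String = "int-specific" }
  }

  // No import is needed: the compiler searches the companion of Label.
  def labelOf[A](implicit label: Label[A]): String = label.text

  val forInt: String = labelOf[Int]       // "int-specific"
  val forString: String = labelOf[String] // "fallback"
}
```

For Int both instances are eligible, and the one defined in the derived object wins instead of causing an ambiguity error. For String only the fallback applies.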



Source: https://stackoverflow.com/questions/35682984/generic-collection-generation-with-a-generic-type
