What is the preferred way to implement 'yield' in Scala?

轻奢々 2021-01-31 19:23

I am writing code for PhD research and starting to use Scala. I often have to do text processing. I am used to Python, whose 'yield' statement is extremely useful for i

3 Answers
  •  佛祖请我去吃肉
    2021-01-31 20:10

    The implementation below provides a Python-like generator.

    Notice that the code below defines a function called _yield, because yield is already a keyword in Scala. By the way, Scala's yield (used in for comprehensions) has nothing to do with the yield you know from Python.

    import scala.annotation.tailrec
    import scala.collection.immutable.Stream
    import scala.util.continuations._
    
    object Generators {
      sealed trait Trampoline[+T]
    
      case object Done extends Trampoline[Nothing]
      case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]
    
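      class Generator[T](var cont: Unit => Trampoline[T]) extends Iterator[T] {
        // Cache the step computed by hasNext, so the code between
        // two yields is evaluated only once
        private var pending: Option[Trampoline[T]] = None
    
        private def step(): Trampoline[T] = pending.getOrElse {
          val t = cont(())
          pending = Some(t)
          t
        }
    
        def next(): T = step() match {
          case Continue(r, nextCont) => cont = nextCont; pending = None; r
          case _ => sys.error("Generator exhausted")
        }
    
        def hasNext: Boolean = step() != Done
      }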
    
      type Gen[T] = cps[Trampoline[T]]
    
      def generator[T](body: => Unit @Gen[T]): Generator[T] = {
        new Generator((_: Unit) => reset { body; Done })
      }
    
      def _yield[T](t: T): Unit @Gen[T] =
        shift { (cont: Unit => Trampoline[T]) => Continue(t, cont) }
    }
    
    
    object TestCase {
      import Generators._
    
      def sectors = generator {
        def tailrec(seq: Seq[String]): Unit @Gen[String] = {
          if (!seq.isEmpty) {
            _yield(seq.head)
            tailrec(seq.tail)
          }
        }
    
        val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
        tailrec(list)
      }
    
      def main(args: Array[String]): Unit = {
        for (s <- sectors) { println(s) }
      }
    }
    

    It works pretty well, including for the typical usage of for loops.
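    For readers on a current Scala version: as far as I know, the continuations plugin used above was only published up to Scala 2.12, so this code will not compile on 2.13 or Scala 3. The sketch below shows the same Trampoline/Generator idea with the continuations written by hand as () => ... thunks instead of being produced by shift/reset. The names mirror the code above, but this plugin-free version is my own assumption, not Miles Sabin's code; it also caches the step computed by hasNext so the body is not evaluated twice.

```scala
// Hand-written version of the Trampoline/Generator idea (no continuations plugin).
// Each continuation is an explicit () => Trampoline[T] thunk.
object ManualGenerators {
  sealed trait Trampoline[+T]
  case object Done extends Trampoline[Nothing]
  final case class Continue[T](result: T, next: () => Trampoline[T]) extends Trampoline[T]

  final class Generator[T](start: () => Trampoline[T]) extends Iterator[T] {
    private var cont: () => Trampoline[T] = start
    // Step already computed by hasNext but not yet consumed by next()
    private var pending: Option[Trampoline[T]] = None

    private def step(): Trampoline[T] = pending match {
      case Some(t) => t
      case None =>
        val t = cont() // run the body up to its next yield (or the end)
        pending = Some(t)
        t
    }

    def hasNext: Boolean = step() != Done

    def next(): T = step() match {
      case Continue(r, rest) =>
        cont = rest // resume from here on the following call
        pending = None
        r
      case Done => throw new NoSuchElementException("Generator exhausted")
    }
  }
}
```

    For example, new Generator(() => Continue("a", () => Continue("b", () => Done))) behaves like an Iterator over "a" and "b", so it can be used directly in a for loop.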

    Caveat: we need to remember that Python and Scala differ in the way continuations are implemented. Below we see how generators are typically used in Python and compare that with the way we have to use them in Scala. Then we will see why it has to be this way in Scala.

    If you are used to writing code in Python, you've probably used generators like this:

    // This is Scala code that does not compile :(
    // This code naively tries to mimic the way generators are used in Python
    
    def myGenerator = generator {
      val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
      list foreach {s => _yield(s)}
    }
    

    The code above does not compile. Skipping all the convoluted theoretical aspects, the explanation is that the type of the loop does not match the type involved in the continuation: foreach expects a plain function, but the lambda s => _yield(s) has the CPS-annotated result type Unit @Gen[String], and foreach knows nothing about continuations. I'm afraid this explanation is not very helpful, so let me try again:

    If you had coded something like shown below, it would compile fine:

    def myGenerator = generator {
      _yield("Financials")
      _yield("Materials")
      _yield("Technology")
      _yield("Utilities")
    }
    

    This code compiles because the generator can be decomposed into a sequence of yields, and each yield matches the type involved in the continuation. More precisely, the code can be decomposed into chained blocks, each of which ends with a yield. Just for the sake of clarity, we can think of the sequence of yields as being expressed like this:

    { some code here; _yield("Financials")
        { some other code here; _yield("Materials")
            { eventually even some more code here; _yield("Technology")
                { ok, fine, you've got the idea, right?; _yield("Utilities") }}}}
    

    Again, without going deep into convoluted theory, the point is that after a yield you need to provide another block that ends with a yield, or otherwise close the chain. This is what the pseudo-code above does: after each yield we open another block, which in turn ends with a yield, followed by another block ending with a yield, and so on. Obviously the chain must end at some point, and then the only thing we are allowed to do is close it.
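    To make the chain of blocks concrete, here is a small sketch that writes the four-yield generator with explicit continuations instead of the plugin: each block hands back one value plus a thunk for the next block, and End closes the chain. Step, Yielded and End are hypothetical names chosen for this illustration only.

```scala
// Explicit-continuation rendering of the nested "each block ends with a yield" chain.
object ChainedBlocks {
  sealed trait Step[+T]
  // Closes the chain
  case object End extends Step[Nothing]
  // One yielded value plus the block that follows it
  final case class Yielded[T](value: T, rest: () => Step[T]) extends Step[T]

  def myGenerator: Step[String] =
    Yielded("Financials", () => {
      Yielded("Materials", () => {
        Yielded("Technology", () => {
          Yielded("Utilities", () => End)
        })
      })
    })

  // Forces the blocks one after another and collects the yielded values.
  def toList[T](s: Step[T]): List[T] = s match {
    case Yielded(v, rest) => v :: toList(rest())
    case End              => Nil
  }
}
```

    Nothing after a given yield runs until its thunk is called, which is exactly the suspension that shift/reset automate in the plugin-based code.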

    OK. But how can we yield multiple pieces of information, for example from a list? The answer is a little obscure but makes a lot of sense once you know it: we need to employ tail recursion, and the last statement of a block must be a yield (or a recursive call that itself yields).

      def myGenerator = generator {
        def tailrec(seq: Seq[String]): Unit @Gen[String] = {
          if (!seq.isEmpty) {
            _yield(seq.head)
            tailrec(seq.tail)
          }
        }
    
        val list = List("Financials", "Materials", "Technology", "Utilities")
        tailrec(list)
      }
    

    Let's analyze what's going on here:

    1. Our generator function myGenerator contains some logic that generates information. In this example, we simply use a sequence of strings.

    2. Our generator function myGenerator calls a recursive function which is responsible for yield-ing multiple pieces of information, obtained from our sequence of strings.

    3. The recursive function must be declared before use, otherwise the compiler crashes.

    4. The recursive function tailrec provides the tail recursion we need.

    The rule of thumb here is simple: replace a for loop with a recursive function, as demonstrated above.
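    The same rule can be sketched with hand-written continuations, again without the plugin (Step, Yielded, End, fromSeq and drain are illustrative names, not part of the original code). fromSeq plays the role of tailrec: it yields the head and defers the recursive call on the tail inside a thunk, so each call returns immediately and the stack does not grow even for long sequences; drain consumes the steps in a @tailrec loop.

```scala
import scala.annotation.tailrec

// Loop replaced by recursion, with the continuation written by hand.
object RecursiveYield {
  sealed trait Step[+T]
  case object End extends Step[Nothing]
  final case class Yielded[T](value: T, rest: () => Step[T]) extends Step[T]

  // Counterpart of tailrec(seq): yield the head, defer the recursion on the tail.
  // The recursive call sits inside the thunk, so fromSeq returns immediately.
  def fromSeq[T](seq: Seq[T]): Step[T] =
    if (seq.isEmpty) End
    else Yielded(seq.head, () => fromSeq(seq.tail))

  // Consumer: pulls one step at a time in constant stack space.
  @tailrec
  def drain[T](s: Step[T], acc: List[T] = Nil): List[T] = s match {
    case Yielded(v, rest) => drain[T](rest(), v :: acc)
    case End              => acc.reverse
  }
}
```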

    Notice that tailrec is just a convenient name we chose for the sake of clarity. In particular, tailrec does not need to be the last statement of our generator function. The only restriction is that you have to provide a sequence of blocks which match the type of a yield, as shown below:

      def myGenerator = generator {
    
        def tailrec(seq: Seq[String]): Unit @Gen[String] = {
          if (!seq.isEmpty) {
            _yield(seq.head)
            tailrec(seq.tail)
          }
        }
    
        _yield("Before the first call")
        _yield("OK... not yet...")
        _yield("Ready... steady... go")
    
        val list = List("Financials", "Materials", "Technology", "Utilities")
        tailrec(list)
    
        _yield("done")
        _yield("long life and prosperity")
      }
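    For comparison, on Scala 2.13 or later the standard library's LazyList can express the same mix of fixed values and a traversed list without continuations; #::: concatenates lazy lists and only evaluates its right-hand side when it is reached. This is a swapped-in technique, not a translation of the continuation-based generator, and LazyVersion is a hypothetical name.

```scala
// Stdlib alternative (Scala 2.13+): a LazyList whose later parts are only
// evaluated when a consumer reaches them. Not a continuation-based generator.
object LazyVersion {
  val list: List[String] = List("Financials", "Materials", "Technology", "Utilities")

  def myGenerator: LazyList[String] =
    LazyList("Before the first call", "OK... not yet...", "Ready... steady... go") #:::
      LazyList.from(list) #:::
      LazyList("done", "long life and prosperity")
}
```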
    

    Going one step further, you may be wondering what real-life applications look like, in particular if you are employing several generators. It is a good idea to standardize your generators around a single pattern that proves convenient in most circumstances.

    Let's examine the example below. We have three generators: sectors, industries and companies. For brevity, only sectors is shown in full. This generator employs a tailrec function as demonstrated above. The trick here is that the same tailrec function is also employed by the other generators; all we have to do is supply a different body function.

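    import scala.collection.immutable
    import scala.xml.NodeSeq
    import Generators._  // generator, _yield and Gen from the code above
    
    type GenP = (NodeSeq, NodeSeq, NodeSeq)
    type GenR = immutable.Map[String, String]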
    
    def tailrec(p: GenP)(body: GenP => GenR): Unit @Gen[GenR] = {
      val (stats, rows, header)  = p
      if (!stats.isEmpty && !rows.isEmpty) {
        val heads: GenP = (stats.head, rows.head, header)
        val tails: GenP = (stats.tail, rows.tail, header)
        _yield(body(heads))
        // tail recursion
        tailrec(tails)(body)
      }
    }
    
    def sectors = generator[GenR] {
      def body(p: GenP): GenR = {
          // unpack arguments (a tuple pattern; "val a, b, c = p" would bind p to all three)
          val (stat, row, header) = p
          // obtain name and url
          val name = (row \ "a").text
          val url  = (row \ "a" \ "@href").text
          // create map and populate fields: name and url
          val m = new scala.collection.mutable.HashMap[String, String]
          m.put("name", name)
          m.put("url",  url)
          // populate other fields
          (header, stat).zipped.foreach { (k, v) => m.put(k.text, v.text) }
          // return an immutable copy, matching GenR
          m.toMap
      }
    
      val root  : scala.xml.NodeSeq = cache.loadHTML5(urlSectors) // obtain entire page
      val header: scala.xml.NodeSeq = ... // code is omitted
      val stats : scala.xml.NodeSeq = ... // code is omitted
      val rows  : scala.xml.NodeSeq = ... // code is omitted
      // tail recursion
      tailrec((stats, rows, header))(body)
    } 
    
    def industries(sector: String) = generator[GenR] {
      def body(p: GenP): GenR = {
          //++ similar to 'body' demonstrated in "sectors"
          // returns a map
          m
      }
    
      //++ obtain NodeSeq variables, like demonstrated in "sectors" 
      // tail recursion
      tailrec((stats, rows, header))(body)
    } 
    
    def companies(sector: String) = generator[GenR] {
      def body(p: GenP): GenR = {
          //++ similar to 'body' demonstrated in "sectors"
          // returns a map
          m
      }
    
      //++ obtain NodeSeq variables, like demonstrated in "sectors" 
      // tail recursion
      tailrec((stats, rows, header))(body)
    } 
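    The shared-traversal pattern (one tailrec, different body functions) can also be sketched with a plain Iterator, which is lazy and works in for loops. To keep the sketch free of the scala-xml dependency, each NodeSeq is replaced by a Seq[String] of cell texts; the data layout, traverse and sectorBody are hypothetical stand-ins for the omitted parsing code, not the author's implementation.

```scala
// "One traversal, different body" with a standard Iterator instead of continuations.
// Rows and headers are hypothetical Seq[String] stand-ins for NodeSeq.
object SharedTraversal {
  type Row = Seq[String]
  type GenR = Map[String, String]

  // Counterpart of tailrec(p)(body): pairs stats with rows and applies body lazily.
  def traverse(stats: Seq[Row], rows: Seq[Row], header: Row)(body: (Row, Row, Row) => GenR): Iterator[GenR] =
    stats.iterator.zip(rows.iterator).map { case (stat, row) => body(stat, row, header) }

  // Hypothetical body: the row holds name and url, the stat cells are keyed by header labels.
  def sectorBody(stat: Row, row: Row, header: Row): GenR =
    Map("name" -> row(0), "url" -> row(1)) ++ header.zip(stat)
}
```

    Like the tailrec version, the traversal stops as soon as either stats or rows runs out, because Iterator.zip stops at the shorter input.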
    
    • Credits to Rich Dougherty and huynhjl.
      See this SO thread: Implementing yield (yield return) using Scala continuations

    • Credits to Miles Sabin, for putting some of the code above together
      http://github.com/milessabin/scala-cont-jvm-coro-talk/blob/master/src/continuations/Generators.scala
