What is the preferred way to implement 'yield' in Scala?

轻奢々 2021-01-31 19:23

I am writing code for PhD research and am starting to use Scala. I often have to do text processing. I am used to Python, whose 'yield' statement is extremely useful for i

3 Answers
  • 2021-01-31 19:52

    'yield' sucks, continuations are better

    Actually, Python's yield is a continuation.

    What is a continuation? A continuation is saving the present point of execution with all its state, such that one can continue at that point later. That's precisely what Python's yield does, and also precisely how it is implemented.
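
    One way to picture "saving the point of execution with all its state" is to write the saved state out by hand. The sketch below is only an illustration under that assumption: the names SavedStateSketch and Countdown are made up here, and this is not how Python or Scala implement anything internally.

    ```scala
    object SavedStateSketch {
      // Hand-written counterpart of a Python generator such as:
      //   def countdown(n):
      //       while n > 0:
      //           yield n
      //           n -= 1
      // The field `n` is the state that survives between two calls to next().
      final class Countdown(start: Int) extends Iterator[Int] {
        private var n = start

        def hasNext: Boolean = n > 0

        def next(): Int = {
          if (!hasNext) throw new NoSuchElementException("countdown exhausted")
          val current = n
          n -= 1
          current
        }
      }

      def main(args: Array[String]): Unit = {
        val it = new Countdown(3)
        println(it.next())   // 3
        println(it.toList)   // List(2, 1): the iterator resumes where it stopped
      }
    }
    ```

    A continuation automates exactly this bookkeeping: instead of the programmer storing n in a field, the language captures whatever is needed to resume.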

    It is my understanding that Python's continuations are not delimited, however. I don't know much about that -- I might be wrong, in fact. Nor do I know what the implications of that may be.

    Scala's continuations do not work at run time. In fact, there's a continuations library for Java that works by manipulating bytecode at run time, and it is free of the constraints that Scala's continuations have.

    Scala's continuations are done entirely at compile time, which requires quite a bit of work. It also requires that the code that will be "continued" be prepared by the compiler to do so.

    And that's why for-comprehensions do not work. A statement like this:

    for { x <- xs } proc(x)
    

    is translated into

    xs.foreach(x => proc(x))
    

    where foreach is a method on xs's class. Unfortunately, xs's class was compiled long ago, so it cannot be modified to support the continuation. As a side note, that's also why Scala doesn't have continue.
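
    The translation can be observed directly with a class that offers nothing but foreach. This is a minimal sketch: Box, foreachCalls and sumAll are hypothetical names introduced only for the demonstration.

    ```scala
    object ForDesugarSketch {
      // Hypothetical container whose only collection-like method is foreach.
      final class Box(items: List[Int]) {
        var foreachCalls = 0

        def foreach[U](f: Int => U): Unit = {
          foreachCalls += 1
          items.foreach(f)
        }
      }

      def sumAll(xs: Box): Int = {
        var sum = 0
        for { x <- xs } sum += x   // compiled as xs.foreach(x => sum += x)
        sum
      }

      def main(args: Array[String]): Unit = {
        val xs = new Box(List(1, 2, 3))
        println(sumAll(xs))        // 6
        println(xs.foreachCalls)   // 1
      }
    }
    ```

    The loop body ends up as a function handed to Box.foreach, so whatever happens inside the loop is driven by code that lives in Box, not in the caller.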

    Aside from that, yes, this is a duplicate question, and, yes, you should find a different way to write your code.

  • 2021-01-31 20:02

    The premise of your question seems to be that you want exactly Python's yield, and you don't want any other reasonable suggestions to do the same thing in a different way in Scala. If this is true, and it is that important to you, why not use Python? It's quite a nice language. Unless your Ph.D. is in computer science and using Scala is an important part of your dissertation, if you're already familiar with Python and really like some of its features and design choices, why not use it instead?

    Anyway, if you actually want to learn how to solve your problem in Scala, it turns out that for the code you have, delimited continuations are overkill. All you need are flatMapped iterators.

    Here's how you do it.

    // You want to write
    for (x <- xs) { /* complex yield in here */ }
    // Instead you write
    xs.iterator.flatMap { /* Produce iterators in here */ }
    
    // You want to write
    yield(a)
    yield(b)
    // Instead you write
    Iterator(a,b)
    
    // You want to write
    yield(a)
    /* complex set of yields in here */
    // Instead you write
    Iterator(a) ++ /* produce complex iterator here */
    

    That's it! All your cases can be reduced to one of these three.

    In your case, your example would look something like

    import scala.io.Source

    Source.fromFile(file).getLines().flatMap(x =>
      Iterator("something") ++
      ":".r.split(x).iterator.flatMap(field =>
        if (field contains "/") "/".r.split(field).iterator
        else {
          if (!field.startsWith("#")) {
            /* vals, whatever */
            // substring(1) drops the leading "r"; slice would need both a start and an end index
            if (some_calculation && field.startsWith("r")) Iterator("r", field.substring(1))
            else Iterator(field)
          }
          else Iterator.empty
        }
      )
    )
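
    For something that can be pasted and run as-is, here is a smaller, self-contained variant of the same pattern. It is only a sketch: the object FlatMapSketch, the function tokens and the sample input are invented for illustration.

    ```scala
    object FlatMapSketch {
      // Roughly the Scala counterpart of this Python generator:
      //   for line in lines:
      //       yield "start"
      //       for word in line.split(" "):
      //           if not word.startswith("#"):
      //               yield word
      def tokens(lines: Iterator[String]): Iterator[String] =
        lines.flatMap { line =>
          Iterator("start") ++
            line.split(" ").iterator.flatMap { word =>
              if (word.startsWith("#")) Iterator.empty   // yield nothing
              else Iterator(word)                        // yield one value
            }
        }

      def main(args: Array[String]): Unit = {
        println(tokens(Iterator("a #b c", "d")).toList)
        // List(start, a, c, start, d)
      }
    }
    ```

    Because everything stays an Iterator, nothing is computed until the caller asks for the next element, which is the property people usually want from Python's yield.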
    

    P.S. Scala does have continue; it's done like so (implemented by throwing stackless (light-weight) exceptions):

    import scala.util.control.Breaks._
    for (blah) { breakable { ... break ... } }
    

    but that won't get you what you want because Scala doesn't have the yield you want.
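
    For reference, a runnable sketch of the breakable idiom used as continue. The object ContinueSketch and the function odds are hypothetical names chosen for the example.

    ```scala
    import scala.collection.mutable.ListBuffer
    import scala.util.control.Breaks._

    object ContinueSketch {
      // Collects the odd numbers in 1..upTo, skipping the even ones.
      def odds(upTo: Int): List[Int] = {
        val kept = ListBuffer.empty[Int]
        for (i <- 1 to upTo) {
          breakable {
            if (i % 2 == 0) break()   // leaves only this breakable block, like `continue`
            kept += i
          }
        }
        kept.toList
      }

      def main(args: Array[String]): Unit = {
        println(odds(5))   // List(1, 3, 5)
      }
    }
    ```

    Placing breakable inside the loop body gives continue; wrapping the whole loop in breakable instead would give break.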

  • 2021-01-31 20:10

    The implementation below provides a Python-like generator.

    Notice that the Generators object below defines a function called _yield, because yield is already a keyword in Scala which, by the way, has nothing to do with the yield you know from Python.
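
    As a brief aside, this is what Scala's own yield keyword does: it is part of the for-comprehension syntax and builds a new collection. A minimal sketch, with ScalaYieldSketch and doubled as made-up names:

    ```scala
    object ScalaYieldSketch {
      // for ... yield builds the whole result; it does not suspend anything.
      def doubled(xs: List[Int]): List[Int] =
        for (x <- xs) yield x * 2   // same as xs.map(x => x * 2)

      def main(args: Array[String]): Unit = {
        println(doubled(List(1, 2, 3)))   // List(2, 4, 6)
      }
    }
    ```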

    // Needs the scala-continuations compiler plugin and library
    // (compile with -P:continuations:enable).
    import scala.util.continuations._
    
    object Generators {
      sealed trait Trampoline[+T]
    
      case object Done extends Trampoline[Nothing]
      case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]
    
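      class Generator[T](var cont: Unit => Trampoline[T]) extends Iterator[T] {
        // Cache the outcome of the latest step so that hasNext followed by next()
        // does not run the code between two yields twice.
        private var step: Option[Trampoline[T]] = None

        private def advance(): Trampoline[T] = {
          if (step.isEmpty) step = Some(cont(()))
          step.get
        }

        def next(): T = advance() match {
          case Continue(r, nextCont) => cont = nextCont; step = None; r
          case Done => sys.error("Generator exhausted")
        }

        def hasNext: Boolean = advance() != Done
      }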
    
      type Gen[T] = cps[Trampoline[T]]
    
      def generator[T](body: => Unit @Gen[T]): Generator[T] = {
        // `_: Unit` ignores the argument; writing `(Unit) =>` would merely name the parameter "Unit"
        new Generator((_: Unit) => reset { body; Done })
      }
    
      def _yield[T](t: T): Unit @Gen[T] =
        shift { (cont: Unit => Trampoline[T]) => Continue(t, cont) }
    }
    
    
    object TestCase {
      import Generators._
    
      def sectors = generator {
        def tailrec(seq: Seq[String]): Unit @Gen[String] = {
          if (!seq.isEmpty) {
            _yield(seq.head)
            tailrec(seq.tail)
          }
        }
    
        val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
        tailrec(list)
      }
    
      def main(args: Array[String]): Unit = {
        for (s <- sectors) { println(s) }
      }
    }
    

    It works pretty well, including for the typical usage of for loops.

    Caveat: we need to remember that Python and Scala differ in the way continuations are implemented. Below we see how generators are typically used in Python and compare to the way we have to use them in Scala. Then, we will see why it needs to be like so in Scala.

    If you are used to writing code in Python, you've probably used generators like this:

    // This is Scala code that does not compile :(
    // This code naively tries to mimic the way generators are used in Python
    
    def myGenerator = generator {
      val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
      list foreach {s => _yield(s)}
    }
    

    The code above does not compile. Skipping the convoluted theory, the explanation is that "the type of the for loop" does not match the type required by the continuation: foreach expects an ordinary function, but s => _yield(s) has the annotated result type Unit @Gen[String], and foreach itself was compiled without the continuations transformation. If that explanation does not help, here is another angle:

    If you had coded something like shown below, it would compile fine:

    def myGenerator = generator {
      _yield("Financials")
      _yield("Materials")
      _yield("Technology")
      _yield("Utilities")
    }
    

    This code compiles because the generator can be decomposed into a sequence of yields and, in this case, a yield matches the type involved in the continuation. To be more precise, the code can be decomposed into chained blocks, where each block ends with a yield. Just for the sake of clarification, we can think of the sequence of yields as being expressed like this:

    { some code here; _yield("Financials")
        { some other code here; _yield("Materials")
            { eventually even some more code here; _yield("Technology")
                { ok, fine, youve got the idea, right?; _yield("Utilities") }}}}
    

    Again, without going deep into convoluted theory, the point is that after a yield you need to provide another block that ends with a yield, or otherwise close the chain. This is what we are doing in the pseudo-code above: after each yield we open another block, which in turn ends with a yield, and so on. Obviously this must end at some point, and then the only thing we are allowed to do is close the entire chain.
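
    The chain of blocks can also be written out by hand, without the continuations plugin. The sketch below repeats the Trampoline types from the Generators code so that it is self-contained; ChainSketch, chain and drain are hypothetical names, and the hand-written nesting is only an approximation of what the compiler produces from a sequence of _yield calls.

    ```scala
    object ChainSketch {
      sealed trait Trampoline[+T]
      case object Done extends Trampoline[Nothing]
      final case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]

      // Hand-written equivalent of:
      //   _yield("Financials"); _yield("Materials"); _yield("Technology")
      // Each block ends with a value plus "the rest of the chain".
      val chain: Trampoline[String] =
        Continue[String]("Financials", _ =>
          Continue[String]("Materials", _ =>
            Continue[String]("Technology", _ => Done)))

      // Walks the chain and collects every yielded value.
      def drain[T](t: Trampoline[T]): List[T] = t match {
        case Continue(r, next) => r :: drain(next(()))
        case Done              => Nil
      }

      def main(args: Array[String]): Unit = {
        println(drain(chain))   // List(Financials, Materials, Technology)
      }
    }
    ```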

    OK. But... how can we yield multiple pieces of information? The answer is a little obscure but makes a lot of sense once you know it: we need to employ tail recursion, and the last statement of a block must be a yield.

      def myGenerator = generator {
        def tailrec(seq: Seq[String]): Unit @Gen[String] = {
          if (!seq.isEmpty) {
            _yield(seq.head)
            tailrec(seq.tail)
          }
        }
    
        val list = List("Financials", "Materials", "Technology", "Utilities")
        tailrec(list)
      }
    

    Let's analyze what's going on here:

    1. Our generator function myGenerator contains some logic that obtains or generates information. In this example, we simply use a sequence of strings.

    2. Our generator function myGenerator calls a recursive function which is responsible for yield-ing multiple pieces of information, obtained from our sequence of strings.

    3. The recursive function must be declared before use, otherwise the compiler crashes.

    4. The recursive function tailrec provides the tail recursion we need.

    The rule of thumb here is simple: substitute a for loop with a recursive function, as demonstrated above.
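
    To see why recursion fits where a loop does not, the same shape can be written without the plugin: yield the head, and keep the recursive call on the tail as the saved "rest of the work". This is a sketch under stated assumptions; RecursionSketch, fromSeq and toIterator are names invented for the illustration, and the Trampoline types are repeated so the block stands alone.

    ```scala
    object RecursionSketch {
      sealed trait Trampoline[+T]
      case object Done extends Trampoline[Nothing]
      final case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]

      // Hand-written counterpart of the recursive function inside generator { ... }:
      // the recursive call is wrapped in a function, so it runs only on demand.
      def fromSeq[T](seq: Seq[T]): Trampoline[T] =
        if (seq.isEmpty) Done
        else Continue[T](seq.head, _ => fromSeq(seq.tail))

      // Adapts a chain of steps to the standard Iterator interface.
      def toIterator[T](start: Trampoline[T]): Iterator[T] = new Iterator[T] {
        private var current: Trampoline[T] = start

        def hasNext: Boolean = current != Done

        def next(): T = current match {
          case Continue(r, resume) => current = resume(()); r
          case Done                => throw new NoSuchElementException("exhausted")
        }
      }

      def main(args: Array[String]): Unit = {
        val sectors = List("Financials", "Materials", "Technology", "Utilities")
        for (s <- toIterator(fromSeq(sectors))) println(s)
      }
    }
    ```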

    Notice that tailrec is just a convenient name we found, for the sake of clarification. In particular, the call to tailrec does not need to be the last statement of our generator function. The only restriction is that you have to provide a sequence of blocks which match the type of a yield, as shown below:

      def myGenerator = generator {
    
        def tailrec(seq: Seq[String]): Unit @Gen[String] = {
          if (!seq.isEmpty) {
            _yield(seq.head)
            tailrec(seq.tail)
          }
        }
    
        _yield("Before the first call")
        _yield("OK... not yet...")
        _yield("Ready... steady... go")
    
        val list = List("Financials", "Materials", "Technology", "Utilities")
        tailrec(list)
    
        _yield("done")
        _yield("long life and prosperity")
      }
    

    Going one step further, you may be wondering what real-life applications look like, in particular if you are employing several generators. It is a good idea to standardize your generators around a single pattern that proves convenient in most circumstances.

    Let's examine the example below. We have three generators: sectors, industries and companies. For brevity, only sectors is completely shown. This generator employs a tailrec function as demonstrated already above. The trick here is that the same tailrec function is also employed by other generators. All we have to do is supply a different body function.

    import scala.collection.immutable
    import scala.xml.NodeSeq

    type GenP = (NodeSeq, NodeSeq, NodeSeq)
    type GenR = immutable.Map[String, String]
    
    def tailrec(p: GenP)(body: GenP => GenR): Unit @Gen[GenR] = {
      val (stats, rows, header)  = p
      if (!stats.isEmpty && !rows.isEmpty) {
        val heads: GenP = (stats.head, rows.head, header)
        val tails: GenP = (stats.tail, rows.tail, header)
        _yield(body(heads))
        // tail recursion
        tailrec(tails)(body)
      }
    }
    
    def sectors = generator[GenR] {
      def body(p: GenP): GenR = {
          // unpack arguments (the parentheses matter: without them all three names get the whole tuple)
          val (stat, row, header) = p
          // obtain name and url
          val name = (row \ "a").text
          val url  = (row \ "a" \ "@href").text
          // create map and populate fields: name and url
          val m = new scala.collection.mutable.HashMap[String, String]
          m.put("name", name)
          m.put("url",  url)
          // populate other fields
          (header, stat).zipped.foreach { (k, v) => m.put(k.text, v.text) }
          // return an immutable copy, as required by GenR
          m.toMap
      }
    
      val root  : scala.xml.NodeSeq = cache.loadHTML5(urlSectors) // obtain entire page
      val header: scala.xml.NodeSeq = ... // code is omitted
      val stats : scala.xml.NodeSeq = ... // code is omitted
      val rows  : scala.xml.NodeSeq = ... // code is omitted
      // tail recursion
      tailrec((stats, rows, header))(body)
    } 
    
    def industries(sector: String) = generator[GenR] {
      def body(p: GenP): GenR = {
          //++ similar to 'body' demonstrated in "sectors"
          // returns a map
          m
      }
    
      //++ obtain NodeSeq variables, like demonstrated in "sectors" 
      // tail recursion
      tailrec((stats, rows, header))(body)
    } 
    
    def companies(sector: String) = generator[GenR] {
      def body(p: GenP): GenR = {
          //++ similar to 'body' demonstrated in "sectors"
          // returns a map
          m
      }
    
      //++ obtain NodeSeq variables, like demonstrated in "sectors" 
      // tail recursion
      tailrec((stats, rows, header))(body)
    } 
    
    • Credits to Rich Dougherty and huynhjl.
      See this SO thread: Implementing yield (yield return) using Scala continuations

    • Credits to Miles Sabin, for putting some of the code above together
      http://github.com/milessabin/scala-cont-jvm-coro-talk/blob/master/src/continuations/Generators.scala
