I am writing code for PhD research and starting to use Scala. I often have to do text processing. I am used to Python, whose 'yield' statement is extremely useful for i
'yield' sucks, continuations are better
Actually, Python's yield is a continuation.
What is a continuation? A continuation is a saved point of execution, with all its state, such that one can continue from that point later. That's precisely what Python's yield does, and, also, precisely how it is implemented.
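To make "saving the present point of execution" concrete, here is a small hand-written Scala sketch of the state that a Python generator yielding 0, 1, ..., n-1 has to keep between calls. The class name CountUp is made up for illustration. Python saves the generator frame automatically, whereas here the saved state is spelled out as fields of an Iterator.

```scala
// Hypothetical hand-written analogue of a Python generator yielding 0, 1, ..., n-1.
// The private field plays the role of the saved frame: it remembers where we stopped.
final class CountUp(n: Int) extends Iterator[Int] {
  private var i = 0                      // the "saved" local variable

  def hasNext: Boolean = i < n           // can execution continue?

  def next(): Int = {                    // resume, produce one value, suspend again
    if (!hasNext) throw new NoSuchElementException("generator exhausted")
    val result = i
    i += 1
    result
  }
}
```

Each call to next() resumes from the remembered state, produces one value and stops again, which is the behaviour a Python generator gets for free.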
It is my understanding that Python's continuations are not delimited, however. I don't know much about that; I might be wrong, in fact. Nor do I know what the implications of that may be.
Scala's continuations do not work at run time. In fact, there's a continuations library for Java that works by manipulating bytecode at run time, and it is free of the constraints that Scala's continuations have.
Scala's continuations are implemented entirely at compile time, which requires quite a bit of work. It also requires that the code that will be "continued" be prepared by the compiler to do so.
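For readers new to the idea, here is a hand-written sketch of continuation-passing style (CPS), the kind of rewrite the Scala continuations compiler plugin applies to code annotated with @cps. Instead of returning, each function receives "the rest of the computation" as an explicit function k. The object and method names are hypothetical, and the plugin's real transformation is considerably more involved.

```scala
object CpsSketch {
  // Direct style: after each call returns, the caller implicitly "continues".
  def square(x: Int): Int = x * x
  def sumOfSquares(a: Int, b: Int): Int = square(a) + square(b)

  // Continuation-passing style: every function takes the rest of the
  // computation as a function k and calls it instead of returning.
  def squareK[R](x: Int)(k: Int => R): R = k(x * x)

  def sumOfSquaresK[R](a: Int, b: Int)(k: Int => R): R =
    squareK(a) { sa =>       // "the rest of the computation" after squaring a
      squareK(b) { sb =>     // "the rest of the computation" after squaring b
        k(sa + sb)
      }
    }
}
```

Since the saved "rest of the computation" is just a closure, code can only be suspended and resumed where it was written (or rewritten) in this shape, which is why the compiler has to prepare it.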
And that's why for-comprehensions do not work. A statement like this:
for { x <- xs } proc(x)
is translated into
xs.foreach(x => proc(x))
where foreach is a method on xs's class. Unfortunately, xs's class was compiled long ago, so it cannot be modified to support the continuation. As a side note, that's also why Scala doesn't have continue.
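The rewrite from for to foreach can be observed directly. In this sketch, Logged is a hypothetical class that defines nothing but foreach. The for loop still compiles against it, and the log shows that the loop body was handed to foreach as an ordinary function of type A => U, with no cps annotation anywhere.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical collection that records every call to its foreach method.
final class Logged[A](items: List[A]) {
  val log = ListBuffer.empty[String]

  def foreach[U](f: A => U): Unit = {
    log += "foreach called"
    items.foreach(f)                  // f is a plain function value: A => U
  }
}

object ForDesugaringDemo {
  def run(): (List[Int], List[String]) = {
    val xs  = new Logged(List(1, 2, 3))
    val out = ListBuffer.empty[Int]
    for (x <- xs) out += x * 10       // compiled as xs.foreach(x => out += x * 10)
    (out.toList, xs.log.toList)
  }
}
```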
Aside from that, yes, this is a duplicate question, and, yes, you should find a different way to write your code.
The premise of your question seems to be that you want exactly Python's yield, and you don't want any other reasonable suggestions to do the same thing in a different way in Scala. If this is true, and it is that important to you, why not use Python? It's quite a nice language. Unless your Ph.D. is in computer science and using Scala is an important part of your dissertation, why not use Python instead, given that you're already familiar with it and really like some of its features and design choices?
Anyway, if you actually want to learn how to solve your problem in Scala, it turns out that for the code you have, delimited continuations are overkill. All you need are flatMapped iterators.
Here's how you do it.
// You want to write
for (x <- xs) { /* complex yield in here */ }
// Instead you write
xs.iterator.flatMap { /* Produce iterators in here */ }
// You want to write
yield(a)
yield(b)
// Instead you write
Iterator(a,b)
// You want to write
yield(a)
/* complex set of yields in here */
// Instead you write
Iterator(a) ++ /* produce complex iterator here */
That's it! All your cases can be reduced to one of these three.
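As a small, self-contained illustration of the three rewrites, consider a hypothetical Python generator that first yields 0 and then, for each x in xs, yields x followed by x * 10. In Scala, each piece maps onto one of the patterns above. The object and method names here are made up.

```scala
object YieldPatterns {
  // "yield(a); yield(b)" becomes Iterator(a, b)
  def pair(x: Int): Iterator[Int] = Iterator(x, x * 10)

  // "for (x <- xs) { yields }" becomes xs.iterator.flatMap(...)
  def loop(xs: Seq[Int]): Iterator[Int] = xs.iterator.flatMap(pair)

  // "yield(a)" followed by more yields becomes Iterator(a) ++ rest
  def gen(xs: Seq[Int]): Iterator[Int] = Iterator(0) ++ loop(xs)
}
```

Because flatMap and ++ on iterators are lazy, the values are still produced on demand, one at a time, much like a Python generator.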
In your case, the example would look something like this:
import scala.io.Source

Source.fromFile(file).getLines().flatMap(x =>
  Iterator("something") ++
  ":".r.split(x).iterator.flatMap(field =>
    if (field contains "/") "/".r.split(field).iterator
    else {
      if (!field.startsWith("#")) {
        /* vals, whatever */
        // drop(1) removes the leading "r" (String.slice needs two indices)
        if (some_calculation && field.startsWith("r")) Iterator("r", field.drop(1))
        else Iterator(field)
      }
      else Iterator.empty
    }
  )
)
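To try the example without a file, here is a sketch with the same structure that takes the lines as an Iterator[String]. It rests on two assumptions. First, someCalculation is a hypothetical stand-in for the question's some_calculation, and here it only checks that the field is longer than one character. Second, the nested if/else is flattened into an equivalent else-if chain.

```scala
object FieldSplitter {
  // Hypothetical stand-in for the question's `some_calculation`.
  def someCalculation(field: String): Boolean = field.length > 1

  def process(lines: Iterator[String]): Iterator[String] =
    lines.flatMap { line =>
      Iterator("something") ++
        ":".r.split(line).iterator.flatMap { field =>
          if (field.contains("/")) "/".r.split(field).iterator          // split sub-fields
          else if (field.startsWith("#")) Iterator.empty                 // skip comments
          else if (someCalculation(field) && field.startsWith("r"))
            Iterator("r", field.drop(1))                                 // two outputs
          else Iterator(field)                                           // pass through
        }
    }
}
```

To read from a file again, pass Source.fromFile(file).getLines() (from scala.io.Source) as the argument.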
P.S. Scala does have continue; it's done like this (implemented by throwing stackless, lightweight exceptions):
import scala.util.control.Breaks._
for (blah) { breakable { ... break ... } }
but that won't get you what you want because Scala doesn't have the yield you want.
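A minimal sketch of the breakable/break idiom used as continue follows. Because breakable wraps the loop body rather than the whole loop, break() only abandons the current iteration. The object and method names are hypothetical.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.control.Breaks._

object ContinueDemo {
  // Keeps only the even numbers, using breakable/break as a `continue`.
  def evens(xs: Seq[Int]): List[Int] = {
    val out = ListBuffer.empty[Int]
    for (x <- xs) {
      breakable {
        if (x % 2 != 0) break()   // exits this breakable block only: skip to the next x
        out += x
      }
    }
    out.toList
  }
}
```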
The implementation below provides a Python-like generator.
Notice that there's a function called _yield in the code below, because yield is already a keyword in Scala, which, by the way, has nothing to do with the yield you know from Python.
import scala.util.continuations._   // requires the Scala continuations compiler plugin

object Generators {
  sealed trait Trampoline[+T]
  case object Done extends Trampoline[Nothing]
  case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]

  class Generator[T](var cont: Unit => Trampoline[T]) extends Iterator[T] {
    // Cache the result of running the continuation, so that calling hasNext
    // and then next does not execute the generator body twice.
    private var lookahead: Option[Trampoline[T]] = None
    private def peek: Trampoline[T] = {
      if (lookahead.isEmpty) lookahead = Some(cont(()))
      lookahead.get
    }

    def hasNext: Boolean = peek != Done

    def next(): T = peek match {
      case Continue(r, nextCont) => cont = nextCont; lookahead = None; r
      case _ => sys.error("Generator exhausted")
    }
  }

  type Gen[T] = cps[Trampoline[T]]

  def generator[T](body: => Unit @Gen[T]): Generator[T] =
    new Generator((_: Unit) => reset { body; Done })

  def _yield[T](t: T): Unit @Gen[T] =
    shift { (cont: Unit => Trampoline[T]) => Continue(t, cont) }
}

object TestCase {
  import Generators._

  def sectors = generator {
    def tailrec(seq: Seq[String]): Unit @Gen[String] = {
      if (!seq.isEmpty) {
        _yield(seq.head)
        tailrec(seq.tail)
      }
    }
    val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
    tailrec(list)
  }

  def main(args: Array[String]): Unit = {
    for (s <- sectors) { println(s) }
  }
}
It works pretty well, including for the typical usage of for loops.
Caveat: we need to remember that Python and Scala differ in the way continuations are implemented. Below we see how generators are typically used in Python and compare to the way we have to use them in Scala. Then, we will see why it needs to be like so in Scala.
If you are used to writing code in Python, you've probably used generators like this:
// This is Scala code that does not compile :(
// This code naively tries to mimic the way generators are used in Python
def myGenerator = generator {
val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
list foreach {s => _yield(s)}
}
The code above does not compile. Skipping all the convoluted theoretical aspects, the explanation is that it fails to compile because "the type of the for loop" does not match the type involved in the continuation: the closure passed to foreach has an ordinary function type, while _yield(s) has the CPS-annotated type Unit @Gen[String]. I'm afraid this explanation is a complete failure. Let me try again:
If you had coded something like shown below, it would compile fine:
def myGenerator = generator {
_yield("Financials")
_yield("Materials")
_yield("Technology")
_yield("Utilities")
}
This code compiles because the generator can be decomposed into a sequence of yields and, in this case, a yield matches the type involved in the continuation. To be more precise, the code can be decomposed into chained blocks, where each block ends with a yield. Just for the sake of clarification, we can think of the sequence of yields as being expressed like this:
{ some code here; _yield("Financials")
{ some other code here; _yield("Materials")
{ eventually even some more code here; _yield("Technology")
{ ok, fine, you've got the idea, right?; _yield("Utilities") }}}}
Again, without going deep into convoluted theory, the point is that, after a yield, you need to provide another block that ends with a yield, or otherwise close the chain. This is what we are doing in the pseudo-code above: after each yield we open another block, which in turn ends with a yield, which is followed by yet another block, and so on. Obviously this must end at some point, and then the only thing we are allowed to do is close the entire chain.
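The nested-block picture can also be written out literally. The following sketch does not need the continuations plugin. It re-declares the Trampoline, Done and Continue types from the Generators object above and builds the chain of Continue values for the four-yield generator by hand. Each Continue is one block ending in a yield, its next function is the rest of the chain, and Done closes the chain. This only approximates what shift and reset construct, and the object name NestedChain is made up.

```scala
object NestedChain {
  sealed trait Trampoline[+T]
  case object Done extends Trampoline[Nothing]
  final case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]

  // One Continue per "block ending with a yield"; Done closes the chain.
  val chain: Trampoline[String] =
    Continue[String]("Financials", _ =>
      Continue("Materials", _ =>
        Continue("Technology", _ =>
          Continue("Utilities", _ => Done))))

  // Walk the chain, resuming each saved continuation in turn.
  def toList[T](t: Trampoline[T]): List[T] = t match {
    case Continue(r, k) => r :: toList(k(()))
    case Done           => Nil
  }
}
```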
OK. But... how can we yield multiple pieces of information? The answer is a little obscure but makes a lot of sense once you know it: we need to employ tail recursion, and the last statement of a block must be a yield (or a call to a function that yields, as tailrec does below).
def myGenerator = generator {
  def tailrec(seq: Seq[String]): Unit @Gen[String] = {
    if (!seq.isEmpty) {
      _yield(seq.head)
      tailrec(seq.tail)
    }
  }
  val list = List("Financials", "Materials", "Technology", "Utilities")
  tailrec(list)
}
Let's analyze what's going on here:
Our generator function myGenerator contains some logic that generates information. In this example, we simply use a sequence of strings.
Our generator function myGenerator calls a recursive function which is responsible for yielding multiple pieces of information, obtained from our sequence of strings.
The recursive function must be declared before use, otherwise the compiler crashes.
The recursive function tailrec provides the tail recursion we need.
The rule of thumb here is simple: substitute a for loop with a recursive function, as demonstrated above.
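The same rule of thumb carries over to generators written with plain iterators instead of continuations. That is a substitution, not the plugin-based code above: the loop becomes a recursive function that emits the head and then recurses on the tail. This sketch relies on Iterator's ++ taking its argument by name, so each recursive call runs only when the consumer reaches it. The object name is hypothetical.

```scala
object RecursiveGenerator {
  // Instead of looping and yielding, emit the head, then recurse on the tail.
  // Iterator.++ takes its argument by name, so the recursive call is deferred
  // until the consumer has used up the head.
  def tailrec(seq: Seq[String]): Iterator[String] =
    if (seq.isEmpty) Iterator.empty
    else Iterator(seq.head) ++ tailrec(seq.tail)
}
```

Because of that laziness, the function even works on an infinite LazyList (Scala 2.13 or later), as long as the consumer only takes a finite prefix.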
Notice that tailrec is just a convenient name we chose, for the sake of clarification. In particular, tailrec does not need to be the last statement of our generator function. The only restriction is that you have to provide a sequence of blocks which match the type of a yield, as shown below:
def myGenerator = generator {
  def tailrec(seq: Seq[String]): Unit @Gen[String] = {
    if (!seq.isEmpty) {
      _yield(seq.head)
      tailrec(seq.tail)
    }
  }
  _yield("Before the first call")
  _yield("OK... not yet...")
  _yield("Ready... steady... go")
  val list = List("Financials", "Materials", "Technology", "Utilities")
  tailrec(list)
  _yield("done")
  _yield("long life and prosperity")
}
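For comparison, the same mix of fixed yields before and after the recursive part can be sketched with plain iterators joined by ++. Again, this substitutes for the continuation-based generator and is not the code above. Here list.iterator stands in for the recursive helper, since it already yields each element in turn. MixedGenerator is a hypothetical name.

```scala
object MixedGenerator {
  // Fixed values, then every element of the list, then more fixed values.
  def myGenerator: Iterator[String] = {
    val list = List("Financials", "Materials", "Technology", "Utilities")
    Iterator("Before the first call", "OK... not yet...", "Ready... steady... go") ++
      list.iterator ++
      Iterator("done", "long life and prosperity")
  }
}
```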
Going one step further, you may be wondering what real-life applications look like, in particular if you employ several generators. It is a good idea to standardize your generators around a single pattern that proves convenient in most circumstances.
Let's examine the example below. We have three generators: sectors, industries and companies. For brevity, only sectors is shown in full. This generator employs a tailrec function, as demonstrated above. The trick here is that the same tailrec function is also employed by the other generators: all we have to do is supply a different body function.
import scala.collection.immutable
import scala.xml.NodeSeq
import Generators._

type GenP = (NodeSeq, NodeSeq, NodeSeq)
type GenR = immutable.Map[String, String]

def tailrec(p: GenP)(body: GenP => GenR): Unit @Gen[GenR] = {
  val (stats, rows, header) = p
  if (!stats.isEmpty && !rows.isEmpty) {
    val heads: GenP = (stats.head, rows.head, header)
    val tails: GenP = (stats.tail, rows.tail, header)
    _yield(body(heads))
    // tail recursion
    tailrec(tails)(body)
  }
}

def sectors = generator[GenR] {
  def body(p: GenP): GenR = {
    // unpack arguments (a tuple pattern; `val stat, row, header = p` would bind the whole tuple to each name)
    val (stat, row, header) = p
    // obtain name and url
    val name = (row \ "a").text
    val url = (row \ "a" \ "@href").text
    // create map and populate fields: name and url
    val m = new scala.collection.mutable.HashMap[String, String]
    m.put("name", name)
    m.put("url", url)
    // populate other fields
    (header, stat).zipped.foreach { (k, v) => m.put(k.text, v.text) }
    // return an immutable map, as GenR requires
    m.toMap
  }
  val root : scala.xml.NodeSeq = cache.loadHTML5(urlSectors) // obtain entire page
  val header: scala.xml.NodeSeq = ... // code is omitted
  val stats : scala.xml.NodeSeq = ... // code is omitted
  val rows : scala.xml.NodeSeq = ... // code is omitted
  // tail recursion
  tailrec((stats, rows, header))(body)
}

def industries(sector: String) = generator[GenR] {
  def body(p: GenP): GenR = {
    //++ similar to 'body' demonstrated in "sectors"
    // returns a map
    m.toMap
  }
  //++ obtain NodeSeq variables, like demonstrated in "sectors"
  // tail recursion
  tailrec((stats, rows, header))(body)
}

def companies(sector: String) = generator[GenR] {
  def body(p: GenP): GenR = {
    //++ similar to 'body' demonstrated in "sectors"
    // returns a map
    m.toMap
  }
  //++ obtain NodeSeq variables, like demonstrated in "sectors"
  // tail recursion
  tailrec((stats, rows, header))(body)
}
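The shared-driver idea does not depend on XML or on continuations. The sketch below keeps the same shape, one tailrec driver plus a per-generator body function, under two simplifying assumptions. Each row is represented as a hypothetical (name, stat values) pair of strings instead of NodeSeq, and the driver returns a plain Iterator instead of a continuation-based generator. All names are illustrative.

```scala
object SharedBodyGenerators {
  // Hypothetical simplified row: (name, stat values) instead of NodeSeq fragments.
  type Row  = (String, Seq[String])
  type GenR = Map[String, String]

  // Shared driver: emits body(row, header) for each row, recursing on the tail.
  def tailrec(rows: Seq[Row], header: Seq[String])(body: (Row, Seq[String]) => GenR): Iterator[GenR] =
    if (rows.isEmpty) Iterator.empty
    else Iterator(body(rows.head, header)) ++ tailrec(rows.tail, header)(body)

  // Body for a "sectors"-style generator: the name plus one entry per header column.
  def sectorBody(row: Row, header: Seq[String]): GenR = {
    val (name, stats) = row
    Map("name" -> name) ++ header.zip(stats)
  }

  // Other generators (industries, companies) would pass a different body.
  def sectors(rows: Seq[Row], header: Seq[String]): Iterator[GenR] =
    tailrec(rows, header)(sectorBody)
}
```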
Credits to Rich Dougherty and huynhjl.
See this SO thread: Implementing yield (yield return) using Scala continuations
Credits to Miles Sabin for putting some of the code above together:
http://github.com/milessabin/scala-cont-jvm-coro-talk/blob/master/src/continuations/Generators.scala