Suppose I need to run two concurrent computations, wait for both of them, and then combine their results. More specifically, I need to run f1: X1 => Y1
and <
None of the methods in the other answers does the right thing when one future fails quickly and the other succeeds only after a long time: the combined future should fail as soon as the first failure occurs.
Such a method can be implemented manually:
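import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success, Try}

def smartSequence[A](futures: Seq[Future[A]])(implicit ec: ExecutionContext): Future[Seq[A]] = {
  // Number of futures that have not completed yet
  val counter = new AtomicInteger(futures.size)
  val result = Promise[Seq[A]]()
  // With no futures no callback would ever fire, so complete right away
  if (futures.isEmpty) result.success(Seq.empty)
  def attemptComplete(t: Try[A]): Unit = {
    val remaining = counter.decrementAndGet
    t match {
      // If one future fails, fail the result immediately
      case Failure(cause) => result tryFailure cause
      // If all futures have succeeded, complete successful result
      case Success(_) if remaining == 0 =>
        result tryCompleteWith Future.sequence(futures)
      case _ =>
    }
  }
  futures.foreach(_ onComplete attemptComplete)
  result.future
}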
ScalaZ does a similar thing internally, so both f1 |@| f2 and List(f1, f2).sequence fail immediately after any of the futures fails.
Here is a quick test of the failing time for those methods:
import java.util.Date
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scalaz._, Scalaz._

object ReflectionTest extends App {
  def f1: Future[Unit] = Future {
    Thread.sleep(2000)
  }
  def f2: Future[Unit] = Future {
    Thread.sleep(1000)
    throw new RuntimeException("Failure")
  }
  def test(name: String)(
    f: (Future[Unit], Future[Unit]) => Future[Unit]
  ): Unit = {
    val start = new Date().getTime
    f(f1, f2).andThen {
      case _ =>
        println(s"Test $name completed in ${new Date().getTime - start}")
    }
    Thread.sleep(2200)
  }
  test("monadic") { (f1, f2) => for (v1 <- f1; v2 <- f2) yield () }
  test("zip") { (f1, f2) => (f1 zip f2).map(_ => ()) }
  test("Future.sequence") {
    (f1, f2) => Future.sequence(Seq(f1, f2)).map(_ => ())
  }
  test("smartSequence") { (f1, f2) => smartSequence(Seq(f1, f2)).map(_ => ()) }
  test("scalaz |@|") { (f1, f2) => (f1 |@| f2) { case _ => () } }
  test("scalaz sequence") { (f1, f2) => List(f1, f2).sequence.map(_ => ()) }
  Thread.sleep(30000)
}
And the result on my machine is:
Test monadic completed in 2281
Test zip completed in 2008
Test Future.sequence completed in 2007
Test smartSequence completed in 1005
Test scalaz |@| completed in 1003
Test scalaz sequence completed in 1005
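For the two-future case with different result types, the same fail-fast idea can be sketched without going through a Seq. The helper name zipFailFast is hypothetical (it is not a standard-library method), and this is only a minimal sketch of the technique: failure callbacks on both futures race to fail a promise, while the ordinary zip supplies the success case.

```scala
import scala.concurrent.{ExecutionContext, Future, Promise}

object FailFastZip {
  // Hypothetical helper: completes with both results, or fails as soon as
  // either future fails, whichever failure is observed first.
  def zipFailFast[A, B](fa: Future[A], fb: Future[B])(
      implicit ec: ExecutionContext): Future[(A, B)] = {
    val result = Promise[(A, B)]()
    // `failed` succeeds only if the original future failed, so these
    // callbacks fire on failure; tryFailure ignores any later failure.
    fa.failed.foreach(e => result.tryFailure(e))
    fb.failed.foreach(e => result.tryFailure(e))
    // The plain zip provides the success case; completeWith leaves the
    // promise untouched if a failure callback has already completed it.
    result.completeWith(fa zip fb)
    result.future
  }
}
```

Called as FailFastZip.zipFailFast(f1, f2) with an implicit ExecutionContext in scope, this should behave like zip on success but report a failure of either side without waiting for the other.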
It need not be sequential. The future's computation may start the moment the future is created. Of course, if the future is created by the flatMap argument (and it will necessarily be so if it needs the result of the first computation), then it will be sequential. But in code such as
val f1 = Future {....}
val f2 = Future {....}
for (a1 <- f1; a2 <- f2) yield f(a1, a2)
you get concurrent execution.
So the implementation of Applicative implied by Monad is ok.
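The difference can be checked without relying on sleep timings. The following is a sketch under my own assumptions (the object name EagerFutures, the method overlap, the two-thread pool and the one-second timeout are all illustrative): each future body waits on a latch until the other body has started, which can only succeed if both bodies run at the same time.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object EagerFutures {
  // Returns true only if both future bodies were running at the same time.
  def overlap(createOutsideFor: Boolean): Boolean = {
    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val bothStarted = new CountDownLatch(2)
    // Signal "started", then wait up to 1 s for the other body to start.
    def body(): Boolean = {
      bothStarted.countDown()
      bothStarted.await(1, TimeUnit.SECONDS) // false on timeout
    }
    try {
      val combined =
        if (createOutsideFor) {
          val f1 = Future(body()) // already running
          val f2 = Future(body()) // already running
          for (a1 <- f1; a2 <- f2) yield a1 && a2
        } else {
          // The second future is created inside flatMap, after the first is done
          for (a1 <- Future(body()); a2 <- Future(body())) yield a1 && a2
        }
      Await.result(combined, 10.seconds)
    } finally pool.shutdown()
  }
}
```

With the futures created before the for comprehension, both bodies see each other start; with the futures created inside it, the first body times out before the second one is even created.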
Your post seems to contain two more or less independent questions.
I will address the concrete practical problem of running two concurrent computations first. The question about Applicative is answered at the very end.
Suppose you have two asynchronous functions:
val f1: X1 => Future[Y1]
val f2: X2 => Future[Y2]
And two values:
val x1: X1
val x2: X2
Now you can start the computations in multiple different ways. Let's take a look at some of them.
Starting computations outside of for (parallel)
Suppose you do this:
val y1: Future[Y1] = f1(x1)
val y2: Future[Y2] = f2(x2)
Now, the computations f1 and f2 are already running. It does not matter in which order you collect the results. You could do it with a for-comprehension:
val y: Future[(Y1,Y2)] = for(res1 <- y1; res2 <- y2) yield (res1,res2)
Using the expressions y1 and y2 in the for-comprehension does not interfere with the order of computation of y1 and y2; they are still being computed in parallel.
Starting computations inside of for (sequential)
If we simply take the definitions of y1 and y2, and plug them into the for-comprehension directly, we will still get the same result, but the order of execution will be different:
val y = for (res1 <- f1(x1); res2 <- f2(x2)) yield (res1, res2)
translates into
val y = f1(x1).flatMap{ res1 => f2(x2).map{ res2 => (res1, res2) } }
In particular, the second computation starts only after the first one has completed. This is usually not what one wants.
Here, a basic substitution principle is violated. If there were no side effects, one could probably transform this version into the previous one, but creating a Future starts the computation, so in Scala one has to take care of the order of execution explicitly.
Zipping futures (parallel)
Futures respect products. There is a method Future.zip, which allows you to do this:
val y = f1(x1) zip f2(x2)
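This runs both computations in parallel, and the resulting future completes once both are done. Note that zip looks at the left future first: a failure of the left future is reported immediately, but a failure of the right future is reported only after the left one has completed.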
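If the goal is to combine the two results rather than to obtain a tuple, zipWith (available on Future since Scala 2.12) does both steps at once. This is a minimal sketch: the concrete types and the bodies of f1 and f2 are placeholders standing in for X1 => Future[Y1] and X2 => Future[Y2], and the object name is illustrative.

```scala
import scala.concurrent.{ExecutionContext, Future}

object CombineWithZip {
  // Placeholder asynchronous functions (assumptions, not from the question)
  def f1(x: Int)(implicit ec: ExecutionContext): Future[Int] = Future(x * 2)
  def f2(x: String)(implicit ec: ExecutionContext): Future[Int] = Future(x.length)

  // Both futures are created before zipWith is invoked, so they run
  // concurrently; the function is applied once both have succeeded.
  def combined(x1: Int, x2: String)(implicit ec: ExecutionContext): Future[Int] =
    f1(x1).zipWith(f2(x2))(_ + _)
}
```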
Demo
Here is a little script that demonstrates this behaviour (inspired by muhuk's post):
import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import java.lang.Thread.sleep
import java.lang.System.{currentTimeMillis => millis}

var time: Long = 0
val x1 = 1
val x2 = 2

// this function just waits
val f1: Int => Future[Unit] = {
  x => Future { sleep(x * 1000) }
}

// this function waits and then prints
// elapsed time
val f2: Int => Future[Unit] = {
  x => Future {
    sleep(x * 1000)
    val elapsed = millis() - time
    printf("Time: %1.3f seconds\n", elapsed / 1000.0)
  }
}

/* Outside `for` */ {
  time = millis()
  val y1 = f1(x1)
  val y2 = f2(x2)
  val y = for(res1 <- y1; res2 <- y2) yield (res1,res2)
  Await.result(y, Duration.Inf)
}

/* Inside `for` */ {
  time = millis()
  val y = for(res1 <- f1(x1); res2 <- f2(x2)) yield (res1, res2)
  Await.result(y, Duration.Inf)
}

/* Zip */ {
  time = millis()
  val y = f1(x1) zip f2(x2)
  Await.result(y, Duration.Inf)
}
Output:
Time: 2.028 seconds
Time: 3.001 seconds
Time: 2.001 seconds
Applicative
Using this definition from your other post:
trait Applicative[F[_]] {
  def apply[A, B](f: F[A => B]): F[A] => F[B]
}
one could do something like this:
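import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FutureApplicative extends Applicative[Future] {
  def apply[A, B](ff: Future[A => B]): Future[A] => Future[B] = {
    // zip pairs the function with its argument once both futures have completed
    fa => for ((f, a) <- ff zip fa) yield f(a)
  }
}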
However, I'm not sure what this has to do with your concrete problem, or with understandable and readable code. A Future already is a monad (this is stronger than Applicative), and there is even built-in syntax for it, so I don't see any advantage in adding an Applicative here.
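For completeness, here is how an Applicative of this shape could be used to combine two independent futures. This is only a sketch: map2 is a hypothetical helper that is not part of the definition above, and FutureApplicative is written as a class taking an ExecutionContext so that the block is self-contained.

```scala
import scala.concurrent.{ExecutionContext, Future}

trait Applicative[F[_]] {
  def apply[A, B](f: F[A => B]): F[A] => F[B]
}

class FutureApplicative(implicit ec: ExecutionContext) extends Applicative[Future] {
  def apply[A, B](ff: Future[A => B]): Future[A] => Future[B] =
    fa => (ff zip fa).map { case (f, a) => f(a) }

  // Hypothetical helper: lift a binary function over two futures.
  // fa is first mapped to a Future[B => C], which apply then feeds with fb.
  def map2[A, B, C](fa: Future[A], fb: Future[B])(f: (A, B) => C): Future[C] =
    apply(fa.map(a => (b: B) => f(a, b)))(fb)
}
```

Since fa and fb are passed in as already created futures, they run concurrently; map2 only describes how their results are combined.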
The problem is that monadic composition implies sequential wait. In our case it implies that we wait for one future first and then we will wait for another.
This is unfortunately true.
import java.util.Date
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Test extends App {
  def timestamp(label: String): Unit = Console.println(label + ": " + new Date().getTime.toString)
  timestamp("Start")
  for {
    step1 <- Future {
      Thread.sleep(2000)
      timestamp("step1")
    }
    step2 <- Future {
      Thread.sleep(1000)
      timestamp("step2")
    }
  } yield { timestamp("Done") }
  Thread.sleep(4000)
}
Running this code outputs:
Start: 1430473518753
step1: 1430473520778
step2: 1430473521780
Done: 1430473521781
Thus it looks like we need an applicative composition of the futures to wait till either both complete or at least one future fails.
I am not sure applicative composition has anything to do with the concurrency strategy. Using for comprehensions, you get a result if all futures complete, or a failure if any of them fails. So it's semantically the same.
I think the reason the futures run sequentially is that step1 is available within step2 (and in the rest of the computation). Essentially, we can rewrite the for block as:
def step1() = Future {
  Thread.sleep(2000)
  timestamp("step1")
}
def step2() = Future {
  Thread.sleep(1000)
  timestamp("step2")
}
def finalStep() = timestamp("Done")
// step2() is only called inside the flatMap callback, after step1 has completed
step1().flatMap(step1 => step2().map(step2 => finalStep()))
So the results of previous computations are available to the rest of the steps. It differs from <?> & <*> in this respect.
@andrey-tyukin's code runs futures in parallel:
import java.util.Date
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Test extends App {
  def timestamp(label: String): Unit = Console.println(label + ": " + new Date().getTime.toString)
  timestamp("Start")
  (Future {
    Thread.sleep(2000)
    timestamp("step1")
  } zip Future {
    Thread.sleep(1000)
    timestamp("step2")
  }).map(_ => timestamp("Done"))
  Thread.sleep(4000)
}
Output:
Start: 1430474667418
step2: 1430474668444
step1: 1430474669444
Done: 1430474669446