How to implement Future as Applicative in Scala?

有刺的猬 2021-01-12 22:51

Suppose I need to run two concurrent computations, wait for both of them, and then combine their results. More specifically, I need to run f1: X1 => Y1 and <

4 Answers
  • 2021-01-12 23:09

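    None of the methods in the other answers does the right thing when one future fails quickly while another succeeds only after a long time: the combined future should fail as soon as the first failure occurs, but with those methods it waits for the slower future.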

    But such a method can be implemented manually:

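    import java.util.concurrent.atomic.AtomicInteger
    import scala.concurrent.{ExecutionContext, Future, Promise}
    import scala.util.{Failure, Success, Try}

    def smartSequence[A](futures: Seq[Future[A]])(
      implicit ec: ExecutionContext
    ): Future[Seq[A]] = {
      // Number of futures that have not completed yet
      val counter = new AtomicInteger(futures.size)
      val result = Promise[Seq[A]]()
    
      def attemptComplete(t: Try[A]): Unit = {
        val remaining = counter.decrementAndGet
        t match {
          // If one future fails, fail the result immediately
          case Failure(cause) => result tryFailure cause
          // If all futures have succeeded, complete successful result
          case Success(_) if remaining == 0 => 
            result tryCompleteWith Future.sequence(futures)
          case _ =>
        }
      }
    
      // An empty input would never trigger a callback, so complete it directly
      if (futures.isEmpty) result.success(Seq.empty)
      futures.foreach(_ onComplete attemptComplete)
      result.future
    }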
    

    Scalaz does a similar thing internally, so both f1 |@| f2 and List(f1, f2).sequence fail immediately after any of the futures fails.

    Here is a quick test of the failing time for those methods:

    import java.util.Date
    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global
    import scalaz._, Scalaz._
    
    object ReflectionTest extends App {
      def f1: Future[Unit] = Future {
        Thread.sleep(2000)
      }
    
      def f2: Future[Unit] = Future {
        Thread.sleep(1000)
        throw new RuntimeException("Failure")
      }
    
      def test(name: String)(
        f: (Future[Unit], Future[Unit]) => Future[Unit]
      ): Unit = {
        val start = new Date().getTime
        f(f1, f2).andThen {
          case _ => 
            println(s"Test $name completed in ${new Date().getTime - start}")
        }
        Thread.sleep(2200)
      }
    
      test("monadic") { (f1, f2) => for (v1 <- f1; v2 <- f2) yield () }
    
      test("zip") { (f1, f2) => (f1 zip f2).map(_ => ()) }
    
      test("Future.sequence") { 
        (f1, f2) => Future.sequence(Seq(f1, f2)).map(_ => ()) 
      }
    
      test("smartSequence") { (f1, f2) => smartSequence(Seq(f1, f2)).map(_ => ())}
    
      test("scalaz |@|") { (f1, f2) => (f1 |@| f2) { case _ => ()}}
    
      test("scalaz sequence") { (f1, f2) => List(f1, f2).sequence.map(_ => ())}
    
      Thread.sleep(30000)
    }
    

    And the result on my machine (times in milliseconds) is:

    Test monadic completed in 2281
    Test zip completed in 2008
    Test Future.sequence completed in 2007
    Test smartSequence completed in 1005
    Test scalaz |@| completed in 1003
    Test scalaz sequence completed in 1005
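
    The same idea can be sketched for two futures of different result types, which is closer to the f1/f2 case in the question. The helper name failFastZip is hypothetical (it is not a standard library method), and this is only a minimal sketch, assuming an ExecutionContext is in scope:

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise, blocking}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object FailFastZip {
  // Hypothetical helper (not part of the standard library): completes with
  // both results, or fails as soon as either future fails.
  def failFastZip[A, B](fa: Future[A], fb: Future[B])(
      implicit ec: ExecutionContext): Future[(A, B)] = {
    val result = Promise[(A, B)]()

    // Forward the first failure from either side; tryFailure ignores later ones.
    fa.onComplete {
      case Failure(cause) => result.tryFailure(cause)
      case Success(_)     => ()
    }
    fb.onComplete {
      case Failure(cause) => result.tryFailure(cause)
      case Success(_)     => ()
    }

    // Both futures are already running, so this only collects the results.
    fa.flatMap(a => fb.map(b => (a, b))).onComplete(result.tryComplete)

    result.future
  }
}
```

    Because the promise is only ever completed through the try* methods, whichever event happens first (a failure on either side, or both successes) decides the outcome and later attempts are ignored.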
    
  • 2021-01-12 23:13

    It need not be sequential. The computation may start the moment the future is created. Of course, if the future is created inside the flatMap argument (and it will necessarily be so if it needs the result of the first computation), then execution will be sequential. But in code such as

    val f1 = Future {....}
    val f2 = Future {....}
    for (a1 <- f1; a2 <- f2) yield f(a1, a2)
    

    you get concurrent execution.

    So the implementation of Applicative implied by Monad is ok.

  • 2021-01-12 23:15

    Your post seems to contain two more or less independent questions. I will address the concrete practical problem of running two concurrent computations first. The question about Applicative is answered at the very end.

    Suppose you have two asynchronous functions:

    val f1: X1 => Future[Y1]
    val f2: X2 => Future[Y2]
    

    And two values:

    val x1: X1
    val x2: X2  
    

    Now you can start the computations in multiple different ways. Let's take a look at some of them.

    Starting computations outside of for (parallel)

    Suppose you do this:

    val y1: Future[Y1] = f1(x1)
    val y2: Future[Y2] = f2(x2)
    

    Now, the computations f1 and f2 are already running. It does not matter in which order you collect the results. You could do it with a for-comprehension:

    val y: Future[(Y1,Y2)] = for(res1 <- y1; res2 <- y2) yield (res1,res2)
    

    Using the expressions y1 and y2 in the for-comprehension does not affect the order in which y1 and y2 are computed: they are still being computed in parallel.

    Starting computations inside of for (sequential)

    If we simply take the definitions of y1 and y2, and plug them into the for comprehension directly, we will still get the same result, but the order of execution will be different:

    val y = for (res1 <- f1(x1); res2 <- f2(x2)) yield (res1, res2)
    

    translates into

    val y = f1(x1).flatMap{ res1 => f2(x2).map{ res2 => (res1, res2) } }
    

    In particular, the second computation starts only after the first one has terminated. This is usually not what one wants.

    Here, a basic substitution principle is violated. If there were no side-effects, one probably could transform this version into the previous one, but in Scala, one has to take care of the order of execution explicitly.

    Zipping futures (parallel)

    Futures respect products. There is a method Future.zip, which allows you to do this:

    val y = f1(x1) zip f2(x2)
    

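    This runs both computations in parallel and completes once both are done. If one of them fails, the result fails too, although (as the timings in the first answer show) not necessarily at the moment the failure occurs.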
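
    If the pair is only an intermediate step, the combining function can be passed along directly. A minimal sketch, assuming Scala 2.12 or later (where Future.zipWith is available); the concrete f1, f2 and the combining function are hypothetical stand-ins for the ones in the question:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ZipWithSketch {
  // Hypothetical stand-ins for the question's f1 and f2.
  val f1: Int => Future[Int] = x => Future(x * 10)
  val f2: String => Future[Int] = s => Future(s.length)

  // Both f1(x1) and f2(x2) are evaluated before zipWith combines them,
  // so both futures are started up front.
  def combined(x1: Int, x2: String): Future[Int] =
    f1(x1).zipWith(f2(x2))(_ + _)
}
```

    This is equivalent to zipping and then mapping over the tuple, just without the intermediate pair.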

    Demo

    Here is a little script that demonstrates this behaviour (inspired by muhuk's post):

    import scala.concurrent._
    import scala.concurrent.duration._
    import scala.concurrent.ExecutionContext.Implicits.global
    import java.lang.Thread.sleep
    import java.lang.System.{currentTimeMillis => millis}
    
    var time: Long = 0
    
    val x1 = 1
    val x2 = 2
    
    // this function just waits
    val f1: Int => Future[Unit] = { 
      x => Future { sleep(x * 1000) }
    }
    
    // this function waits and then prints
    // elapsed time
    val f2: Int => Future[Unit] = {
      x => Future { 
        sleep(x * 1000)
        val elapsed = millis() - time
        printf("Time: %1.3f seconds\n", elapsed / 1000.0)
      }
    }
    
    /* Outside `for` */ {
      time = millis()
      val y1 = f1(x1)
      val y2 = f2(x2)
      val y = for(res1 <- y1; res2 <- y2) yield (res1,res2)
      Await.result(y, Duration.Inf)
    }
    
    /* Inside `for` */ {
      time = millis()
      val y = for(res1 <- f1(x1); res2 <- f2(x2)) yield (res1, res2)
      Await.result(y, Duration.Inf)
    }
    
    /* Zip */ {
      time = millis()
      val y = f1(x1) zip f2(x2)
      Await.result(y, Duration.Inf)
    }
    

    Output:

    Time: 2.028 seconds
    Time: 3.001 seconds
    Time: 2.001 seconds
    

    Applicative

    Using this definition from your other post:

    trait Applicative[F[_]] {
      def apply[A, B](f: F[A => B]): F[A] => F[B]
    }
    

    one could do something like this:

    object FutureApplicative extends Applicative[Future] {
      def apply[A, B](ff: Future[A => B]): Future[A] => Future[B] = {
        fa => for ((f,a) <- ff zip fa) yield f(a)
      }
    }
    

    However, I'm not sure what this has to do with your concrete problem, or with understandable and readable code. A Future already is a monad (this is stronger than Applicative), and there is even built-in syntax for it, so I don't see any advantages in adding some Applicatives here.
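
    For completeness, here is a sketch of how such an apply could be used to combine two independent futures. The helper map2 is hypothetical (it is not part of the trait above), and the trait and instance are repeated only to make the snippet self-contained:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ApplicativeSketch {
  trait Applicative[F[_]] {
    def apply[A, B](f: F[A => B]): F[A] => F[B]
  }

  object FutureApplicative extends Applicative[Future] {
    def apply[A, B](ff: Future[A => B]): Future[A] => Future[B] =
      fa => ff.zip(fa).map { case (f, a) => f(a) }
  }

  // Hypothetical helper: combine two futures with a two-argument function
  // by first partially applying it inside the first future.
  def map2[A, B, C](fa: Future[A], fb: Future[B])(f: (A, B) => C): Future[C] = {
    val partially: Future[B => C] = fa.map(a => (b: B) => f(a, b))
    FutureApplicative.apply(partially)(fb)
  }
}
```

    Both fa and fb are passed in as already created futures, so they run concurrently regardless of how map2 collects their results.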

  • 2021-01-12 23:21

    The problem is that monadic composition implies sequential wait. In our case it implies that we wait for one future first and then we will wait for another.

    This is unfortunately true.

    import java.util.Date
    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global
    
    object Test extends App {
            def timestamp(label: String): Unit = Console.println(label + ": " + new Date().getTime.toString)
    
            timestamp("Start")
            for {
                    step1 <- Future {
                            Thread.sleep(2000)
                            timestamp("step1")
                    }
                    step2 <- Future {
                            Thread.sleep(1000)
                            timestamp("step2")
                    }
            } yield { timestamp("Done") }
    
            Thread.sleep(4000)
    }
    

    Running this code outputs:

    Start: 1430473518753
    step1: 1430473520778
    step2: 1430473521780
    Done: 1430473521781
    

    Thus it looks like we need an applicative composition of the futures to wait till either both complete or at least one future fails.

    I am not sure applicative composition has anything to do with the concurrent strategy. Using for comprehensions, you get a result if all futures complete or a failure if any of them fails. So it's semantically the same.

    Why Are They Running Sequentially

    I think the futures run sequentially because the result of step1 is available within step2 (and in the rest of the computation). Essentially, the for block desugars to:

    def step1() = Future {
        Thread.sleep(2000)
        timestamp("step1")
    }
    def step2() = Future {
        Thread.sleep(1000)
        timestamp("step2")
    }
    def finalStep() = timestamp("Done")
    step1().flatMap(step1 => step2().map(step2 => finalStep()))
    

    So the results of previous computations are available to the remaining steps. Applicative operators such as <$> and <*> differ in this respect.
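
    The difference can be illustrated with a small sketch. The functions lookupId and lookupScore are hypothetical and exist only for this example:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object DependencySketch {
  // Hypothetical example functions, not taken from the question.
  def lookupId(name: String): Future[Int] = Future(name.length)
  def lookupScore(id: Int): Future[Int] = Future(id * 2)

  // Dependent: the second call needs the first result, so the two
  // computations are necessarily sequential.
  def dependent(name: String): Future[Int] =
    lookupId(name).flatMap(id => lookupScore(id))

  // Independent: neither call needs the other's result, so both futures
  // can be created first and combined afterwards.
  def independent(a: String, b: String): Future[(Int, Int)] = {
    val fa = lookupId(a)
    val fb = lookupId(b)
    fa.zip(fb)
  }
}
```

    Only the first shape needs flatMap; the second never looks at one result while producing the other, which is what makes concurrent execution possible.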

    How To Run Futures In Parallel

    @andrey-tyukin's code runs futures in parallel:

    import java.util.Date
    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global
    
    object Test extends App {
            def timestamp(label: String): Unit = Console.println(label + ": " + new Date().getTime.toString)
    
            timestamp("Start")
            (Future {
                    Thread.sleep(2000)
                    timestamp("step1")
            } zip Future {
                    Thread.sleep(1000)
                    timestamp("step2")
            }).map(_ => timestamp("Done"))
            Thread.sleep(4000)
    }
    

    Output:

    Start: 1430474667418
    step2: 1430474668444
    step1: 1430474669444
    Done: 1430474669446
    