Question
I am starting to use Spark Streaming to process a real-time data feed I am getting. My scenario is: I have an Akka actor receiver using "with ActorHelper", then I have my Spark job doing some mappings and transformations, and then I want to send the result to another actor.
My issue is the last part. When trying to send to another actor, Spark raises an exception:
15/02/20 16:43:16 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.IllegalStateException: Trying to deserialize a serialized ActorRef without an ActorSystem in scope. Use 'akka.serialization.Serialization.currentSystem.withValue(system) { ... }'
The way I am creating this last actor is the following:
val actorSystem = SparkEnv.get.actorSystem
val lastActor = actorSystem.actorOf(MyLastActor.props(someParam), "MyLastActor")
And then using it like this:
result.foreachRDD(rdd => rdd.foreachPartition(lastActor ! _))
I am not sure where or how to apply the advice "Use 'akka.serialization.Serialization.currentSystem.withValue(system) { ... }'". Do I need to set anything special through configuration? Or create my actor differently?
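For context on why the exception occurs: the closure passed to `foreachPartition` captures `lastActor`, and Spark serializes closures to ship them to executors. With a real `ActorRef` the write itself succeeds (it serializes to an actor path), but deserialization on the executor fails because no `ActorSystem` is in scope there, which is exactly what the error message says. The following is a minimal stand-in sketch (no Spark or Akka involved; `Handle` is a hypothetical non-serializable placeholder for the actor ref) that illustrates the closure-capture pitfall using the same Java serialization mechanism Spark uses:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for an ActorRef: a plain handle that is NOT Serializable.
class Handle

object ClosureCaptureDemo {
  // Attempts Java serialization, the same mechanism Spark uses to ship closures.
  def trySerialize(obj: AnyRef): String =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      "serialized"
    } catch {
      case _: NotSerializableException => "not serializable"
    }

  def main(args: Array[String]): Unit = {
    val handle = new Handle
    // The closure captures `handle`, so serializing the closure drags the
    // non-serializable handle along -- just like
    // `rdd.foreachPartition(lastActor ! _)` captures the ActorRef.
    val task: String => Unit = msg => println(s"$handle got $msg")
    println(trySerialize(task))
  }
}
```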
Answer 1:
Look at the following example, which shows how to access an actor outside of the Spark domain.
/*
 * Following is the use of actorStream to plug in a custom actor as receiver.
 *
 * An important point to note:
 * Since the Actor may exist outside the Spark framework, it is the user's
 * responsibility to ensure type safety, i.e. the type of data received and
 * the InputDStream should be the same.
 *
 * For example: Both actorStream and SampleActorReceiver are parameterized
 * to the same type to ensure type safety.
 */
val lines = ssc.actorStream[String](
  Props(new SampleActorReceiver[String](
    "akka.tcp://test@%s:%s/user/FeederActor".format(host, port.toInt))),
  "SampleReceiver")
Answer 2:
I found that if I collect before sending to the actor, it works like a charm:
result.foreachRDD(rdd => rdd.collect().foreach(producer ! _))
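This works because `collect()` ships the data back to the driver JVM, where the actor ref was created and is still valid, so the closure that sends the messages never has to be serialized. (The caveat: collecting pulls every element of the RDD into driver memory.) A minimal stand-in sketch of the idea, with `LocalSink` as a hypothetical non-serializable placeholder for the producer actor and a plain `Seq` of partitions in place of the RDD:

```scala
import scala.collection.mutable

object DriverSideSend {
  // Hypothetical stand-in for the producer actor: a local, non-serializable sink
  // with a `!` method mimicking Akka's send operator.
  class LocalSink {
    val received = mutable.Buffer.empty[String]
    def !(msg: String): Unit = received += msg
  }

  def send(partitions: Seq[Seq[String]], sink: LocalSink): Unit = {
    // Flattening stands in for rdd.collect(): the data moves to the driver,
    // and iteration happens locally, so `sink` is never serialized.
    val collected = partitions.flatten
    collected.foreach(sink ! _)
  }

  def main(args: Array[String]): Unit = {
    val sink = new LocalSink
    send(Seq(Seq("a", "b"), Seq("c")), sink)
    println(sink.received.mkString(","))
  }
}
```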
Source: https://stackoverflow.com/questions/28639618/unable-to-deserialize-actorref-to-send-result-to-different-actor