I am integrating the use of Akka actors and Spark in the following way: when a task is distributed among the Spark nodes, while processing that tasks, each node also periodi
An ActorSystem
is responsible for all of the functionality involved with ActorRef
objects.
When you program something like
actorRef ! message
You're actually invoking a bunch of work within the ActorSystem, not the ActorRef, to put the message in the right mailbox, tee-up the Actor to run the receive method within the thread pool, etc... From the documentation:
An actor system manages the resources it is configured to use in order to run the actors which it contains. There may be millions of actors within one such system, after all the mantra is to view them as abundant and they weigh in at an overhead of only roughly 300 bytes per instance. Naturally, the exact order in which messages are processed in large systems is not controllable by the application author
That is why your code works fine "standalone", but not in Spark. Each of your Spark nodes is missing the ActorSystem machinery, therefore even if you could de-serialize the ActorRef in a node there would be no ActorSystem to process the !
in your node function.
You can establish an ActorSystem within each node and use (i) remoting to send messages to your ActorRef in the "master" ActorSystem via actorSelection or (ii) the serialization method you mentioned where each node's ActorSystem would be the system
in the example you quoted.