I'm trying to use Livy to remotely submit several Spark jobs. Let's say I want to perform a `spark-submit` task remotely (with all its options as such).

How can I make use of the `SparkSession` that I created using a `POST /sessions` request for submitting my Spark job using a `POST /batches` request?
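For context, here is roughly what the session creation I'm referring to looks like (hypothetical host, using Python `requests`):

```python
import requests

LIVY = "http://livy-host:8998"  # hypothetical Livy endpoint

# POST /sessions starts a long-lived interactive session,
# backed by its own SparkSession on the cluster
resp = requests.post(f"{LIVY}/sessions",
                     json={"kind": "pyspark"},
                     headers={"Content-Type": "application/json"})
session_id = resp.json()["id"]  # the session whose SparkSession I'd like to re-use
```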
Livy's batch-mode is intended for a different use-case than session-mode / `LivyClient`.
The reason I've identified why this isn't possible is as follows (please correct me if I'm wrong or incomplete):
- A `POST /batches` request accepts a `JAR`.
- This prevents an existing `SparkSession` (or `spark-shell`) from being re-used (without restarting that `SparkSession`), because:
  - How would you remove the `JAR` from the previous `POST /batches` request?
  - How would you add the `JAR` from the current `POST /batches` request?
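To make this concrete, here is a minimal `POST /batches` sketch (host, JAR path, and class name are hypothetical); the executable `JAR` is named in the request itself, so each batch launches its own fresh Spark application:

```python
import requests

# Each POST /batches behaves like a standalone spark-submit: the JAR travels
# with the request and Livy starts a brand-new Spark application to run it.
requests.post("http://livy-host:8998/batches",          # hypothetical host
              json={"file": "hdfs:///jars/my-app.jar",  # hypothetical JAR path
                    "className": "com.example.MyJob"},  # hypothetical entry point
              headers={"Content-Type": "application/json"})
```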
And here's a more complete picture:

- A `POST /sessions` request does not take a `JAR` as its executable (unlike a batch, a session has no `file` / `className` to run).
- So a `session` (obviously) cannot take `JAR`s.
- Only code snippets (in `PySpark`: simple Python files) can be loaded into the `session`, and not `JAR`s.
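By contrast, a session only takes source text through its statements endpoint, along these lines (session id and snippet are hypothetical):

```python
import requests

# Sessions accept code snippets, not JARs: you POST source text to
# /sessions/{sessionId}/statements and it runs inside the existing SparkSession.
requests.post("http://livy-host:8998/sessions/0/statements",  # hypothetical session 0
              json={"code": "spark.range(10).count()"},
              headers={"Content-Type": "application/json"})
```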
Possible workaround:

- All those who have their Spark application written in Scala / Java, which must be bundled in a `JAR`, will face this difficulty; Python (`PySpark`) users are lucky here.
- Create your `session` with your `JAR` via a `POST /sessions` request (the `jars` field).
- Then invoke the (entry-point) `class` from your `JAR` via Python code (submit `POST /sessions/{sessionId}/statements`) as many times as you want, with possibly different parameters. While this wouldn't be straightforward, it sounds very much possible; see the sketch after this list.
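Here is a minimal sketch of that workaround, assuming a `PySpark` session and a hypothetical entry-point class `com.example.MyJob` with a static `run(...)` method, reached through py4j's `sc._jvm` gateway:

```python
import requests

LIVY = "http://livy-host:8998"  # hypothetical Livy endpoint
HEADERS = {"Content-Type": "application/json"}

# 1. Create the session once, shipping the JAR along with it
session = requests.post(f"{LIVY}/sessions",
                        json={"kind": "pyspark",
                              "jars": ["hdfs:///jars/my-app.jar"]},  # hypothetical JAR
                        headers=HEADERS).json()

# 2. Invoke the entry-point class as many times as needed, with different
#    parameters, against the same long-lived SparkSession
for param in ["2018-01-01", "2018-01-02"]:  # hypothetical parameters
    code = f'sc._jvm.com.example.MyJob.run("{param}")'  # hypothetical class / method
    requests.post(f"{LIVY}/sessions/{session['id']}/statements",
                  json={"code": code}, headers=HEADERS)
```

Each statement's output would still have to be polled via `GET /sessions/{sessionId}/statements/{statementId}`, which is part of why this isn't straightforward.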
Finally, I found some more alternatives to Livy for remote `spark-submit`; see this.