Use existing SparkSession in POST/batches request

忘了有多久 2021-01-25 02:12

I'm trying to use Livy to remotely submit several Spark jobs. Let's say I want to perform the following spark-submit task rem

1 answer
  • 2021-01-25 02:44

    How can I make use of the SparkSession that I created using POST/sessions request for submitting my Spark job using POST/batches request?

    • At this stage, I'm all but certain that this is not currently possible
    • @Luqman Ghani's comment gives a fairly good hint that batch mode is intended for a different use case than session mode / LivyClient

    As far as I can tell, this isn't possible for the following reason (please correct me if I'm wrong or have missed something)

    • A POST/batches request accepts a JAR
    • This prevents the SparkSession (or spark-shell) from being reused without restarting it, because
      • How would you remove the JAR from the previous POST/batches request?
      • How would you add the JAR from the current POST/batches request?
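To make the point above concrete, here is a minimal sketch of what a batch submission looks like. The field names `file`, `className` and `args` are taken from the Livy REST API; the host, port, JAR path and class name are hypothetical placeholders. Note that the body carries its own JAR and, as far as the documented fields go, has no way to refer to an already running session.

```python
import json
from urllib.request import Request

LIVY_URL = "http://localhost:8998"  # assumption: Livy on its default port, local host


def build_batch_request(jar_path, class_name, args):
    """Build (but do not send) a POST /batches request."""
    payload = {
        "file": jar_path,         # application JAR, supplied with every batch request
        "className": class_name,  # entry-point class inside that JAR
        "args": list(args),       # command-line arguments for the job
    }
    return Request(
        url=f"{LIVY_URL}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical JAR path and class name, for illustration only.
req = build_batch_request("hdfs:///jobs/my-job.jar", "com.example.MyJob", ["2021-01-25"])
body = json.loads(req.data)
print(req.full_url, body)
```

Passing `req` to `urllib.request.urlopen` would actually submit the job; the sketch stops short of that so it can be run without a Livy server.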

    And here's a more complete picture

    • Actually, POST/sessions does allow you to pass a JAR
    • but further interactions with that session (obviously) cannot take JARs
    • these further interactions can only be simple scripts (for PySpark, plain Python files) loaded into the session, not JARs
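The asymmetry described above shows up in the request bodies themselves. A rough sketch, assuming the `kind`, `jars` and `code` fields of the Livy REST API; the JAR path and the snippet of code are made up for illustration:

```python
import json


def session_create_body(jars, kind="spark"):
    """JSON body for POST /sessions: the only point where JARs are attached."""
    return json.dumps({"kind": kind, "jars": list(jars)})


def statement_body(code):
    """JSON body for POST /sessions/{sessionId}/statements: source code only."""
    return json.dumps({"code": code})


create = session_create_body(["hdfs:///jobs/my-job.jar"])  # hypothetical JAR path
stmt = statement_body("spark.range(10).count()")
print(create)
print(stmt)
```

The session body has a slot for JARs, while the statement body only has a slot for code, which is why a JAR cannot be swapped in once the session is up.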

    Possible workaround

    • Everyone whose Spark application is written in Scala / Java, and therefore must be bundled in a JAR, will face this difficulty; Python (PySpark) users are lucky here
    • As a possible workaround, you can try this (I see no reason why it wouldn't work)
      • launch a session with your JAR via a POST/sessions request
      • then invoke the entry-point class from your JAR via Python (submit POST /sessions/{sessionId}/statements) as many times as you want, possibly with different parameters. While this wouldn't be straightforward, it sounds very much possible
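The second step of the workaround could look roughly like the sketch below. It is untested against a real cluster and makes several assumptions: the session was created with the JAR attached, the statement is sent with `kind` set to `spark` (a Scala statement rather than the Python one suggested above, since calling a JVM `main` is more direct from Scala), and `com.example.MyJob`, the session id and the host are hypothetical.

```python
import json
from urllib.request import Request

LIVY_URL = "http://localhost:8998"  # assumption: Livy on its default port, local host


def scala_string(value):
    """Quote a Python string as a Scala literal (adequate for simple arguments)."""
    return json.dumps(value)


def main_call(class_name, args):
    """Scala source that calls the JAR's entry point with the given arguments."""
    arg_list = ", ".join(scala_string(a) for a in args)
    return f"{class_name}.main(Array[String]({arg_list}))"


def build_statement_request(session_id, class_name, args):
    """Build (but do not send) a POST /sessions/{sessionId}/statements request."""
    payload = {"kind": "spark", "code": main_call(class_name, args)}
    return Request(
        url=f"{LIVY_URL}/sessions/{session_id}/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One statement per parameter set, all against the same (hypothetical) session 0.
runs = [["2021-01-24"], ["2021-01-25", "a b"]]
requests = [build_statement_request(0, "com.example.MyJob", r) for r in runs]
for r in requests:
    print(r.full_url, json.loads(r.data)["code"])
```

Whether the job really reuses the session's SparkSession depends on the JAR's own code: presumably it would need to obtain the session with `SparkSession.builder.getOrCreate()` and avoid stopping it at the end of `main`.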

    Finally, I found some more alternatives to Livy for remote spark-submit; see this
