Spark submit (2.3) on kubernetes cluster from Python

天涯浪人 2021-02-10 10:43

So now that Kubernetes is integrated directly with Spark in 2.3, my spark-submit from the console executes correctly against a Kubernetes master without any Spark master pods running; Spark …

1 Answer
  •  野趣味 (OP) 2021-02-10 10:53

    I'm afraid that is impossible with Spark 2.3 if you are using native Kubernetes support.

    Based on the description in the deployment instructions, the submission process consists of several steps (a spark-submit sketch follows the list):

    1. Spark creates a Spark driver running within a Kubernetes pod.
    2. The driver creates executors, which also run within Kubernetes pods, connects to them, and executes the application code.
    3. When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in “completed” state in the Kubernetes API until it’s eventually garbage collected or manually cleaned up.
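    For reference, a submission in this mode is just a spark-submit invocation pointed at the Kubernetes API server. The sketch below drives that invocation from Python with the standard subprocess module; the spark-submit path, API server URL, container image, and example jar are placeholders, not values from the question.

        import subprocess

        # Hedged sketch of a Spark 2.3 native-Kubernetes submission driven from Python.
        # All paths, URLs, and image names below are placeholders.
        spark_submit_cmd = [
            "/opt/spark/bin/spark-submit",            # path to a Spark 2.3 distribution
            "--master", "k8s://https://k8s-apiserver.example.com:6443",
            "--deploy-mode", "cluster",
            "--name", "spark-pi",
            "--class", "org.apache.spark.examples.SparkPi",
            "--conf", "spark.executor.instances=2",
            "--conf", "spark.kubernetes.container.image=my-registry/spark:2.3.0",
            "local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar",
        ]

        # spark-submit creates the driver pod (step 1); the driver then creates the
        # executor pods (step 2) and runs the application to completion (step 3).
        result = subprocess.run(spark_submit_cmd, capture_output=True, text=True)
        print(result.stdout)
        print(result.stderr)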

    So, in fact, you have nowhere to submit a job to until you start the submission process, which launches the first Spark pod (the driver) for you. And once the application completes, everything is terminated.

    Because running a fat container on AWS Lambda is not the best solution, and also because there is no way to run arbitrary commands in the container itself (it is possible, but only with a hack; there is a blueprint about executing Bash inside an AWS Lambda), the simplest way is to write a small custom service that runs on a machine outside of AWS Lambda and provides a REST interface between your application and the spark-submit utility. I don't see any other way to do it without pain.
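    A minimal sketch of such a service, assuming Flask is installed on a machine (for example an EC2 instance) that has a Spark 2.3 distribution and network access to the Kubernetes API server; the /submit endpoint, the JSON payload shape, and all paths and URLs are made up for illustration:

        import subprocess

        from flask import Flask, jsonify, request

        app = Flask(__name__)

        # Hypothetical endpoint: a client (for example your AWS Lambda function) POSTs
        # the application resource plus Spark conf overrides, and this service shells
        # out to the spark-submit utility on its behalf.
        @app.route("/submit", methods=["POST"])
        def submit():
            payload = request.get_json(force=True)
            cmd = [
                "/opt/spark/bin/spark-submit",                      # placeholder path
                "--master", "k8s://https://k8s-apiserver.example.com:6443",
                "--deploy-mode", "cluster",
                "--name", payload.get("name", "spark-job"),
            ]
            for key, value in payload.get("conf", {}).items():
                cmd += ["--conf", f"{key}={value}"]
            cmd.append(payload["application"])   # e.g. local:///path/to/app.jar

            proc = subprocess.run(cmd, capture_output=True, text=True)
            return jsonify({"returncode": proc.returncode,
                            "stdout": proc.stdout,
                            "stderr": proc.stderr})

        if __name__ == "__main__":
            app.run(host="0.0.0.0", port=8080)

    The Lambda function then only needs to make an HTTP call, for example requests.post("http://<service-host>:8080/submit", json={"application": "local:///path/to/app.jar"}), instead of carrying a Spark distribution inside its own deployment package.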
