Kill Spark Job programmatically

清歌不尽 2021-01-29 06:20

I am running a PySpark application through a Jupyter notebook. I can kill a job using the Spark Web UI, but I want to kill it programmatically.

How can I kill it?

2 Answers
  • 2021-01-29 06:33

    To expand on @Netanel Malka's answer, you can use the cancelAllJobs method to cancel every running job, or the cancelJobGroup method to cancel only the jobs that have been organized into a group.

    From the PySpark documentation:

    cancelAllJobs()
    Cancel all jobs that have been scheduled or are running.
    
    cancelJobGroup(groupId)
    Cancel active jobs for the specified group. See SparkContext.setJobGroup for more information.
    

    And an example from the docs:

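    import threading
    from time import sleep

    from pyspark import SparkContext

    # Reuse the notebook's existing SparkContext, or create one if none exists
    sc = SparkContext.getOrCreate()

    result = "Not Set"
    lock = threading.Lock()

    def map_func(x):
        # Simulate a long-running task; if cancellation works, this never finishes
        sleep(100)
        raise Exception("Task should have been cancelled")

    def start_job(x):
        global result
        try:
            # Tag jobs started from this thread with a group ID
            sc.setJobGroup("job_to_cancel", "some description")
            result = sc.parallelize(range(x)).map(map_func).collect()
        except Exception:
            # collect() raises once the job group has been cancelled
            result = "Cancelled"
        finally:
            lock.release()

    def stop_job():
        sleep(5)
        # Cancel every active job in the group
        sc.cancelJobGroup("job_to_cancel")

    lock.acquire()
    threading.Thread(target=start_job, args=(10,)).start()
    threading.Thread(target=stop_job).start()
    lock.acquire()  # blocks until start_job releases the lock
    print(result)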
    
  • 2021-01-29 06:37

    Suppose that you wrote this code:

    from pyspark import SparkContext
    
    sc = SparkContext("local", "Simple App")
    
    # Shut down the SparkContext: this stops the whole application, not just one job
    sc.stop()
    

    As described in the docs: http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=stop#pyspark.SparkContext.stop
