Kill Spark Job programmatically

清歌不尽 2021-01-29 06:20

I am running pyspark application through Jupyter notebook. I can kill a job using Spark Web UI, but I want to kill it programmatically.

How can I kill it?

2 Answers
  • 2021-01-29 06:33

    To expand on @Netanel Malka's answer, you can use the cancelAllJobs method to cancel every running job, or the cancelJobGroup method to cancel jobs that have been organized into a group (a short cancelAllJobs sketch follows the example below).

    From the PySpark documentation:

    cancelAllJobs()
    Cancel all jobs that have been scheduled or are running.
    
    cancelJobGroup(groupId)
    Cancel active jobs for the specified group. See SparkContext.setJobGroup for more information.
    

    And an example from the docs:

    import threading
    from time import sleep

    result = "Not Set"
    lock = threading.Lock()

    def map_func(x):
        # Simulates a long-running task; it should be cancelled before it finishes
        sleep(100)
        raise Exception("Task should have been cancelled")

    def start_job(x):
        global result
        try:
            # Tag all jobs submitted from this thread with the group id "job_to_cancel"
            sc.setJobGroup("job_to_cancel", "some description")
            result = sc.parallelize(range(x)).map(map_func).collect()
        except Exception:
            # collect() raises once the job group is cancelled
            result = "Cancelled"
        lock.release()

    def stop_job():
        sleep(5)
        # Cancel every active job in the group
        sc.cancelJobGroup("job_to_cancel")

    lock.acquire()
    threading.Thread(target=start_job, args=(10,)).start()
    threading.Thread(target=stop_job).start()
    lock.acquire()   # blocks until start_job releases the lock
    print(result)    # prints "Cancelled"
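
    If you just want to kill everything currently scheduled or running, without setting up job groups, here is a minimal sketch along the same lines. It assumes the same SparkContext sc as above; kill_all_jobs_after is just a hypothetical helper name for illustration:

    import threading
    from time import sleep

    def kill_all_jobs_after(seconds):
        # Hypothetical watchdog: wait, then cancel every scheduled or running job on sc
        sleep(seconds)
        sc.cancelAllJobs()

    threading.Thread(target=kill_all_jobs_after, args=(5,)).start()

    try:
        # This job sleeps far longer than 5 seconds, so it gets cancelled
        sc.parallelize(range(10)).map(lambda x: sleep(100) or x).collect()
    except Exception:
        print("Jobs were cancelled")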
    
  • 2021-01-29 06:37

    Suppose that you wrote this code:

    from pyspark import SparkContext
    
    sc = SparkContext("local", "Simple App")
    
    # This will stop your app
    sc.stop()
    

    As described in the docs: http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=stop#pyspark.SparkContext.stop
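
    Note that stop() ends the whole SparkContext rather than a single job, so a new context has to be created before submitting any further work. A rough sketch of the two options, assuming you may still want to use Spark afterwards:

    sc.stop()                                   # kills the whole application
    sc = SparkContext("local", "Simple App")    # required again for any new jobs

    # Alternatively, to kill only the running jobs and keep the context alive:
    sc.cancelAllJobs()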
