spark-ec2

Can a PySpark Kernel (JupyterHub) run in yarn-client mode?

大兔子大兔子 · Submitted on 2021-02-08 10:34:00
Question: My current setup:

- Spark EC2 cluster with HDFS and YARN
- JupyterHub (0.7.0)
- PySpark kernel with python27

The very simple code that I am using for this question:

    rdd = sc.parallelize([1, 2])
    rdd.collect()

The PySpark kernel that works as expected in Spark standalone mode has the following environment variable in its kernel json file:

    "PYSPARK_SUBMIT_ARGS": "--master spark://<spark_master>:7077 pyspark-shell"

However, when I try to run it in yarn-client mode it gets stuck forever, while the log…
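For context on what the question is configuring: in a Jupyter kernel spec, the PYSPARK_SUBMIT_ARGS environment variable tells PySpark which master to use, so switching to YARN means changing that value and making sure the kernel process can find the YARN configuration. Below is a minimal sketch of what a yarn-client kernel.json might look like; the paths, the py4j version, and the display name are all placeholder assumptions, not values from the question:

    {
      "display_name": "PySpark (YARN client)",
      "language": "python",
      "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
      "env": {
        "SPARK_HOME": "/opt/spark",
        "HADOOP_CONF_DIR": "/opt/hadoop/conf",
        "PYTHONPATH": "/opt/spark/python:/opt/spark/python/lib/py4j-0.10.7-src.zip",
        "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client pyspark-shell"
      }
    }

In yarn-client mode the kernel process itself acts as the Spark driver, so a hang like the one described often means HADOOP_CONF_DIR is unset or that the YARN nodes cannot connect back to the machine running JupyterHub.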

spark-ec2 not recognized when launching cluster on Windows 8.1

会有一股神秘感。 · Submitted on 2019-12-11 02:07:13
Question: I'm a complete beginner with Spark. I'm trying to run Spark on Amazon EC2, but my system does not recognize "spark-ec2" or "./spark-ec2"; it says "spark-ec2" is not recognized as an internal or external command. I followed the instructions here to launch a cluster. I would like to use Scala; how do I make it work?

Answer 1: Add a PYTHONPATH environment variable that includes the bundled boto library:

    PYTHONPATH="${SPARK_EC2_DIR}/third_party/boto-2.4.1.zip/boto-2.4.1:$PYTHONPATH"

Then execute the Python script.

Answer 2: In order to run…
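The reason the command is not recognized on Windows is that spark-ec2 is a bash wrapper around a Python script (ec2/spark_ec2.py in the Spark distribution), and Windows 8.1 has no bash. Here is a sketch of how Answer 1 might translate to a Windows command prompt; the C:\spark\ec2 path and the key names are placeholder assumptions:

    REM spark-ec2 is a bash script, so on Windows call the underlying
    REM Python script (spark_ec2.py) directly, with boto on PYTHONPATH.
    REM C:\spark\ec2 and the key names below are placeholders.
    set SPARK_EC2_DIR=C:\spark\ec2
    set PYTHONPATH=%SPARK_EC2_DIR%\third_party\boto-2.4.1.zip\boto-2.4.1;%PYTHONPATH%
    python %SPARK_EC2_DIR%\spark_ec2.py -k mykeypair -i mykey.pem launch my-cluster

The cluster that spark-ec2 launches runs Spark regardless of the client OS, so once it is up, Scala is available through spark-shell on the master node.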