Can't get a SparkContext in new AWS EMR Cluster

名媛妹妹, 2021-02-07 10:19

I just set up an AWS EMR cluster (EMR version 5.18 with Spark 2.3.2). When I ssh into the master machine and run spark-shell or pyspark, both fail with an error before any SparkContext is created.

3 Answers
  •  南笙 (OP), 2021-02-07 10:35

    If you look into the /etc/spark/conf/log4j.properties file, you'll find a new setup that rolls Spark Streaming logs hourly (probably as suggested here).
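
    For reference, the hourly rolling setup in that file looks roughly like the sketch below. This is a reconstruction, not a verbatim copy: the appender name and patterns are assumptions based on standard log4j 1.x usage, so compare against your own /etc/spark/conf/log4j.properties.

        # Illustrative rolling-appender setup (appender name is an assumption).
        log4j.appender.DRFA-stderr=org.apache.log4j.DailyRollingFileAppender
        # An hourly DatePattern makes DailyRollingFileAppender roll every hour.
        log4j.appender.DRFA-stderr.DatePattern=.yyyy-MM-dd-HH
        # The problematic part: ${spark.yarn.app.container.log.dir} is unset in
        # the driver, so the appender tries to open the literal path "/stderr".
        log4j.appender.DRFA-stderr.File=${spark.yarn.app.container.log.dir}/stderr
        log4j.appender.DRFA-stderr.layout=org.apache.log4j.PatternLayout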

    The problem occurs because the ${spark.yarn.app.container.log.dir} system property is not set in the Spark driver process. The property is eventually set to YARN's container log directory, but that happens later (look here and here).
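
    You can confirm this from the driver itself. In spark-shell (plain Scala, nothing EMR-specific), the property comes back empty, whereas inside a YARN container it would be set:

        scala> sys.props.get("spark.yarn.app.container.log.dir")
        res0: Option[String] = None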

    In order to fix this error in the Spark driver, add the following to your spark-submit or spark-shell command: --driver-java-options='-Dspark.yarn.app.container.log.dir=/mnt/var/log/hadoop'
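
    For example, a full invocation with the workaround could look like this (my_job.py is a stand-in for your own application):

        # Interactive shell with the workaround applied:
        spark-shell --driver-java-options='-Dspark.yarn.app.container.log.dir=/mnt/var/log/hadoop'

        # The same flag works for spark-submit:
        spark-submit \
          --driver-java-options='-Dspark.yarn.app.container.log.dir=/mnt/var/log/hadoop' \
          my_job.py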

    Please note that the /mnt/var/log/hadoop/stderr and /mnt/var/log/hadoop/stdout files will be reused by all the (Spark Streaming) processes started on the same node.
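
    If that sharing is a concern, one possible variation (my suggestion, not part of the original fix) is to give each driver its own directory, for example keyed on the shell's process id:

        # Per-invocation log directory, so concurrent drivers on one node
        # don't write to the same stderr/stdout files ($$ is the shell PID).
        LOGDIR=/mnt/var/log/hadoop/driver-$$
        mkdir -p "$LOGDIR"
        spark-shell --driver-java-options="-Dspark.yarn.app.container.log.dir=$LOGDIR"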
