Spark History Server on S3A FileSystem: ClassNotFoundException

隐瞒了意图╮  2021-02-06 12:03

Spark can use the Hadoop S3A file system, org.apache.hadoop.fs.s3a.S3AFileSystem. By adding the appropriate settings to conf/spark-defaults.conf, I can get Spark applications to write their event logs to an s3a:// path, but the Spark History Server then fails with a ClassNotFoundException for org.apache.hadoop.fs.s3a.S3AFileSystem.
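
A minimal sketch of the kind of spark-defaults.conf entries involved, with a hypothetical bucket name standing in for the real one:

    # Hypothetical values; substitute your own bucket and prefix
    spark.eventLog.enabled           true
    spark.eventLog.dir               s3a://my-bucket/spark-logs
    spark.history.fs.logDirectory    s3a://my-bucket/spark-logs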

3 Answers

  •  谎友^ (OP)
     2021-02-06 12:25

    On EMR release emr-5.16.0:

    I've added the following to my cluster bootstrap:

    # Copy the AWS SDK and hadoop-aws jars that ship with EMR onto Spark's classpath
    sudo cp /usr/share/aws/aws-java-sdk/aws-java-sdk-core-*.jar /usr/lib/spark/jars/
    sudo cp /usr/share/aws/aws-java-sdk/aws-java-sdk-s3-*.jar /usr/lib/spark/jars/
    sudo cp /usr/lib/hadoop/hadoop-aws.jar /usr/lib/spark/jars/
    
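    These copies can be packaged as an EMR bootstrap-action script; a minimal sketch, assuming a hypothetical script uploaded to s3://my-bucket/bootstrap/copy-spark-s3a-jars.sh (note that EMR runs bootstrap actions before it installs applications, so depending on the release the copies may need to be deferred until Spark is present):

        #!/bin/bash
        # Hypothetical bootstrap script: put the EMR-provided AWS SDK and
        # hadoop-aws jars on the Spark History Server's classpath.
        set -e
        sudo cp /usr/share/aws/aws-java-sdk/aws-java-sdk-core-*.jar /usr/lib/spark/jars/
        sudo cp /usr/share/aws/aws-java-sdk/aws-java-sdk-s3-*.jar /usr/lib/spark/jars/
        sudo cp /usr/lib/hadoop/hadoop-aws.jar /usr/lib/spark/jars/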

    Then in the config of the cluster:

            {
              'Classification': 'spark-defaults',
              'Properties': {
                'spark.eventLog.dir': 's3a://some/path',
                'spark.history.fs.logDirectory': 's3a://some/path',
                'spark.eventLog.enabled': 'true'
              }
            }
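
    For context, a hedged sketch of where this classification (and the bootstrap script above) would sit in a boto3 run_job_flow call; the cluster name, instance sizing, and script path are hypothetical:

        import boto3

        emr = boto3.client('emr')
        emr.run_job_flow(
            Name='spark-history-s3a',            # hypothetical cluster name
            ReleaseLabel='emr-5.16.0',
            Applications=[{'Name': 'Spark'}],
            Configurations=[{
                'Classification': 'spark-defaults',
                'Properties': {
                    'spark.eventLog.dir': 's3a://some/path',
                    'spark.history.fs.logDirectory': 's3a://some/path',
                    'spark.eventLog.enabled': 'true',
                },
            }],
            BootstrapActions=[{
                'Name': 'copy-spark-s3a-jars',
                'ScriptBootstrapAction': {
                    # hypothetical S3 location of the script above
                    'Path': 's3://my-bucket/bootstrap/copy-spark-s3a-jars.sh',
                },
            }],
            Instances={
                'MasterInstanceType': 'm4.large',  # hypothetical sizing
                'InstanceCount': 1,
                'KeepJobFlowAliveWhenNoSteps': True,
            },
            JobFlowRole='EMR_EC2_DefaultRole',
            ServiceRole='EMR_DefaultRole',
        )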
    

    If you're going to test this, first stop the Spark History Server:

    sudo stop spark-history-server
    

    Then make the config changes:

    sudo vim /etc/spark/conf.dist/spark-defaults.conf
    
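    In properties-file form, the same settings from the classification above:

        spark.eventLog.enabled           true
        spark.eventLog.dir               s3a://some/path
        spark.history.fs.logDirectory    s3a://some/path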

    Then run the JAR copies shown above.

    Finally, restart the Spark History Server:

    sudo /usr/lib/spark/sbin/start-history-server.sh
    
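    To confirm it came back up, assuming the default History Server port of 18080:

        # upstart status on EMR 5.x AMIs
        sudo status spark-history-server
        # the web UI should respond once the event logs have loaded
        curl -s http://localhost:18080 | head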

    Thanks for the answers above!
