Spark can use the Hadoop S3A filesystem (org.apache.hadoop.fs.s3a.S3AFileSystem). I got this working on EMR emr-5.16.0 by copying the required JARs and adding the settings below to conf/spark-defaults.conf.
First, I added the following to my cluster bootstrap:
sudo cp /usr/share/aws/aws-java-sdk/aws-java-sdk-core-*.jar /usr/lib/spark/jars/
sudo cp /usr/share/aws/aws-java-sdk/aws-java-sdk-s3-*.jar /usr/lib/spark/jars/
sudo cp /usr/lib/hadoop/hadoop-aws.jar /usr/lib/spark/jars/
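The three cp commands can be wrapped in a small bootstrap script. A minimal sketch, assuming the default EMR 5.x paths; the copy_spark_jars function and the stand-in dry-run directories are hypothetical and only there so the script can be checked locally before uploading it as a bootstrap action (on the cluster you would call it with sudo against the real paths):

```shell
#!/bin/bash
set -euo pipefail

# Sketch of the bootstrap step as a reusable function. On the cluster you
# would call it (with sudo) against the EMR 5.x paths:
#   copy_spark_jars /usr/share/aws/aws-java-sdk /usr/lib/hadoop /usr/lib/spark/jars
copy_spark_jars() {
  local sdk_dir="$1" hadoop_dir="$2" spark_jars="$3"
  cp "$sdk_dir"/aws-java-sdk-core-*.jar "$spark_jars"/
  cp "$sdk_dir"/aws-java-sdk-s3-*.jar "$spark_jars"/
  cp "$hadoop_dir"/hadoop-aws.jar "$spark_jars"/
}

# Dry run against stand-in directories (the JAR file names below are fakes),
# so the logic can be verified off-cluster:
demo=$(mktemp -d)
mkdir -p "$demo/sdk" "$demo/hadoop" "$demo/jars"
touch "$demo/sdk/aws-java-sdk-core-1.11.0.jar" \
      "$demo/sdk/aws-java-sdk-s3-1.11.0.jar" \
      "$demo/hadoop/hadoop-aws.jar"
copy_spark_jars "$demo/sdk" "$demo/hadoop" "$demo/jars"
ls "$demo/jars"
```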
Then in the config of the cluster:
{
    'Classification': 'spark-defaults',
    'Properties': {
        'spark.eventLog.dir': 's3a://some/path',
        'spark.history.fs.logDirectory': 's3a://some/path',
        'spark.eventLog.enabled': 'true'
    }
}
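If you launch the cluster with the AWS CLI, the same classification can be put in a JSON file and passed via --configurations. A hedged sketch: the configurations.json file name is my choice, the s3a://some/path values are placeholders from the snippet above, and the create-cluster line is illustrative only (it needs real credentials plus the rest of your cluster options, so it is left as a comment):

```shell
# Write the spark-defaults classification as a JSON configurations file.
cat > configurations.json <<'EOF'
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.eventLog.dir": "s3a://some/path",
      "spark.history.fs.logDirectory": "s3a://some/path",
      "spark.eventLog.enabled": "true"
    }
  }
]
EOF

# Sanity-check the JSON before launching:
python3 -m json.tool configurations.json > /dev/null && echo "JSON OK"

# Then reference it at cluster creation time, e.g.:
# aws emr create-cluster ... --configurations file://configurations.json
```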
If you want to test this on a running cluster, first stop the Spark history server:
sudo stop spark-history-server
Make the config changes:
sudo vim /etc/spark/conf.dist/spark-defaults.conf
Then copy the JARs as above, and restart the Spark history server:
sudo /usr/lib/spark/sbin/start-history-server.sh
Thanks for the answers above!