Pyspark S3 error: java.lang.NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException

Happy的楠姐 2021-01-24 06:29

I have been unsuccessful in setting up a Spark cluster that can read AWS S3 files. The software I used is as follows (a sketch of the typical PySpark wiring follows the list):

  1. hadoop-aws-3.2.0.jar
  2. aws-java-sdk-1.11.887.jar
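
A minimal sketch, assuming the two jars above were downloaded locally; the jar paths, bucket name, and credential values below are placeholders, not part of the original question:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3a-read-test")
        # Point Spark at the locally downloaded jars listed above.
        .config("spark.jars",
                "/opt/jars/hadoop-aws-3.2.0.jar,/opt/jars/aws-java-sdk-1.11.887.jar")
        # Credentials for the S3A connector (placeholders).
        .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
        .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
        .getOrCreate()
    )

    # With this jar combination, any s3a:// read fails with the
    # NoClassDefFoundError from the title.
    df = spark.read.csv("s3a://your-bucket/path/to/file.csv", header=True)
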
3 Answers
  • 2021-01-24 07:13

    Hadoop 3.2 was built against 1.11.563; stick the full shaded SDK of that specific version, "aws-java-sdk-bundle", on your classpath and all should be well.

    The SDK has been "fussy" in the past... and an upgrade invariably causes surprises. For the curious, see Qualifying an AWS SDK update. It's probably about time someone did it again.
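
    A minimal sketch of this suggestion, letting Spark resolve the matching pair from Maven Central instead of dropping jars in by hand; the coordinates are org.apache.hadoop:hadoop-aws:3.2.0 and com.amazonaws:aws-java-sdk-bundle:1.11.563, and the credential values are placeholders:

        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder
            .appName("s3a-with-matching-sdk")
            # Pull hadoop-aws together with the shaded SDK bundle version
            # this answer recommends, rather than a standalone aws-java-sdk jar.
            .config("spark.jars.packages",
                    "org.apache.hadoop:hadoop-aws:3.2.0,"
                    "com.amazonaws:aws-java-sdk-bundle:1.11.563")
            .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
            .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
            .getOrCreate()
        )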

  • 2021-01-24 07:24

    I was able to solve this issue on Spark 3.0 / Hadoop 3.2. I documented my answer here as well - AWS EKS Spark 3.0, Hadoop 3.2 Error - NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException

    Use the following AWS Java SDK bundle and this issue will be solved -

    aws-java-sdk-bundle-1.11.874.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle/1.11.874)
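
    A minimal way to check the fix, assuming aws-java-sdk-bundle-1.11.874.jar (alongside hadoop-aws-3.2.0.jar) is already on the classpath, e.g. copied under $SPARK_HOME/jars; the bucket path is a placeholder, credentials are assumed to come from the environment (for example the EKS pod's IAM role), and spark._jvm is an internal handle used here only for the check:

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("verify-s3a").getOrCreate()

        # The class from the stack trace should now be resolvable on the driver JVM.
        spark._jvm.java.lang.Class.forName(
            "com.amazonaws.services.s3.model.MultiObjectDeleteException")

        # If the class loads, s3a reads should no longer hit NoClassDefFoundError.
        spark.read.text("s3a://your-bucket/some/object.txt").show(5)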

  • 2021-01-24 07:32

    So I cleaned up everything and re-installed the following jar versions, and it worked: hadoop-aws-2.7.4.jar, aws-java-sdk-1.7.4.2.jar. Spark install version: spark-2.4.7-bin-hadoop2.7. Python version: Python 3.6.
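
    A sketch of the equivalent wiring for this older stack (spark-2.4.7-bin-hadoop2.7 with the two jars named above); the jar locations, credentials, and bucket are placeholders:

        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder
            .appName("s3a-on-hadoop27")
            .config("spark.jars",
                    "/opt/jars/hadoop-aws-2.7.4.jar,/opt/jars/aws-java-sdk-1.7.4.2.jar")
            # Commonly set explicitly in Spark 2.x / Hadoop 2.7 guides;
            # harmless if it is already the default.
            .config("spark.hadoop.fs.s3a.impl",
                    "org.apache.hadoop.fs.s3a.S3AFileSystem")
            .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
            .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
            .getOrCreate()
        )

        df = spark.read.csv("s3a://your-bucket/data.csv", header=True)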
