I've been unsuccessful in setting up a Spark cluster that can read AWS S3 files. The software I used is as follows:
Hadoop 3.2 was built against AWS SDK 1.11.563; put the full shaded SDK of that exact version, aws-java-sdk-bundle, on your classpath and all should be well.
The SDK has been "fussy" in the past, and an upgrade invariably causes surprises. For the curious, see Qualifying an AWS SDK update. It's probably about time someone did it again.
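A minimal PySpark sketch of that advice, pinning the SDK bundle that matches Hadoop 3.2 (1.11.563) via spark.jars.packages; the hadoop-aws version (3.2.0) and the bucket/object names are assumptions for illustration, not from the answer:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3a-read")
        # Pull hadoop-aws plus the matching shaded SDK bundle onto the classpath.
        .config(
            "spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:3.2.0,"
            "com.amazonaws:aws-java-sdk-bundle:1.11.563",
        )
        .getOrCreate()
    )

    # Hypothetical bucket and key, just to show an s3a:// read.
    df = spark.read.csv("s3a://my-bucket/path/data.csv", header=True)
    df.show()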
I was able to solve this issue on Spark 3.0 / Hadoop 3.2. I documented my answer here as well: AWS EKS Spark 3.0, Hadoop 3.2 Error - NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException.
Use the following AWS Java SDK bundle and this issue will be solved:
aws-java-sdk-bundle-1.11.874.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle/1.11.874)
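A sketch of wiring that jar in, assuming it and a matching hadoop-aws 3.2.x jar have been downloaded to /opt/spark/extra-jars (the paths and the bucket name are assumptions; credentials come from the default provider chain):

    from pyspark.sql import SparkSession

    jars = ",".join([
        "/opt/spark/extra-jars/hadoop-aws-3.2.0.jar",
        "/opt/spark/extra-jars/aws-java-sdk-bundle-1.11.874.jar",
    ])

    spark = (
        SparkSession.builder
        .appName("s3a-read-spark3")
        .config("spark.jars", jars)
        # Resolve credentials via env vars, instance profile, etc.
        .config(
            "spark.hadoop.fs.s3a.aws.credentials.provider",
            "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
        )
        .getOrCreate()
    )

    df = spark.read.parquet("s3a://my-bucket/path/")  # hypothetical bucket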
So I cleaned up everything and re-installed the following jar versions, and it worked: hadoop-aws-2.7.4.jar and aws-java-sdk-1.7.4.2.jar. Spark install: spark-2.4.7-bin-hadoop2.7. Python version: 3.6.
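For completeness, a sketch of how that Spark 2.4.7 / Hadoop 2.7 combination can be wired up from PySpark; the jar paths, the explicit fs.s3a.impl setting, and the use of environment variables for credentials are assumptions, not part of the answer above:

    import os
    from pyspark.sql import SparkSession

    jars = ",".join([
        "/opt/spark/jars/hadoop-aws-2.7.4.jar",
        "/opt/spark/jars/aws-java-sdk-1.7.4.2.jar",
    ])

    spark = (
        SparkSession.builder
        .appName("s3a-read-spark24")
        .config("spark.jars", jars)
        # Older hadoop-aws setups often need the s3a filesystem class spelled out.
        .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
        .config("spark.hadoop.fs.s3a.access.key", os.environ["AWS_ACCESS_KEY_ID"])
        .config("spark.hadoop.fs.s3a.secret.key", os.environ["AWS_SECRET_ACCESS_KEY"])
        .getOrCreate()
    )

    df = spark.read.json("s3a://my-bucket/path/")  # hypothetical bucket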