It looks like I am again stuck on running a packaged Spark app jar using spark-submit. Following is my pom file:
In my case, I was running a local Spark installation on a Cloudera edge node and was hitting this conflict (even though I made sure to download Spark precompiled with the correct Hadoop binaries). I just went into my Spark home and moved the hadoop-common jar so it wouldn't be loaded:
mv ~/spark-2.4.4-bin-hadoop2.6/jars/hadoop-common-2.6.5.jar ~/spark-2.4.4-bin-hadoop2.6/jars/hadoop-common-2.6.5.jar.XXXXXX
After that, it ran... in local mode anyway.
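If you want to double-check which jar is actually providing the Hadoop classes before (or after) moving jars around, something like this pasted into spark-shell should tell you. This is just a diagnostic sketch, not part of the fix itself:
// Prints the jar that org.apache.hadoop.conf.Configuration is loaded from
val src = Option(classOf[org.apache.hadoop.conf.Configuration].getProtectionDomain.getCodeSource)
println(src.map(_.getLocation.toString).getOrElse("bootstrap classpath"))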
The dependencies between Hadoop and the AWS SDK are very sensitive, and you should stick to the exact SDK versions that your Hadoop dependency version was built with.
The first problem you need to solve is to pick one version of Hadoop. I see you're mixing versions 2.8.3 and 2.8.0.

When I look at the dependency tree for org.apache.hadoop:hadoop-aws:2.8.0, I see that it is built against version 1.10.6 of the AWS SDK (the same goes for hadoop-aws:2.8.3). This is probably what's causing the mismatches (you're mixing incompatible versions). So: use hadoop-aws with the version that matches your Hadoop, and let it bring in the AWS SDK it was built against.
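A minimal sketch of what the aligned dependencies could look like (shown in sbt syntax; the 2.8.3 pin is just an example, adapt the same idea to your pom):
val hadoopVersion = "2.8.3"  // pick one Hadoop version and reuse it everywhere

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-common" % hadoopVersion,
  "org.apache.hadoop" % "hadoop-aws"    % hadoopVersion
  // no explicit aws-java-sdk entry: hadoop-aws 2.8.x already pulls in
  // the 1.10.6 SDK it was built against
)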
In case anybody else is still stumbling on this error... it took me a while to find out, but check if your project has a dependency (direct or transitive) on the package org.apache.avro/avro-tools. It was brought into my code by a transitive dependency. Its problem is that it ships with a copy of org.apache.hadoop.conf.Configuration that is much older than all current versions of Hadoop, so it may end up being the one picked up on the classpath.
In my Scala project, I just had to exclude it with
ExclusionRule("org.apache.avro", "avro-tools")
and the error (finally!) disappeared.
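For context, the exclusion rule goes on whichever dependency drags avro-tools in; the coordinates below are placeholders for whatever you find in your own dependency tree:
libraryDependencies += ("com.example" %% "library-pulling-avro-tools" % "1.0.0")
  .excludeAll(ExclusionRule("org.apache.avro", "avro-tools"))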
I am sure the avro-tools developers had some good reason to include a copy of a file that belongs to another package (hadoop-common), but I was really surprised to find it there, and it made me waste an entire day.