tl;dr: Here's a repo containing the problem.
Cassandra and HDFS both use guava internally, but neither of them shades the dependency for various reasons. Because the versions of guava aren't binary compatible, I'm finding NoSuchMethodError
s at runtime.
I've tried to shade guava myself in my build.sbt
:
val HadoopVersion = "2.6.0-cdh5.11.0"
// ...
val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion % "test" classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion % "test" classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion % Test
// ...
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfs).inProject,
ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommon).inProject,
ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfsTest).inProject,
ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommonTest).inProject,
ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopMiniDFSCluster).inProject
)
assemblyJarName in assembly := s"${name.value}-${version.value}.jar"
assemblyMergeStrategy in assembly := {
case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
case _ => MergeStrategy.first
}
but the runtime exception persists (ha -- it's a cassandra joke, people).
The specific exception is
[info] HdfsEntitySpec *** ABORTED ***
[info] java.lang.NoSuchMethodError: com.google.common.base.Objects.toStringHelper(Ljava/lang/Object;)Lcom/google/common/base/Objects$ToStringHelper;
[info] at org.apache.hadoop.metrics2.lib.MetricsRegistry.toString(MetricsRegistry.java:406)
[info] at java.lang.String.valueOf(String.java:2994)
[info] at java.lang.StringBuilder.append(StringBuilder.java:131)
[info] at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.<init>(RetryCacheMetrics.java:46)
[info] at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.create(RetryCacheMetrics.java:53)
[info] at org.apache.hadoop.ipc.RetryCache.<init>(RetryCache.java:202)
[info] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initRetryCache(FSNamesystem.java:1038)
[info] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:949)
[info] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:796)
[info] at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1040)
[info] ...
How can I properly shade guava to stop the runtime errors?
The shading rules will only apply when you are building a fat jar. It won't be applied during other sbt tasks.
If you want to shade some library inside of your hadoop dependencies, you can create a new project with only the hadoop dependencies, shade the libraries, and publish a fat jar with the all the shaded hadoop dependencies.
This is not a perfect solution, because all of the dependencies in the new hadoop jar will be "unknown" to whom uses them, and you will need to handle conflicts manually.
Here is the code that you will need in your build.sbt
to publish a fat hadoop jar
(using your code and sbt assembly docs):
val HadoopVersion = "2.6.0-cdh5.11.0"
val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion % classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion
lazy val fatJar = project
.enablePlugins(AssemblyPlugin)
.settings(
libraryDependencies ++= Seq(
hadoopHdfs,
hadoopCommon,
hadoopHdfsTest,
hadoopCommonTest,
hadoopMiniDFSCluster
),
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("com.google.common.**" -> "shade.@0").inAll
),
assemblyMergeStrategy in assembly := {
case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
case _ => MergeStrategy.first
},
artifact in (Compile, assembly) := {
val art = (artifact in (Compile, assembly)).value
art.withClassifier(Some("assembly"))
},
addArtifact(artifact in (Compile, assembly), assembly),
crossPaths := false, // Do not append Scala versions to the generated artifacts
autoScalaLibrary := false, // This forbids including Scala related libraries into the dependency
skip in publish := true
)
lazy val shaded_hadoop = project
.settings(
name := "shaded-hadoop",
packageBin in Compile := (assembly in (fatJar, Compile)).value
)
I haven't tests it, but that is the gist of it.
I'd like to point out out another issue that I noticed, your merge strategy might cause you problems, since you want to apply different strategies on some of the files. see the default strategy here.
I would recommend using something like this to preserve the original strategy for everything that is not deduplicate
assemblyMergeStrategy in assembly := {
entry: String => {
val strategy = (assemblyMergeStrategy in assembly).value(entry)
if (strategy == MergeStrategy.deduplicate) MergeStrategy.first
else strategy
}
}
来源:https://stackoverflow.com/questions/47907446/why-isnt-guava-being-shaded-properly-in-my-build-sbt