Why isn't guava being shaded properly in my build.sbt?

Submitted by 浪尽此生 on 2019-12-10 10:56:01

Question


tl;dr: Here's a repo containing the problem.


Cassandra and HDFS both use guava internally, but neither of them shades the dependency for various reasons. Because the versions of guava aren't binary compatible, I'm finding NoSuchMethodErrors at runtime.

I've tried to shade guava myself in my build.sbt:

val HadoopVersion = "2.6.0-cdh5.11.0"

// ...

val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion % "test" classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion % "test" classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion % Test

// ...

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfs).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommon).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfsTest).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommonTest).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopMiniDFSCluster).inProject
)

assemblyJarName in assembly := s"${name.value}-${version.value}.jar"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
  case _ => MergeStrategy.first
}

but the runtime exception persists (ha -- it's a Cassandra joke, people).

The specific exception is

[info] HdfsEntitySpec *** ABORTED ***
[info]   java.lang.NoSuchMethodError: com.google.common.base.Objects.toStringHelper(Ljava/lang/Object;)Lcom/google/common/base/Objects$ToStringHelper;
[info]   at org.apache.hadoop.metrics2.lib.MetricsRegistry.toString(MetricsRegistry.java:406)
[info]   at java.lang.String.valueOf(String.java:2994)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:131)
[info]   at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.<init>(RetryCacheMetrics.java:46)
[info]   at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.create(RetryCacheMetrics.java:53)
[info]   at org.apache.hadoop.ipc.RetryCache.<init>(RetryCache.java:202)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initRetryCache(FSNamesystem.java:1038)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:949)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:796)
[info]   at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1040)
[info]   ...

How can I properly shade guava to stop the runtime errors?


Answer 1:


The shading rules only apply when you are building a fat jar; they won't be applied during any other sbt task, which is why your tests still see the unshaded guava.

If you want to shade some library inside your hadoop dependencies, you can create a new project with only the hadoop dependencies, shade the libraries there, and publish a fat jar containing all the shaded hadoop dependencies.

This is not a perfect solution: all of the dependencies inside the new hadoop jar will be "unknown" to whoever uses it, so conflicts will have to be handled manually.

Here is the code you will need in your build.sbt to publish a fat hadoop jar (based on your code and the sbt-assembly docs):

val HadoopVersion = "2.6.0-cdh5.11.0"

val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion 

lazy val fatJar = project
  .enablePlugins(AssemblyPlugin)
  .settings(
    libraryDependencies ++= Seq(
        hadoopHdfs,
        hadoopCommon,
        hadoopHdfsTest,
        hadoopCommonTest,
        hadoopMiniDFSCluster
    ),
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "shade.@0").inAll
    ),
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
      case _ => MergeStrategy.first
    },
    artifact in (Compile, assembly) := {
      val art = (artifact in (Compile, assembly)).value
      art.withClassifier(Some("assembly"))
    },
    addArtifact(artifact in (Compile, assembly), assembly),
    crossPaths := false, // Do not append Scala versions to the generated artifacts
    autoScalaLibrary := false, // This forbids including Scala related libraries into the dependency
    skip in publish := true
  )

lazy val shaded_hadoop = project
  .settings(
    name := "shaded-hadoop",
    packageBin in Compile := (assembly in (fatJar, Compile)).value
  )

I haven't tested it, but that's the gist of it.


I'd also like to point out another issue I noticed: your merge strategy might cause problems, since you want to apply different strategies to some of the files. See the default strategy here.
I would recommend something like the following, which preserves the default strategy for every entry except those that would be deduplicated:

assemblyMergeStrategy in assembly := { entry: String =>
  val strategy = (assemblyMergeStrategy in assembly).value(entry)
  if (strategy == MergeStrategy.deduplicate) MergeStrategy.first
  else strategy
}


Source: https://stackoverflow.com/questions/47907446/why-isnt-guava-being-shaded-properly-in-my-build-sbt
