Spark and Influx: OKIO conflict

Submitted by 我怕爱的太早我们不能终老 on 2020-01-14 01:56:09

Question


I'm running a job on Spark on YARN and trying to emit messages to InfluxDB, but I'm crashing on an okio conflict:

22:17:54 ERROR ApplicationMaster - User class threw exception: java.lang.NoSuchMethodError: okio.BufferedSource.readUtf8LineStrict(J)Ljava/lang/String;
java.lang.NoSuchMethodError: okio.BufferedSource.readUtf8LineStrict(J)Ljava/lang/String;
    at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:212)
    at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)

Here are my dependencies:

val cdhVersion = "cdh5.12.2"
val sparkVersion = "2.2.0.cloudera2"
val parquetVersion = s"1.5.0-$cdhVersion"
val hadoopVersion = s"2.6.0-$cdhVersion"
val awsVersion = "1.11.295"
val log4jVersion = "1.2.17"
val slf4jVersion = "1.7.5" 

lazy val sparkDependencies = Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.hadoop" % "hadoop-common" % "2.2.0"
)

lazy val otherDependencies = Seq(
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "2.2.0",
  "org.clapper" %% "grizzled-slf4j" % "1.3.1",
  "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2" % "runtime",
  "org.slf4j" % "slf4j-log4j12" % slf4jVersion,
  "com.typesafe" % "config" % "1.3.1",
  "org.rogach" %% "scallop" % "3.0.3",
  "org.influxdb" % "influxdb-java" % "2.9"
)


libraryDependencies ++= sparkDependencies.map(_ % "provided" ) ++ otherDependencies

dependencyOverrides ++= Set("com.squareup.okio" % "okio" % "1.13.0")

Using the same jar, I can run a successful test that instantiates an InfluxDB instance in a non-Spark job. But trying to do the same from Spark throws the above error. It sounds like Spark must have its own version of okio that causes this conflict at runtime when I use spark-submit... but it doesn't show up when I dump the dependency tree. Any advice on how I can bring my desired okio 1.13.0 onto the Spark cluster's run path?

(As I'm typing this, I'm thinking of trying shading, which I will do now.) Thanks
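For reference, the shading I have in mind would look something like this in `build.sbt`. This is only a sketch, assuming the sbt-assembly plugin is enabled; the `shaded.okio` prefix is an arbitrary choice:

```scala
// build.sbt — sketch only, assumes sbt-assembly is enabled in project/plugins.sbt.
// Rename okio classes inside the fat jar so they cannot clash with whatever
// okio version already ships on the Spark cluster's classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("okio.**" -> "shaded.okio.@1").inAll
)
```

With this in place, okhttp's references to `okio.*` inside the assembled jar are rewritten to `shaded.okio.*`, so the cluster-provided okio is never consulted.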


Answer 1:


In my case, using Apache Spark 1.6.3 with the Hadoop HDP distribution:

  1. Run `spark-shell` and check on the web UI which jars are used.
  2. Search for okhttp:
     `jar tf /usr/hdp/current/spark-client/lib/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar | grep okhttp`
  3. Extract the okhttp version:
     `jar xf /usr/hdp/current/spark-client/lib/spark-assembly-1.6.3.2.6.3.0-235-hadoop2.7.3.2.6.3.0-235.jar META-INF/maven/com.squareup.okhttp/okhttp/pom.xml`

=> version 2.4.0

No idea what provides this version.
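If the cluster's assembly jar pins an older okhttp/okio like this, one hedged workaround is to ask Spark to prefer the application's jars over its own via the (experimental) `userClassPathFirst` flags. A sketch of the submit command, where the main class and jar name are placeholders:

```shell
# Sketch only: prefer the application's classpath over Spark's bundled jars.
# Both flags are marked experimental and can surface other conflicts,
# so test carefully. com.example.Main and my-app.jar are placeholders.
spark-submit \
  --master yarn \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class com.example.Main \
  my-app.jar
```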




Answer 2:


I had the same problem on Spark 2.1.0.

Solution: I downgraded the influxdb-java dependency from version 2.11 to 2.1 (2.12 has an empty child dependency, which caused problems when assembling the fat jar).

influxdb-java 2.1 has a different API, but it works in spark-submit applications.



Source: https://stackoverflow.com/questions/49481868/spark-and-influx-okio-conflict
