Zeppelin + Spark: Reading Parquet from S3 throws NoSuchMethodError: com.fasterxml.jackson

Submitted by Anonymous (unverified) on 2019-12-03 01:06:02

Question:

Using Zeppelin 0.7.2 binaries from the main download, and Spark 2.1.0 w/ Hadoop 2.6, the following paragraph:

val df = spark.read.parquet(DATA_URL).filter(FILTER_STRING).na.fill("") 

Produces the following:

java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
  at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
  at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
  at org.apache.spark.SparkContext.parallelize(SparkContext.scala:715)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:594)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:235)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
  ... 47 elided

This error does not happen in the normal spark-shell, only in Zeppelin. I have attempted the following fixes, none of which helped:

  • Downloading the Jackson 2.6.2 jars to the Zeppelin lib folder and restarting
  • Adding Jackson 2.9 dependencies from the Maven repositories to the interpreter settings
  • Deleting the Jackson jars from the Zeppelin lib folder

Googling turns up no similar situations. Please don't hesitate to ask for more information or make suggestions. Thanks!

Answer 1:

I had the same problem. I added com.amazonaws:aws-java-sdk and org.apache.hadoop:hadoop-aws as dependencies for the Spark interpreter. These dependencies bring in their own versions of com.fasterxml.jackson.core:* and conflict with Spark's.

You must also exclude com.fasterxml.jackson.core:* from those dependencies. Here is an example Spark interpreter dependency section from ${ZEPPELIN_HOME}/conf/interpreter.json:

"dependencies": [ { "groupArtifactVersion": "com.amazonaws:aws-java-sdk:1.7.4", "local": false, "exclusions": ["com.fasterxml.jackson.core:*"] }, { "groupArtifactVersion": "org.apache.hadoop:hadoop-aws:2.7.1", "local": false, "exclusions": ["com.fasterxml.jackson.core:*"] } ]

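If you would rather not edit interpreter.json by hand, the same packages and exclusions can be passed to Spark through spark-submit flags in conf/zeppelin-env.sh. This is a sketch, not a verified fix for this exact setup: the artifact versions below mirror the ones in the answer above, and you should match them to your own Hadoop build.

```shell
# conf/zeppelin-env.sh -- sketch; versions are assumptions, match your Hadoop build.
# --packages pulls the S3A connector and AWS SDK from Maven;
# --exclude-packages keeps their transitive Jackson jars from
# shadowing the Jackson version that Spark itself ships with.
export SPARK_SUBMIT_OPTIONS="--packages org.apache.hadoop:hadoop-aws:2.7.1,com.amazonaws:aws-java-sdk:1.7.4 --exclude-packages com.fasterxml.jackson.core:jackson-core,com.fasterxml.jackson.core:jackson-databind,com.fasterxml.jackson.core:jackson-annotations"
```

Restart Zeppelin after editing so the interpreter picks up the new options.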


Answer 2:

Another way is to load the dependency directly in a notebook paragraph:

%dep
z.load("com.fasterxml.jackson.core:jackson-core:2.6.2")
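Note that %dep paragraphs only take effect if they run before the Spark interpreter starts, so you may need to restart the interpreter first. A sketch combining this with the exclusion approach from the other answer (the z.reset() and .exclude() calls are from Zeppelin's dependency-loader API; the wildcard exclusion here is an assumption mirroring the interpreter.json example above):

```scala
%dep
// Must run before the first Spark paragraph; once the SparkContext
// is up, %dep additions are ignored until the interpreter restarts.
z.reset()  // clear previously loaded dependencies

// Load hadoop-aws but keep its transitive Jackson jars out,
// so Spark's own Jackson version wins.
z.load("org.apache.hadoop:hadoop-aws:2.7.1").exclude("com.fasterxml.jackson.core:*")
```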

