Zeppelin + Spark: Reading Parquet from S3 throws NoSuchMethodError: com.fasterxml.jackson

Submitted by Anonymous (unverified) on 2019-12-03 01:06:02

Question:

Using Zeppelin 0.7.2 binaries from the main download, and Spark 2.1.0 w/ Hadoop 2.6, the following paragraph:

val df = spark.read.parquet(DATA_URL).filter(FILTER_STRING).na.fill("") 

Produces the following:

java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
  at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
  at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
  at org.apache.spark.SparkContext.parallelize(SparkContext.scala:715)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:594)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:235)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
  ... 47 elided

This error does not happen in the normal spark-shell, only in Zeppelin. I have attempted the following fixes, none of which helped:

  • Downloading the Jackson 2.6.2 jars to the Zeppelin lib folder and restarting
  • Adding Jackson 2.9 dependencies from the Maven repositories to the interpreter settings
  • Deleting the Jackson jars from the Zeppelin lib folder

Googling turns up no similar situations. Please don't hesitate to ask for more information or make suggestions. Thanks!

Answer 1:

I had the same problem. I added com.amazonaws:aws-java-sdk and org.apache.hadoop:hadoop-aws as dependencies for the Spark interpreter. These dependencies bring in their own versions of com.fasterxml.jackson.core:* and conflict with Spark's.

You must also exclude com.fasterxml.jackson.core:* from those dependencies. Here is an example Spark interpreter dependency section from ${ZEPPELIN_HOME}/conf/interpreter.json:

"dependencies": [ { "groupArtifactVersion": "com.amazonaws:aws-java-sdk:1.7.4", "local": false, "exclusions": ["com.fasterxml.jackson.core:*"] }, { "groupArtifactVersion": "org.apache.hadoop:hadoop-aws:2.7.1", "local": false, "exclusions": ["com.fasterxml.jackson.core:*"] } ]

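If you would rather not edit interpreter.json by hand, the same packages and exclusions can be passed to Spark through spark-submit flags in conf/zeppelin-env.sh. This is a sketch, not a verified fix for this exact setup: the artifact versions below mirror the ones in the answer above, and you should match them to your own Hadoop build.

```shell
# conf/zeppelin-env.sh -- sketch; versions are assumptions, match your Hadoop build.
# --packages pulls the S3A connector and AWS SDK from Maven;
# --exclude-packages keeps their transitive Jackson jars from
# shadowing the Jackson version that Spark itself ships with.
export SPARK_SUBMIT_OPTIONS="--packages org.apache.hadoop:hadoop-aws:2.7.1,com.amazonaws:aws-java-sdk:1.7.4 --exclude-packages com.fasterxml.jackson.core:jackson-core,com.fasterxml.jackson.core:jackson-databind,com.fasterxml.jackson.core:jackson-annotations"
```

Restart Zeppelin after editing so the interpreter picks up the new options.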


Answer 2:

Another way is to load the dependency directly in a notebook paragraph:

%dep
z.load("com.fasterxml.jackson.core:jackson-core:2.6.2")
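Note that %dep paragraphs only take effect if they run before the Spark interpreter starts, so you may need to restart the interpreter first. A sketch combining this with the exclusion approach from the other answer (the z.reset() and .exclude() calls are from Zeppelin's dependency-loader API; the wildcard exclusion here is an assumption mirroring the interpreter.json example above):

```scala
%dep
// Must run before the first Spark paragraph; once the SparkContext
// is up, %dep additions are ignored until the interpreter restarts.
z.reset()  // clear previously loaded dependencies

// Load hadoop-aws but keep its transitive Jackson jars out,
// so Spark's own Jackson version wins.
z.load("org.apache.hadoop:hadoop-aws:2.7.1").exclude("com.fasterxml.jackson.core:*")
```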

