Why does elasticsearch-spark 5.5.0 fail with AbstractMethodError when submitting to YARN cluster?

Submitted by 纵饮孤独 on 2020-01-02 21:50:24

Question


I wrote a Spark job whose main goal is to write into Elasticsearch. When I submit it to the Spark cluster, Spark throws back:

[ERROR][org.apache.spark.deploy.yarn.ApplicationMaster] User class threw exception: java.lang.AbstractMethodError: org.elasticsearch.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
java.lang.AbstractMethodError: org.elasticsearch.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:472)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)

But if I submit the job with local[2], it works just fine. Strange, since the jars in both environments are the same. I use elasticsearch-spark20_2.11_5.5.0 and Spark 2.2.
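
For context, the failing write is of this general shape; the index name, node address, and schema below are hypothetical stand-ins, not taken from the original job:

import org.apache.spark.sql.SparkSession

object WriteToEs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("write-to-es").getOrCreate()
    import spark.implicits._

    // Toy data; the real job builds its DataFrame elsewhere
    val df = Seq(("1", "hello"), ("2", "world")).toDF("id", "text")

    // org.elasticsearch.spark.sql is the data source registered by the
    // elasticsearch-spark connector; "myindex/mytype" is a placeholder resource
    df.write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "localhost:9200") // hypothetical ES address
      .save("myindex/mytype")

    spark.stop()
  }
}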


Answer 1:


It appears you are facing a Spark version mismatch: you use elasticsearch-spark20_2.11_5.5.0 (note the spark20 in the name, i.e. built for Spark 2.0) while running Spark 2.2.

Quoting the javadoc of java.lang.AbstractMethodError:

Thrown when an application tries to call an abstract method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.

That pretty much explains what you experience (note the part that starts with "this error can only occur at run time").
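
To make that concrete, here is a minimal sketch of the failure mode, using hypothetical Sink/EsSink types (unrelated to the connector) compiled in two steps:

// Step 1: compile the trait and an implementation together.
trait Sink {
  def write(row: String): Unit
}

class EsSink extends Sink {
  override def write(row: String): Unit = println(s"wrote $row")
}

// Step 2: add an abstract method to the trait and recompile ONLY the trait
// and its callers, leaving the EsSink.class from step 1 on the classpath:
//
//   trait Sink {
//     def write(row: String): Unit
//     def writeAll(rows: Seq[String]): Unit   // new abstract method
//   }
//
// A caller doing (sink: Sink).writeAll(...) still compiles, but at run time
// the stale EsSink has no such method and the JVM throws:
//
//   java.lang.AbstractMethodError: EsSink.writeAll(Lscala/collection/Seq;)V

That is the same shape as the error above: Spark calls a createRelation variant that the connector's class file, compiled against a different version of the interface, does not provide.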

Digging deeper, this line in the stack trace pins down the exact version of Spark you used, i.e. Spark 2.2.0:

org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:472)

That gives you the exact location where the issue was "born" (DataSource.scala, line 472 in Spark 2.2.0):

dataSource.createRelation(sparkSession.sqlContext, mode, caseInsensitiveOptions, data)

That matches the top-most line in the stack trace:

java.lang.AbstractMethodError: org.elasticsearch.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;

It looks like the elasticsearch-spark20_2.11_5.5.0 connector is a CreatableRelationProvider, but somehow it does not implement the method. How is that possible, given that Spark 2.0 already had this interface?! Let's find out and review the source code of elasticsearch-spark20_2.11_5.5.0.
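
For reference, the contract in question, paraphrased from Spark 2.x's org.apache.spark.sql.sources:

trait CreatableRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation
}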

From the stack trace you know the ES implementation is org.elasticsearch.spark.sql.DefaultSource. The data source is indeed a CreatableRelationProvider:

private[sql] class DefaultSource ... with CreatableRelationProvider {

And it does override the required createRelation method (otherwise it would not even have compiled, given that the interface has existed since Spark 1.3!).
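
Paraphrasing the connector's source, the override looks like this (body elided here):

override def createRelation(
    sqlContext: SQLContext,
    mode: SaveMode,
    parameters: Map[String, String],
    data: DataFrame): BaseRelation = {
  // writes `data` to Elasticsearch, then returns the relation
  ???
}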

The only difference between the method as declared (in the connector and in the interface) and the one in the stack trace is the data parameter: data: DataFrame in the sources vs Lorg/apache/spark/sql/Dataset; in the stack trace. That begs the question about the code in your Spark application, or perhaps about how you submit the Spark application to the YARN cluster (and you do submit it to a YARN cluster, don't you?)
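
One note on that signature difference: in Spark 2.x, DataFrame is only a source-level alias, so compiled signatures always refer to Dataset. Paraphrased from the org.apache.spark.sql package object:

package object sql {
  type DataFrame = Dataset[Row]
}

So the Dataset in the stack trace is expected on Spark 2.x, which points back at a classpath question: what the YARN executors see may differ from what local[2] sees, for example an extra or older connector or Spark jar shipped along with the application.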

I'm puzzled, but hopefully the answer has shed some light on what might've been causing it.



Source: https://stackoverflow.com/questions/45502714/why-does-elasticsearch-spark-5-5-0-fail-with-abstractmethoderror-when-submitting
