Question
I'm trying to run a Scala Spark job that uses the Univocity CSV parser. After upgrading the parser so I could use a String delimiter (rather than a single character), I get the following error when I run my jar on the cluster. Running it locally in IntelliJ IDEA produces the expected results with no errors.
ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: com.univocity.parsers.csv.CsvFormat.setDelimiter(Ljava/lang/String;)V
java.lang.NoSuchMethodError: com.univocity.parsers.csv.CsvFormat.setDelimiter(Ljava/lang/String;)V
I've tried the following. First, I eliminated all conflicting univocity-parsers versions by examining the dependency tree with mvn dependency:tree -Dverbose -Dincludes=com.univocity:univocity-parsers, which yields:
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ preval ---
[INFO] dataqa:preval:jar:1.0-SNAPSHOT
[INFO] \- com.univocity:univocity-parsers:jar:2.8.2:compile
I also tried setting spark.executor.userClassPathFirst=true when submitting the Spark job, with no change in behavior.
Here is the dependency section in my pom.xml:
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.11.12</version>
</dependency>
<!--
Spark library. spark-core_2.xx must match the scala language version
-->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<!--
Spark SQL library. spark-sql_2.xx must match the scala language version
-->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.0.0</version>
<exclusions>
<exclusion> <!-- declare the exclusion here -->
<groupId>com.univocity</groupId>
<artifactId>univocity-parsers</artifactId>
</exclusion>
</exclusions>
</dependency>
<!--
Library to make REST API call
-->
<dependency>
<groupId>com.typesafe.play</groupId>
<artifactId>play-ahc-ws-standalone_2.11</artifactId>
<version>2.0.0-M1</version>
</dependency>
<!--
Parses delimited files
-->
<dependency>
<groupId>com.univocity</groupId>
<artifactId>univocity-parsers</artifactId>
<version>2.8.2</version>
<type>jar</type>
</dependency>
<!-- https://mvnrepository.com/artifact/com.googlecode.json-simple/json-simple -->
<dependency>
<groupId>com.googlecode.json-simple</groupId>
<artifactId>json-simple</artifactId>
<version>1.1.1</version>
</dependency>
I wonder if Spark has a built-in dependency that is overriding my version (2.8 is the first version to support String arguments; earlier versions only accepted a character).
Any insights?
Answer 1:
A bit late, but if passing --conf spark.driver.extraClassPath and --conf spark.executor.extraClassPath is an option, please see my response here.
Answer 2:
After much time spent troubleshooting, I found a solution. I had to use the maven-shade-plugin to relocate the parser's package, as described here: https://www.cloudera.com/documentation/enterprise/5-13-x/topics/spark_building.html#relocation
Here is the relevant portion of code I had to add to the maven-shade-plugin definition in my pom.xml:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<!-- other non-relevant filters etc. omitted for brevity -->
<relocations>
<!-- used to make sure there are no conflicts between the univocity parser version used here and the one that is bundled with spark -->
<relocation>
<pattern>com.univocity.parsers</pattern>
<shadedPattern>com.shaded.parsers</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
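To check which jar a class is actually loaded from at runtime (for example, to see whether a copy of univocity-parsers on the cluster classpath is shadowing the one in your own jar), a small diagnostic along these lines may help. This is only a sketch: the class name ClassOrigin and the method locate are hypothetical names chosen for illustration, and it relies solely on standard JDK reflection calls.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /**
     * Describes where the named class was loaded from.
     * ClassOrigin and locate are illustrative names, not part of Spark or Univocity.
     */
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Core JDK classes usually have no code source attached.
            if (src == null || src.getLocation() == null) {
                return "no code source (probably a JDK class)";
            }
            // For classes loaded from a jar, this is the jar's URL.
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found on classpath";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the stack trace; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "com.univocity.parsers.csv.CsvFormat";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Logging the result of locate("com.univocity.parsers.csv.CsvFormat") from inside the Spark job on the cluster should show whether the class comes from your application jar or from a jar shipped with the Spark installation. After relocation, the code in your jar refers to the class under the shaded package name instead (com.shaded.parsers.csv.CsvFormat with the pattern above).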
Source: https://stackoverflow.com/questions/57247793/spark-java-lang-nosuchmethoderror-for-univocity-csv-parser-setdelimiter-method