Running a Spark SQL (v2.1.0_2.11) program in Java immediately fails with a java.lang exception as soon as the first action is called on a DataFrame.
The culprit is the commons-compiler library: conflicting versions of org.codehaus.janino:commons-compiler are pulled in transitively, and the wrong one ends up on the classpath.
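For context, this is the shape of program that triggers it (a minimal sketch, not the original code; the class name and the toy DataFrame are made up): the plan builds fine, and the failure only shows up on the first action, when Spark's code generation invokes Janino.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JaninoConflictRepro {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("janino-conflict-repro")
                .master("local[*]")
                .getOrCreate();

        // Building the plan succeeds; nothing is executed yet.
        Dataset<Row> df = spark.range(100).toDF("id");

        // The first action triggers whole-stage code generation, which calls into
        // Janino/commons-compiler and fails if conflicting versions are on the classpath.
        df.filter("id > 50").show();

        spark.stop();
    }
}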
To work around this, add the following to your pom.xml:
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>commons-compiler</artifactId>
      <version>2.7.8</version>
    </dependency>
  </dependencies>
</dependencyManagement>
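Once the version is pinned, you can double-check which commons-compiler jar actually wins at runtime. This is just a sketch (the class name is made up, and it assumes the jar sits on the application classpath):

public class CheckCommonsCompilerJar {
    public static void main(String[] args) {
        // Prints the jar that commons-compiler classes are loaded from, so you can
        // confirm the pinned 2.7.8 artifact is the one that actually wins.
        System.out.println(org.codehaus.commons.compiler.CompilerFactoryFactory.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}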
I had a similar issue when updating from spark-2.2.1 to spark-2.3.0.
In my case, I had to pin the versions of both commons-compiler and janino.
Spark 2.3 solution:
<dependencyManagement>
  <dependencies>
    <!-- Spark java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException -->
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>commons-compiler</artifactId>
      <version>3.0.8</version>
    </dependency>
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>janino</artifactId>
      <version>3.0.8</version>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.0.8</version>
  </dependency>
  <dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.8</version>
  </dependency>
</dependencies>
This error still arises with org.apache.spark:spark-sql_2.12:2.4.6, but the Janino version that has to be used is 3.0.16. With Gradle:
implementation 'org.codehaus.janino:commons-compiler:3.0.16'
implementation 'org.codehaus.janino:janino:3.0.16'
My setup is Spring Boot + Scala + Spark 2.4.5.
For this issue, the solution is to exclude the 'janino' and 'commons-compiler' artifacts that come in transitively with 'spark-sql_2.12' version 2.4.5.
The reason is that version 3.1.2 of both 'janino' and 'commons-compiler' ends up being resolved alongside 'spark-sql_2.12' version 2.4.5, and that version is incompatible.
After excluding them, add version 3.0.8 of both 'janino' and 'commons-compiler' as separate dependencies, as shown below.
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>2.4.5</version>
    <exclusions>
      <exclusion>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>janino</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>commons-compiler</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.8</version>
  </dependency>
  <dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.0.8</version>
  </dependency>
  <!-- ... other dependencies ... -->
</dependencies>
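If you want to confirm that the 3.0.8 jars are the ones actually in use after the exclusions, a quick sanity check is to read the version from the janino jar's manifest. This is only a sketch: the class name is made up, and SimpleCompiler is just one convenient class from the janino jar.

public class CheckJaninoVersion {
    public static void main(String[] args) {
        // Prints the version recorded in the janino jar's manifest (expected: 3.0.8 here).
        // May print null if the manifest has no Implementation-Version attribute; in that
        // case, print the class's code source location instead, as in the earlier snippet.
        System.out.println(org.codehaus.janino.SimpleCompiler.class
                .getPackage().getImplementationVersion());
    }
}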
In our migration from CDH parcel 2.2.0.cloudera1 to 2.3.0.cloudera4, we simply overrode the Maven property:
<janino.version>3.0.8</janino.version>
In addition, we defined the proper version of the Hive dependency in the dependencyManagement section:
<hive.version>1.1.0-cdh5.13.3</hive.version>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${hive.version}</version>
  <scope>runtime</scope>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty.aggregate</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.twitter</groupId>
      <artifactId>parquet-hadoop-bundle</artifactId>
    </exclusion>
  </exclusions>
</dependency>
The exclusions were necessary for the previous version; they might not be needed anymore.
If you are using Spark 3.0.1, the latest version at the time of writing this answer, you have to use version 3.0.16 for the two janino dependencies in @Maksym's solution, which works very well.