Inserting into cassandra table from spark dataframe results in org.codehaus.commons.compiler.CompileException: File 'generated.java' Error

Posted by 倖福魔咒の on 2019-11-30 09:47:45

Question


I am using spark-sql 2.4.1, datastax-java-cassandra-connector_2.11-2.4.1.jar, and Java 8.

I created the Cassandra table like this:

CREATE TABLE company (company_id int PRIMARY KEY, company_name text);

The JavaBean is as follows:

public class CompanyRecord {
    private Integer company_id;
    private String company_name;
    // getters and setters
    // default & parameterized constructors
}

The Spark code below saves the data into the Cassandra table:

// Selected from another source, such as an XLS sheet
Dataset<Row> latestUpdatedDs = joinUpdatedRecordsDs.select("company_id", "company_name");

Encoder<CompanyRecord> companyEncoder = Encoders.bean(CompanyRecord.class);
Dataset<CompanyRecord> inputDs = latestUpdatedDs.as(companyEncoder);

inputDs
        .write()
        .format("org.apache.spark.sql.cassandra")
        .option("table", "company")
        .option("keyspace", "ks_one")
        .mode(SaveMode.Append)
        .save();

It fails with the following error:

Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 562, Column 35: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 562, Column 35: A method named "toString" is not declared in any enclosing class nor any supertype, nor through a static import
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1304)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1376)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1373)
    at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)

Question:

How can I figure out what is wrong here, and how do I fix it?


Answer 1:


As far as I can tell, the relevant part of the error is:

Line 562, Column 35: A method named "toString" is not declared in any enclosing class nor any supertype.

This might be the issue: you may need to override toString in the CompanyRecord class. Also, Spark works with custom objects that implement the Serializable interface, as mentioned in https://spark.apache.org/docs/latest/tuning.html.

These two changes should solve your problem.




Answer 2:


This issue occurs when there is a data type mismatch, i.e. between the column types you defined in the table and the types your bean/DataFrame tries to insert.

Once I corrected the data types, the issue was resolved.
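
One way to look for such a mismatch is to list the property types the bean actually declares and compare them by hand with the table definition (company_id int, company_name text) and with the output of latestUpdatedDs.printSchema(). The sketch below uses only the JDK's java.beans.Introspector; the class name BeanSchemaCheck and the helper beanPropertyTypes are hypothetical names made up for this illustration, and the nested CompanyRecord is a stand-in for the bean in the question with its getters and setters written out. It does not touch Spark or Cassandra, it only reports what the bean exposes.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

public class BeanSchemaCheck {

    // Stand-in for the CompanyRecord bean from the question (assumed shape).
    public static class CompanyRecord {
        private Integer company_id;
        private String company_name;

        public CompanyRecord() { }

        public Integer getCompany_id() { return company_id; }
        public void setCompany_id(Integer company_id) { this.company_id = company_id; }
        public String getCompany_name() { return company_name; }
        public void setCompany_name(String company_name) { this.company_name = company_name; }
    }

    // Hypothetical helper: maps each property that has both a getter and a
    // setter to the simple name of its Java type, sorted by property name.
    public static Map<String, String> beanPropertyTypes(Class<?> beanClass) {
        try {
            // Stop at Object so the synthetic "class" property is not reported.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            Map<String, String> types = new TreeMap<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    types.put(pd.getName(), pd.getPropertyType().getSimpleName());
                }
            }
            return types;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        // Print one "property : type" line per bean property.
        beanPropertyTypes(CompanyRecord.class)
                .forEach((name, type) -> System.out.println(name + " : " + type));
    }
}
```

If the DataFrame schema shows a different type for one of these columns (for example, company_id read as a double or company_name read as a number from the spreadsheet), casting the columns before calling as(...), such as col("company_id").cast("int") and col("company_name").cast("string"), is one way to bring them in line with the bean and the table.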



Source: https://stackoverflow.com/questions/58593215/inserting-into-cassandra-table-from-spark-dataframe-results-in-org-codehaus-comm
