Question
I am using spark-sql.2.4.1v, datastax-java-cassandra-connector_2.11-2.4.1.jar and java8.
I created the Cassandra table like this:
create table company(company_id int PRIMARY KEY, company_name text);
The JavaBean is as below:
class CompanyRecord {
    Integer company_id;
    String company_name;
    // getters and setters
    // default & parameterized constructors
}
The Spark code below saves the data into the Cassandra table:
Dataset<Row> latestUpdatedDs = joinUpdatedRecordsDs.select("company_id", "company_name"); // selected from another source, e.g. an xls sheet
Encoder<CompanyRecord> companyEncoder = Encoders.bean(CompanyRecord.class);
Dataset<CompanyRecord> inputDs = latestUpdatedDs.as(companyEncoder);
inputDs
.write()
.format("org.apache.spark.sql.cassandra")
.option("table","company")
.option("keyspace", "ks_one")
.mode(SaveMode.Append)
.save();
It gives an error like the one below:
Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 562, Column 35: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 562, Column 35: A method named "toString" is not declared in any enclosing class nor any supertype, nor through a static import
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1304)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1376)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1373)
    at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
Question:
How can I figure out what is wrong here, and how can I fix it?
Answer 1:
As far as I can understand:
Line 562, Column 35: A method named "toString" is not declared in any enclosing class nor any supertype.
This might be the issue: you may need to override toString in the CompanyRecord class. Also, Spark works with custom objects that implement the Serializable interface, as mentioned in https://spark.apache.org/docs/latest/tuning.html.
These two things should solve your problem.
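The two suggestions above can be sketched as a corrected bean. This is a minimal sketch, not the asker's actual class: the getter/setter names and the toString format are assumptions, chosen to follow the standard JavaBean conventions that Encoders.bean relies on.

```java
import java.io.Serializable;

// Sketch of CompanyRecord with the two fixes from this answer applied:
// it implements Serializable (so Spark can ship instances to executors)
// and explicitly overrides toString().
class CompanyRecord implements Serializable {
    private Integer company_id;
    private String company_name;

    // No-arg constructor, required by Encoders.bean
    public CompanyRecord() {}

    // Parameterized constructor for convenience
    public CompanyRecord(Integer company_id, String company_name) {
        this.company_id = company_id;
        this.company_name = company_name;
    }

    public Integer getCompany_id() { return company_id; }
    public void setCompany_id(Integer company_id) { this.company_id = company_id; }

    public String getCompany_name() { return company_name; }
    public void setCompany_name(String company_name) { this.company_name = company_name; }

    @Override
    public String toString() {
        return "CompanyRecord{company_id=" + company_id
                + ", company_name=" + company_name + "}";
    }
}
```

With a bean shaped like this, `Encoders.bean(CompanyRecord.class)` can map the two selected columns onto the matching getter/setter pairs, and the class carries its own toString and Serializable declarations.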
Answer 2:
This issue comes up when there is a mismatch of data types, i.e. between what you defined in the table and what your bean/dataframe tries to insert.
Once I corrected the data types, the issue was resolved.
Source: https://stackoverflow.com/questions/58593215/inserting-into-cassandra-table-from-spark-dataframe-results-in-org-codehaus-comm