Question
In Spark SQL, when I try to use the map function on a DataFrame, I get the error below:
The method map(Function1, ClassTag) in the type DataFrame is not applicable for the arguments (new Function(){})
I am following the Spark 1.3 documentation as well: https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection. Does anyone have a solution?
Here is my test code.
// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
List<String> teenagerNames = teenagers.map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
Answer 1:
Change this to:
Java 6 & 7
List<String> teenagerNames = teenagers.javaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
Java 8
List<String> t2 = teenagers.javaRDD().map(
    row -> "Name: " + row.getString(0)
).collect();
Once you call javaRDD(), it works just like any other RDD map function.
This works with Spark 1.3.0 and up.
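For completeness, here is a minimal self-contained sketch of this approach; the people.json input file, class name, and local master are assumptions, and the table registration follows the Spark 1.3 guide linked in the question:

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class TeenagerNames {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("teenagers").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // jsonFile is the Spark 1.3 API; from 1.4 onwards use sqlContext.read().json(...).
        DataFrame people = sqlContext.jsonFile("people.json");
        people.registerTempTable("people");

        DataFrame teenagers = sqlContext.sql(
            "SELECT name FROM people WHERE age >= 13 AND age <= 19");

        // javaRDD() bridges from the Scala-facing DataFrame API to the Java RDD API,
        // so map() accepts org.apache.spark.api.java.function.Function.
        List<String> teenagerNames = teenagers.javaRDD().map(
            new Function<Row, String>() {
                public String call(Row row) {
                    return "Name: " + row.getString(0);
                }
            }).collect();

        System.out.println(teenagerNames);
        sc.stop();
    }
}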
Answer 2:
There is no need to convert to an RDD; that delays execution. It can be done as below (note that this uses the Spark 2.x SparkSession/Dataset API):
public static void mapMethod() {
    // Read the data from file1.json (the answer assumes the file is reachable from the classpath).
    Dataset<Row> df = sparkSession.read().json("file1.json");

    // Prior to Java 1.8
    Encoder<String> encoder = Encoders.STRING();
    List<String> rowsList = df.map(new MapFunction<Row, String>() {
        private static final long serialVersionUID = 1L;

        @Override
        public String call(Row row) throws Exception {
            return "string:>" + row.getString(0) + "<";
        }
    }, encoder).collectAsList();

    // From Java 1.8 onwards; the cast disambiguates between the overloads
    // map(Function1, Encoder) and map(MapFunction, Encoder).
    List<String> rowsList1 = df.map(
        (MapFunction<Row, String>) row -> "string >" + row.getString(0) + "<",
        encoder).collectAsList();

    System.out.println(">>> " + rowsList);
    System.out.println(">>> " + rowsList1);
}
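The snippet above assumes a sparkSession already exists; here is a minimal sketch of creating one in Spark 2.x (the app name and local master are placeholder choices for testing):

import org.apache.spark.sql.SparkSession;

// Build or reuse a SparkSession; Dataset/DataFrame reads hang off it in Spark 2.x.
SparkSession sparkSession = SparkSession.builder()
    .appName("dataframe-map-example") // assumed name
    .master("local[*]")               // local mode, for testing only
    .getOrCreate();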
Answer 3:
Do you have the correct dependency set in your pom? Set this and try:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
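spark-sql_2.10 already pulls in Spark core transitively; if your build declares Spark core explicitly, make sure the Scala build and version match, for example:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.1</version>
</dependency>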
Answer 4:
Try this:
// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
List<String> teenagerNames = teenagers.toJavaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
You have to transform your DataFrame into a JavaRDD; toJavaRDD() does the same thing as the javaRDD() call in Answer 1.
Answer 5:
Check whether you are using the correct import for Row (import org.apache.spark.sql.Row). Remove any other imports related to Row; otherwise your syntax is correct.
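For reference, these are the imports the Java 6/7 code above expects; the Function and Row classes are the usual sources of accidental collisions:

import org.apache.spark.api.java.function.Function; // not java.util.function.Function or Scala's Function1
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;                    // not a Row type from any other package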
Answer 6:
Please check that your input file's data matches your DataFrame SQL query. I was facing the same thing, and when I looked back at the data it did not match my query, so you are probably hitting the same issue. Both toJavaRDD() and javaRDD() work.
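As a quick sanity check before mapping, you can inspect what the query actually returned (both methods exist on DataFrame from Spark 1.3):

// Print the schema and the first rows of the query result.
teenagers.printSchema();
teenagers.show();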
Source: https://stackoverflow.com/questions/29790417/java-spark-sql-dataframe-map-function-is-not-working