In Spark SQL, when I try to use the map function on a DataFrame, I get the error below:

`The method map(Function1, ClassTag) in the type DataFrame is not applicab`
There is no need to convert to an RDD; that delays the execution. It can be done as below:
```java
public static void mapMethod() {
    // Read the data from a file on the classpath.
    Dataset<Row> df = sparkSession.read().json("file1.json");

    Encoder<String> encoder = Encoders.STRING();

    // Prior to Java 1.8: anonymous inner class
    List<String> rowsList = df.map(new MapFunction<Row, String>() {
        private static final long serialVersionUID = 1L;

        @Override
        public String call(Row row) throws Exception {
            return "string:>" + row.getString(0) + "<";
        }
    }, encoder).collectAsList();

    // From Java 1.8 onwards: a lambda; the cast is needed so the compiler
    // picks the MapFunction overload rather than the Scala Function1 one.
    List<String> rowsList1 = df.map(
            (MapFunction<Row, String>) row -> "string >" + row.getString(0) + "<",
            encoder).collectAsList();

    System.out.println(">>> " + rowsList);
    System.out.println(">>> " + rowsList1);
}
```
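For context, here is a minimal sketch of the setup the method above assumes (Spark 2.x, where `DataFrame` is `Dataset<Row>`); the class name `MapExample` and the static `sparkSession` field are my placeholders, not part of the original snippet:

```java
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MapExample {
    static SparkSession sparkSession;

    public static void main(String[] args) {
        sparkSession = SparkSession.builder()
                .appName("MapExample")
                .master("local[*]") // local run; point at your cluster instead if needed
                .getOrCreate();
        mapMethod(); // the method shown above
        sparkSession.stop();
    }
}
```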
Please check your input file's data against your DataFrame SQL query. I was facing the same thing, and when I looked back at the data it did not match my query, so you are probably hitting the same issue. Both toJavaRDD and javaRDD work.
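To illustrate, both calls return the DataFrame's contents as a `JavaRDD<Row>`; a minimal sketch, assuming `df` is an existing DataFrame (a `Dataset<Row>` in Spark 2.x):

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;

JavaRDD<Row> rdd1 = df.toJavaRDD();
JavaRDD<Row> rdd2 = df.javaRDD(); // alias for toJavaRDD()
```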
Do you have the correct dependency set in your pom? Set this and try:
```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
```
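With that artifact on the classpath, `SQLContext` and `DataFrame` should resolve. A minimal sketch against the Spark 1.3-era API, where `sc` (an existing JavaSparkContext) and `people.json` are assumptions for illustration:

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SQLContext sqlContext = new SQLContext(sc); // sc: an existing JavaSparkContext
DataFrame people = sqlContext.jsonFile("people.json"); // Spark 1.3-era reader
people.registerTempTable("people"); // makes it queryable via sqlContext.sql(...)
```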
Try this:

```java
import java.util.List;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.toJavaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
```
You have to transform your DataFrame into a JavaRDD. Change your code to the following:
Java 6 & 7
```java
List<String> teenagerNames = teenagers.javaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
```
Java 8
```java
List<String> t2 = teenagers.javaRDD().map(
    row -> "Name: " + row.getString(0)
).collect();
```
Once you call javaRDD() it works just like any other RDD map function.
This works with Spark 1.3.0 and up.
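To illustrate that point, the usual RDD operations chain freely after the conversion; a minimal sketch (Java 8) reusing the `teenagers` DataFrame from above, with the null filter added purely as an example:

```java
import java.util.List;

List<String> names = teenagers.javaRDD()
        .map(row -> row.getString(0))   // extract the name column
        .filter(name -> name != null)   // drop nulls, like on any RDD
        .map(name -> "Name: " + name)
        .collect();
```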
Check that you are using the correct import for Row (import org.apache.spark.sql.Row) and remove any other imports related to Row. Otherwise your syntax is correct.
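Concretely, the only Row import in the file should be:

```java
// Correct: Spark SQL's Row, the type returned by DataFrame operations
import org.apache.spark.sql.Row;
// If an IDE auto-imported a different Row class instead, calls like
// row.getString(0) can fail to compile against the DataFrame API.
```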