In Spark SQL, when I try to use the map function on a DataFrame, I get the error below:

`The method map(Function1, ClassTag) in the type DataFrame is not applicab`
There is no need to convert to an RDD; that delays the execution. It can be done as below:
```java
public static void mapMethod() {
    // Read the data from a file on the classpath.
    Dataset<Row> df = sparkSession.read().json("file1.json");

    Encoder<String> encoder = Encoders.STRING();

    // Prior to Java 1.8: anonymous inner class
    List<String> rowsList = df.map(new MapFunction<Row, String>() {
        private static final long serialVersionUID = 1L;

        @Override
        public String call(Row row) throws Exception {
            return "string:>" + row.getString(0) + "<";
        }
    }, encoder).collectAsList();

    // From Java 1.8 onwards: a lambda; the cast is needed so the compiler
    // picks the MapFunction overload rather than the Scala Function1 one.
    List<String> rowsList1 = df.map(
            (MapFunction<Row, String>) row -> "string >" + row.getString(0) + "<",
            encoder).collectAsList();

    System.out.println(">>> " + rowsList);
    System.out.println(">>> " + rowsList1);
}
```
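For context, here is a minimal sketch of the setup the method above assumes (Spark 2.x, where `DataFrame` is `Dataset<Row>`); the class name `MapExample` and the static `sparkSession` field are my placeholders, not part of the original snippet:

```java
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MapExample {
    static SparkSession sparkSession;

    public static void main(String[] args) {
        sparkSession = SparkSession.builder()
                .appName("MapExample")
                .master("local[*]") // local run; point at your cluster instead if needed
                .getOrCreate();
        mapMethod(); // the method shown above
        sparkSession.stop();
    }
}
```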
Please check your input file's data against your DataFrame SQL query. I was facing the same thing, and when I looked back at the data it did not match my query, so you are probably hitting the same issue. Both toJavaRDD and javaRDD work.
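To illustrate, both calls return the DataFrame's contents as a `JavaRDD<Row>`; a minimal sketch, assuming `df` is an existing DataFrame (a `Dataset<Row>` in Spark 2.x):

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;

JavaRDD<Row> rdd1 = df.toJavaRDD();
JavaRDD<Row> rdd2 = df.javaRDD(); // alias for toJavaRDD()
```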
Do you have the correct dependency set in your pom? Set this and try:
```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
```
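With that artifact on the classpath, `SQLContext` and `DataFrame` should resolve. A minimal sketch against the Spark 1.3-era API, where `sc` (an existing JavaSparkContext) and `people.json` are assumptions for illustration:

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SQLContext sqlContext = new SQLContext(sc); // sc: an existing JavaSparkContext
DataFrame people = sqlContext.jsonFile("people.json"); // Spark 1.3-era reader
people.registerTempTable("people"); // makes it queryable via sqlContext.sql(...)
```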
Try this:

```java
import java.util.List;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

// SQL can be run over RDDs that have been registered as tables.
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

List<String> teenagerNames = teenagers.toJavaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
```
You have to transform your DataFrame into a JavaRDD. Change your code to the following:
Java 6 & 7
```java
List<String> teenagerNames = teenagers.javaRDD().map(
    new Function<Row, String>() {
        public String call(Row row) {
            return "Name: " + row.getString(0);
        }
    }).collect();
```
Java 8
```java
List<String> t2 = teenagers.javaRDD().map(
    row -> "Name: " + row.getString(0)
).collect();
```
Once you call javaRDD() it works just like any other RDD map function.
This works with Spark 1.3.0 and up.
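To illustrate that point, the usual RDD operations chain freely after the conversion; a minimal sketch (Java 8) reusing the `teenagers` DataFrame from above, with the null filter added purely as an example:

```java
import java.util.List;

List<String> names = teenagers.javaRDD()
        .map(row -> row.getString(0))   // extract the name column
        .filter(name -> name != null)   // drop nulls, like on any RDD
        .map(name -> "Name: " + name)
        .collect();
```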
Check that you are using the correct import for Row (import org.apache.spark.sql.Row) and remove any other imports related to Row. Otherwise your syntax is correct.
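Concretely, the only Row import in the file should be:

```java
// Correct: Spark SQL's Row, the type returned by DataFrame operations
import org.apache.spark.sql.Row;
// If an IDE auto-imported a different Row class instead, calls like
// row.getString(0) can fail to compile against the DataFrame API.
```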