Java - Spark SQL DataFrame map function is not working

Backend · Unresolved · 6 answers · 812 views

Asked by 说谎 on 2021-01-01 01:46

In Spark SQL, when I try to use the map function on a DataFrame, I get the error below:

The method map(Function1, ClassTag) in the type DataFrame is not applicable

6 Answers
  • 2021-01-01 02:27

    There is no need to convert to an RDD; that delays the execution. It can be done as below:

        public static void mapMethod() {
            // Read the data from a file on the classpath.
            Dataset<Row> df = sparkSession.read().json("file1.json");

            Encoder<String> encoder = Encoders.STRING();

            // Prior to Java 8: anonymous MapFunction
            List<String> rowsList = df.map(new MapFunction<Row, String>() {
                private static final long serialVersionUID = 1L;

                @Override
                public String call(Row row) throws Exception {
                    return "string:>" + row.getString(0) + "<";
                }
            }, encoder).collectAsList();

            // From Java 8 onwards: a lambda (the cast resolves the
            // ambiguity between the Scala and Java map overloads)
            List<String> rowsList1 = df.map(
                (MapFunction<Row, String>) row -> "string:>" + row.getString(0) + "<",
                encoder).collectAsList();

            System.out.println(">>> " + rowsList);
            System.out.println(">>> " + rowsList1);
        }
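
    For context on why the approach above compiles while the code in the question does not: in Spark 1.x, DataFrame.map exposed only the Scala signature (hence the Function1/ClassTag error when called from Java), while the Spark 2.x Dataset added a Java-friendly overload. A simplified sketch of the two signatures (not the full generic declarations):

        // Spark 1.x DataFrame (Scala API; awkward to call from Java):
        //   def map[R](f: Row => R)(implicit arg0: ClassTag[R]): RDD[R]

        // Spark 2.x Dataset<T> (Java-friendly overload):
        //   <U> Dataset<U> map(MapFunction<T, U> func, Encoder<U> encoder)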

  • 2021-01-01 02:29

    Check that your input file's data matches your DataFrame SQL query. I faced the same thing, and when I looked back at the data, it did not match my query. So you are probably facing the same issue. Both toJavaRDD() and javaRDD() work.

  • 2021-01-01 02:41

    Do you have the correct dependency set in your pom? Set this and try:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>1.3.1</version>
        </dependency>
    
  • 2021-01-01 02:42

    Try this:

        // SQL can be run over RDDs that have been registered as tables.
        DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");

        List<String> teenagerNames = teenagers.toJavaRDD().map(
            new Function<Row, String>() {
                public String call(Row row) {
                    return "Name: " + row.getString(0);
                }
            }).collect();

    You have to transform your DataFrame to a JavaRDD.

  • 2021-01-01 02:44

    Change this to:

    Java 6 & 7

    List<String> teenagerNames = teenagers.javaRDD().map(
        new Function<Row, String>() {
            public String call(Row row) {
                return "Name: " + row.getString(0);
            }
        }).collect();
    

    Java 8

    List<String> t2 = teenagers.javaRDD().map(
        row -> "Name: " + row.getString(0)
    ).collect();
    

    Once you call javaRDD() it works just like any other RDD map function.

    This works with Spark 1.3.0 and up.
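
    The anonymous-class vs. lambda distinction above is plain Java, independent of Spark. A minimal standalone sketch of the same pattern using `java.util.function.Function` and streams (not Spark's `org.apache.spark.api.java.function.Function`, which has `call` instead of `apply`, but the shape is identical):

    ```java
    import java.util.Arrays;
    import java.util.List;
    import java.util.function.Function;
    import java.util.stream.Collectors;

    public class MapPatternDemo {
        public static void main(String[] args) {
            List<String> names = Arrays.asList("Michael", "Andy");

            // Java 6/7 style: anonymous class implementing the functional interface
            Function<String, String> f = new Function<String, String>() {
                @Override
                public String apply(String name) {
                    return "Name: " + name;
                }
            };

            // Java 8 style: equivalent lambda
            Function<String, String> g = name -> "Name: " + name;

            List<String> out1 = names.stream().map(f).collect(Collectors.toList());
            List<String> out2 = names.stream().map(g).collect(Collectors.toList());
            System.out.println(out1); // [Name: Michael, Name: Andy]
            System.out.println(out2); // [Name: Michael, Name: Andy]
        }
    }
    ```
    
    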

  • 2021-01-01 02:48

    Check that you are using the correct import for Row (import org.apache.spark.sql.Row). Remove any other imports related to Row; otherwise, your syntax is correct.
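
    A sketch of the import to keep versus the kind to remove (the commented-out package below is an example of an older, conflicting Row class from Spark's pre-1.3 Java API; your conflicting import may differ):

        // Correct: Spark SQL's Row
        import org.apache.spark.sql.Row;

        // Remove conflicting imports such as the old Java API Row, e.g.:
        // import org.apache.spark.sql.api.java.Row;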
