Convert Row to Map in Spark Scala

傲寒 2021-02-04 18:23

I have a row from a data frame and I want to convert it to a Map[String, Any] that maps column names to the values in the row for that column.

Is there an easy way to do this?

4 Answers
  • 2021-02-04 18:36

    Let's say you have a row without structure information and the column header as an array.

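    import org.apache.spark.sql.Row

    // Rows built by hand carry no schema, so the column names are supplied separately below
    val rdd = sc.parallelize(Seq(Row("test1", "val1"), Row("test2", "val2"), Row("test3", "val3"), Row("test4", "val4")))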
    rdd.collect.foreach(println)
    
    val sparkFieldNames = Array("col1", "col2")
    
    val mapRDD = rdd.map(
      r => sparkFieldNames.zip(r.toSeq).toMap
    )
    
    mapRDD.collect.foreach(println)
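
The zip-then-toMap step in the answer above can be tried without a cluster. The following is a minimal sketch in plain Scala, assuming that `rowValues` stands in for `r.toSeq` (both names below are illustrative, not from the answer). It also shows one thing to watch for: `zip` stops at the shorter side, so a mismatch between the number of names and the number of values is silently dropped rather than reported.

```scala
// Stand-ins for `sparkFieldNames` and `r.toSeq`; no Spark is needed to see the zip behaviour.
val fieldNames = Array("col1", "col2")
val rowValues: Seq[Any] = Seq("test1", "val1")

// Pair each name with the value at the same position, then build the map.
val asMap: Map[String, Any] = fieldNames.zip(rowValues).toMap

// zip stops at the shorter side, so the extra third value is silently dropped.
val truncated: Map[String, Any] = fieldNames.zip(Seq[Any]("test1", "val1", "extra")).toMap
```

If the header array might not match the row width, comparing `fieldNames.length` with `r.length` before zipping is one way to catch the problem early.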
    
  • 2021-02-04 18:48

    You can use getValuesMap:

    import spark.implicits._ // needed for toDF; already imported in spark-shell

    val df = Seq((1, 2.0, "a")).toDF("A", "B", "C")
    val row = df.first
    

    To get Map[String, Any]:

    row.getValuesMap[Any](row.schema.fieldNames)
    // res19: Map[String,Any] = Map(A -> 1, B -> 2.0, C -> a)
    

    Or you can ask for Map[String, AnyVal]. This works here only because the type parameter is not checked at runtime: column C holds a String, which is not an AnyVal.

    row.getValuesMap[AnyVal](row.schema.fieldNames)
    // res20: Map[String,AnyVal] = Map(A -> 1, B -> 2.0, C -> a)
    

    Note: the type parameter of getValuesMap is not checked, so the values can be labelled with any type. You cannot rely on it to tell you what data types the row actually holds; you need to know the column types from the beginning.
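
To make the note above concrete, here is a minimal sketch in plain Scala of why the label cannot be trusted. It assumes that getValuesMap casts each value with an unchecked `asInstanceOf[T]`; the `values` map and the `valuesMap`, `readC`, and `describe` helpers are hypothetical stand-ins written for this illustration, not part of the Spark API.

```scala
// Hypothetical stand-in for a Row: column values are stored untyped.
val values: Map[String, Any] = Map("A" -> 1, "B" -> 2.0, "C" -> "a")

// The cast to T is unchecked and erased at runtime, so it never fails here.
def valuesMap[T](names: Seq[String]): Map[String, T] =
  names.map(n => n -> values(n).asInstanceOf[T]).toMap

// Labelling every value as Int succeeds, although B is a Double and C is a String.
val labelled: Map[String, Int] = valuesMap[Int](Seq("A", "B", "C"))

// The mismatch only surfaces when a value is actually used as an Int.
def readC: Int = labelled("C") // throws ClassCastException when called

// Safer: keep the values as Any and pattern match on the runtime type.
def describe(v: Any): String = v match {
  case i: Int    => s"Int $i"
  case d: Double => s"Double $d"
  case s: String => s"String $s"
  case other     => s"other $other"
}
```

Building the map does not fail; the error appears later, at the point where a mislabelled value is read. Asking for `Any` and matching on the runtime type, or checking `row.schema` first, avoids that delayed failure.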

  • 2021-02-04 18:51

    Let's say you have a DataFrame with these columns:

    [time(TimestampType), col1(DoubleType), col2(DoubleType)]

    You can do something like this:

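    import java.sql.Timestamp

    // Go through the underlying RDD: Spark has no built-in Encoder for Map[String, Any]
    val modifiedRdd = df.rdd.map { row =>
        val doubleObject = row.getValuesMap[Double](Seq("col1", "col2"))
        val timeObject = Map("time" -> row.getAs[Timestamp]("time"))
        doubleObject ++ timeObject // the last expression is the map returned for each row
    }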
    
  • 2021-02-04 18:53

    You can convert your DataFrame to an RDD, build a map for each row from the schema's field names inside a simple map function, and finally call collect:

    val fn = df.schema.fieldNames
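    // Ask for Any explicitly; without a type argument getAs can be inferred as Nothing and fail at runtime
    val maps = df.rdd.map(row => fn.map(field => field -> row.getAs[Any](field)).toMap).collect()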
    