Spark - convert Map to a single-row DataFrame

广开言路 2021-01-14 02:05

In my application I have a need to create a single-row DataFrame from a Map.

So that a Map like

    ("col1" -> 5, "col2" -> 10, "col3" -> …)

becomes a one-row DataFrame whose column names are the map's keys.
3 Answers
  • 2021-01-14 02:17

    Here you go:

    import spark.implicits._
    import org.apache.spark.sql.functions.lit
    
    val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)
    
    // Seed a one-column DataFrame from the first entry, then fold over the
    // rest of the map, adding each remaining entry as a literal column.
    val df = map.tail.foldLeft(Seq(map.head._2).toDF(map.head._1)) {
      (acc, curr) => acc.withColumn(curr._1, lit(curr._2))
    }
    
    df.show()
    
    +----+----+----+
    |col1|col2|col3|
    +----+----+----+
    |   5|   6|  10|
    +----+----+----+
    
  • 2021-01-14 02:17

    A slight variation on Rapheal's answer. You can create a dummy 1×1 DataFrame, add the map entries with foldLeft, and finally drop the dummy column. That way the foldLeft is straightforward and easy to remember.

    import spark.implicits._
    import org.apache.spark.sql.functions.lit
    
    val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)
    
    // 1x1 seed DataFrame; the "dummy" column is dropped at the end.
    val f = Seq("1").toDF("dummy")
    
    map.keys.toList.sorted
      .foldLeft(f) { (acc, x) => acc.withColumn(x, lit(map(x))) }
      .drop("dummy")
      .show(false)
    
    +----+----+----+
    |col1|col2|col3|
    +----+----+----+
    |5   |6   |10  |
    +----+----+----+
    
  • 2021-01-14 02:26

    I figured sorting the column names doesn't hurt anyway.

      import org.apache.spark.sql.Row
      import org.apache.spark.sql.types._
      
      val map = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)
      val (keys, values) = map.toList.sortBy(_._1).unzip
      // A single Row holding all the values, in sorted key order.
      val rows = spark.sparkContext.parallelize(Seq(Row(values: _*)))
      val schema = StructType(keys.map(
        k => StructField(k, IntegerType, nullable = false)))
      val df = spark.createDataFrame(rows, schema)
      df.show()
    

    Gives:

    +----+----+----+
    |col1|col2|col3|
    +----+----+----+
    |   5|   6|  10|
    +----+----+----+
    

    The idea is straightforward: convert the map to a list of tuples, unzip it, turn the keys into a schema and the values into a single-row RDD, then build the DataFrame from those two pieces. (The createDataFrame interface is a bit odd there: it accepts java.util.Lists and kitchen sinks, but for some reason not a plain Scala List.)
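    As a concrete sketch of the java.util.List route mentioned in the parenthetical: the RDD can be skipped entirely by converting the one-element Scala Seq of Rows with .asJava. This assumes the keys, values, and schema from the snippet above, a live SparkSession named spark, and Scala 2.13's scala.jdk.CollectionConverters (on older Scala versions the equivalent import is scala.collection.JavaConverters).

```scala
import scala.jdk.CollectionConverters._
import org.apache.spark.sql.Row

// createDataFrame also has an overload taking java.util.List[Row],
// so convert the Scala Seq of rows with .asJava instead of
// parallelizing it into an RDD.
val df2 = spark.createDataFrame(Seq(Row(values: _*)).asJava, schema)
df2.show()
```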
