In my application I need to create a single-row DataFrame from a Map, so that a Map like ("col1" -> 5, "col2" -> 10, "col3" -> …) becomes a DataFrame with one row and one column per key.
I also sort the column names. It doesn't hurt, and because a Scala Map makes no promise about iteration order, sorting keeps the column order deterministic.
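To illustrate why the sort helps, here is a small plain-Scala sketch (no Spark needed) of the key/value split on a map with five entries. It assumes current Scala 2.12/2.13 behaviour, where small immutable maps (up to four entries) happen to iterate in insertion order but larger ones are hash-based, so their iteration order is not something to rely on. The object name SortedKeysSketch and the example map are hypothetical.

```scala
object SortedKeysSketch {
  // Hypothetical map with five entries; beyond four entries the default
  // immutable Map is hash-based, so iteration order is not guaranteed.
  val map: Map[String, Int] =
    Map("col5" -> 50, "col1" -> 10, "col4" -> 40, "col2" -> 20, "col3" -> 30)

  // Sorting by key fixes the order regardless of the Map implementation,
  // and unzip then splits the pairs into parallel lists of keys and values.
  val (keys, values) = map.toList.sortBy(_._1).unzip

  def main(args: Array[String]): Unit = {
    println(keys)   // List(col1, col2, col3, col4, col5)
    println(values) // List(10, 20, 30, 40, 50)
  }
}
```

Because keys and values come out of the same sorted list, the i-th value always belongs to the i-th column name, which is exactly what the schema and the Row need.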
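import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
// `spark` is the SparkSession (predefined in spark-shell)
val map = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)
// Sort by key so the column order is deterministic, then split into keys and values
val (keys, values) = map.toList.sortBy(_._1).unzip
// A single Row holding all the values, wrapped in a one-element RDD
val rows = spark.sparkContext.parallelize(Seq(Row(values: _*)))
// One non-nullable integer column per key
val schema = StructType(keys.map(
  k => StructField(k, IntegerType, nullable = false)))
val df = spark.createDataFrame(rows, schema)
df.show()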
Gives:
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   5|   6|  10|
+----+----+----+
The idea is straightforward: convert the map to a list of tuples, unzip it, turn the keys into a schema and the values into a single-entry row RDD, and build the DataFrame from the two pieces. (The interface of createDataFrame is a bit strange there: it accepts java.util.Lists and kitchen sinks, but for some reason not a plain Scala List[Row] together with a schema.)
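If you'd rather skip the RDD, the java.util.List overload mentioned above can be used directly: wrapping the single Row with java.util.Collections.singletonList gives createDataFrame(java.util.List[Row], StructType) the Java list it expects. This is a minimal sketch, assuming Spark 2.x or later (for SparkSession) and a local master; the object name SingleRowFromMap and the helper singleRowDf are hypothetical, and the columns are fixed to IntegerType as in the original.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object SingleRowFromMap {
  // Hypothetical helper: builds a one-row DataFrame whose columns are the
  // map's keys (sorted) and whose single row holds the matching values.
  def singleRowDf(spark: SparkSession, map: Map[String, Int]): DataFrame = {
    val (keys, values) = map.toList.sortBy(_._1).unzip
    val schema = StructType(keys.map(k => StructField(k, IntegerType, nullable = false)))
    // A one-element Java list satisfies the createDataFrame(java.util.List[Row], StructType)
    // overload, so no sparkContext.parallelize call is needed.
    val rows: java.util.List[Row] = java.util.Collections.singletonList(Row(values: _*))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying the sketch outside spark-shell
    val spark = SparkSession.builder().master("local[*]").appName("single-row-df").getOrCreate()
    try {
      val df = singleRowDf(spark, Map("col1" -> 5, "col2" -> 6, "col3" -> 10))
      df.show()
    } finally {
      spark.stop()
    }
  }
}
```

The schema-building part is unchanged; only the container handed to createDataFrame differs, so the choice between the two is mostly about whether you want to touch the SparkContext at all.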