Question
I looked at a number of different solutions online, but could not find what I am trying to achieve. Please help me with this.
I am using Apache Spark 2.1.0 with Scala. Below is my dataframe:
+-----------+-------+
|COLUMN_NAME| VALUE |
+-----------+-------+
|col1 | val1 |
|col2 | val2 |
|col3 | val3 |
|col4 | val4 |
|col5 | val5 |
+-----------+-------+
I want this to be transposed, as below:
+-----+-------+-----+------+-----+
|col1 | col2 |col3 | col4 |col5 |
+-----+-------+-----+------+-----+
|val1 | val2 |val3 | val4 |val5 |
+-----+-------+-----+------+-----+
Answer 1:
If your dataframe is small enough, as in the question, then you can collect COLUMN_NAME to form the schema and collect VALUE to form the single row, and then create a new dataframe, as below:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
//creating the schema from the COLUMN_NAME values of the existing dataframe
val schema = StructType(df.select(collect_list("COLUMN_NAME")).first().getAs[Seq[String]](0).map(x => StructField(x, StringType)))
//creating an RDD[Row] with a single row built from the VALUE column
val values = sc.parallelize(Seq(Row.fromSeq(df.select(collect_list("VALUE")).first().getAs[Seq[String]](0))))
//new dataframe creation
sqlContext.createDataFrame(values, schema).show(false)
which should give you
+----+----+----+----+----+
|col1|col2|col3|col4|col5|
+----+----+----+----+----+
|val1|val2|val3|val4|val5|
+----+----+----+----+----+
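Since the question targets Spark 2.1.0, the same approach can also be written against the SparkSession entry point (spark in spark-shell) instead of the older sqlContext/sc. A minimal sketch of that variant, reusing the imports above:
//same approach via the Spark 2.x SparkSession entry point ("spark" in spark-shell)
val names = df.select(collect_list("COLUMN_NAME")).first().getAs[Seq[String]](0)
val row = Row.fromSeq(df.select(collect_list("VALUE")).first().getAs[Seq[String]](0))
val schema2 = StructType(names.map(StructField(_, StringType)))
spark.createDataFrame(spark.sparkContext.parallelize(Seq(row)), schema2).show(false)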
Answer 2:
You can do this using pivot, but you still need an aggregation. And what happens if you have multiple values for a COLUMN_NAME? (A sketch of that case follows the output below.)
import org.apache.spark.sql.functions.first

val df = Seq(
  ("col1", "val1"),
  ("col2", "val2"),
  ("col3", "val3"),
  ("col4", "val4"),
  ("col5", "val5")
).toDF("COLUMN_NAME", "VALUE")

df
  .groupBy()
  .pivot("COLUMN_NAME").agg(first("VALUE"))
  .show()
+----+----+----+----+----+
|col1|col2|col3|col4|col5|
+----+----+----+----+----+
|val1|val2|val3|val4|val5|
+----+----+----+----+----+
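To make the multiple-values caveat concrete, here is a small sketch (the duplicated col1 rows are invented for illustration) that keeps every value by aggregating with collect_list instead of first:
import org.apache.spark.sql.functions.collect_list

//hypothetical input where col1 appears twice
val dfMulti = Seq(
  ("col1", "val1a"),
  ("col1", "val1b"),
  ("col2", "val2")
).toDF("COLUMN_NAME", "VALUE")

dfMulti
  .groupBy()
  .pivot("COLUMN_NAME")
  .agg(collect_list("VALUE")) //keeps all values instead of silently picking one
  .show(false)

//+--------------+------+
//|col1          |col2  |
//+--------------+------+
//|[val1a, val1b]|[val2]|
//+--------------+------+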
EDIT: if your dataframe really is as small as in your example, you can collect it as a Map:
val map = df.as[(String,String)].collect().toMap
and then apply this answer.
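The linked answer is not reproduced here, but as a minimal sketch of the idea, a one-row dataframe can be rebuilt from that map as follows (note that a plain Scala Map does not guarantee insertion order, so collect the pairs instead if the original column order matters):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType}

//minimal sketch: rebuild a one-row dataframe from the collected map
val cols = map.keys.toSeq
val mapRow = Row.fromSeq(cols.map(map))
val mapSchema = StructType(cols.map(StructField(_, StringType)))
spark.createDataFrame(spark.sparkContext.parallelize(Seq(mapRow)), mapSchema).show(false)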
Source: https://stackoverflow.com/questions/49392683/transpose-dataframe-without-aggregation-in-spark-with-scala