I have this DataFrame in Apache Spark:

val df = Seq((1, Vector(2, 3, 4)), (1, Vector(2, 3, 4))).toDF("Col1", "Col2")

After splitting Col2 into one column per element, I want:

+----+----+----+----+
|Col1|Col2|Col3|Col4|
+----+----+----+----+
|   1|   2|   3|   4|
|   1|   2|   3|   4|
+----+----+----+----+
You can use a map:
import scala.collection.mutable
import org.apache.spark.sql.Row

df.map {
  case Row(col1: Int, col2: mutable.WrappedArray[Int]) => (col1, col2(0), col2(1), col2(2))
}.toDF("Col1", "Col2", "Col3", "Col4").show()
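If you would rather not match on the concrete WrappedArray type (which varies across Scala/Spark versions), here is a rough sketch of the same idea using Row.getSeq, assuming spark.implicits._ is in scope for the encoder and toDF:

df.map { row =>
  val xs = row.getSeq[Int](1)          // Col2 as a Seq[Int]
  (row.getInt(0), xs(0), xs(1), xs(2)) // assumes every array has at least 3 elements
}.toDF("Col1", "Col2", "Col3", "Col4").show()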
Just to add on to sgvd's solution (the answer below): if the array size is not always the same, you can set nElements like this:
import org.apache.spark.sql.functions.{max, size}

val nElements = df.select(size('Col2).as("Col2_count"))
  .select(max("Col2_count"))
  .first.getInt(0)
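For illustration, a quick sketch of how this computed nElements plugs into sgvd's dynamic select from that answer; note the assumption that with Spark's default (non-ANSI) settings, indexing past the end of a shorter array yields null rather than an error:

// Shorter arrays simply produce null in the extra columns
df.select(($"Col1" +: Range(0, nElements).map(idx => $"Col2"(idx) as s"Col${idx + 2}")): _*).show()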
A solution that doesn't convert to and from RDD:
df.select($"Col1", $"Col2"(0) as "Col2", $"Col2"(1) as "Col3", $"Col2"(2) as "Col4")
Or arguably nicer:
val nElements = 3
df.select(($"Col1" +: Range(0, nElements).map(idx => $"Col2"(idx) as "Col" + (idx + 2)):_*))
The size of a Spark array column is not fixed, you could for instance have:
+----+------------+
|Col1|        Col2|
+----+------------+
|   1|   [2, 3, 4]|
|   1|[2, 3, 4, 5]|
+----+------------+
So there is no way to get the number of columns up front and create them. If you know the size is always the same, you can set nElements like this:
val nElements = df.select("Col2").first.getList(0).size
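One caveat worth a quick sketch: first.getList(0).size only looks at the first row, so if later rows have longer arrays the max(size(...)) approach from the earlier answer is safer. The names dfVar, nFromFirstRow and nFromMax below are made up for illustration, using data shaped like the table above:

import org.apache.spark.sql.functions.{max, size}

val dfVar = Seq((1, Vector(2, 3, 4)), (1, Vector(2, 3, 4, 5))).toDF("Col1", "Col2")
val nFromFirstRow = dfVar.select("Col2").first.getList(0).size        // 3: only looks at row 1
val nFromMax      = dfVar.select(max(size($"Col2"))).first.getInt(0)  // 4: scans all rows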
Just to give the PySpark version of sgvd's answer. If the array column is Col2, then this select statement will move the first nElements of each array in Col2 to their own columns:
from pyspark.sql import functions as F

nElements = 3  # or compute it as in the Scala answers above
df.select([F.col('Col2').getItem(i) for i in range(nElements)])