Spark split a column value into multiple rows

梦如初夏 2021-01-15 22:38

My problem is that I have a table like this:

------------------------
A  B    C
------------------------
a1 b2   c1|c2|c3|c4

c1|c2|c3|c4 is a single value in column C, and I want to split it into multiple rows.

1 answer
  •  再見小時候
    2021-01-15 23:03

    Here is what you could do: split the string on the pipe character and explode the resulting array into rows, using Spark's built-in split and explode functions.

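    // In spark-shell, `spark` (the SparkSession) is already defined
    import org.apache.spark.sql.functions._   // split, explode
    import spark.implicits._                  // toDF and the $"col" syntax
    
    val df = Seq(("a1", "b1", "c1|c2|c3|c4")).toDF("A", "B", "C")
    
    // split takes a regular expression, so the pipe must be escaped as "\\|";
    // explode then emits one row per array element, copying columns A and B
    df.withColumn("C", explode(split($"C", "\\|"))).show()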
    

    Output:

    +---+---+---+
    |  A|  B|  C|
    +---+---+---+
    | a1| b1| c1|
    | a1| b1| c2|
    | a1| b1| c3|
    | a1| b1| c4|
    +---+---+---+
    
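    To see what split + explode does to each row, here is a minimal plain-Scala sketch on an ordinary List instead of a DataFrame, so it runs without Spark. Record, splitOnPipe and explodeC are hypothetical names used only for illustration. The sketch relies on String.split, which, like Spark's split, interprets its delimiter as a regular expression; that is why the pipe is written "\\|".

    ```scala
    // Plain-Scala analogue of split + explode on an in-memory "table" (no Spark).
    object SplitExplodeSketch {
      // Hypothetical stand-in for one row of the DataFrame (columns A, B, C)
      final case class Record(a: String, b: String, c: String)

      // Like split($"C", "\\|"): the delimiter is a regex, so the pipe is escaped
      def splitOnPipe(value: String): List[String] =
        value.split("\\|").toList

      // Like explode(...): each input record yields one output record per piece
      // of column C, with columns A and B copied unchanged
      def explodeC(records: List[Record]): List[Record] =
        records.flatMap(r => splitOnPipe(r.c).map(piece => r.copy(c = piece)))

      def main(args: Array[String]): Unit = {
        val table = List(Record("a1", "b1", "c1|c2|c3|c4"))
        explodeC(table).foreach(println)

        // An unescaped "|" is an empty regex alternation that matches between
        // every pair of characters, so the string is split into single characters
        println("c1|c2".split("|").toList)
      }
    }
    ```

    The flatMap is the key idea: one input row can produce zero, one, or many output rows, which is what explode does for each element of the array produced by split.
    
    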

    Hope this helps!
