Copy the current row, modify it, and add it as a new row in Spark

耗尽温柔 submitted on 2020-07-30 04:25:55

Question


I am using spark-sql 2.4.1 with Java 8. I have a scenario where I need to copy the current row and create another row from it with a few columns modified. How can this be achieved in spark-sql?

Example, given:

val data = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate", "school", 11, 14)
)
val df = data.toDF("id", "code", "entity", "value1", "value2")

Current output:

+---+-----+------+------+------+
| id| code|entity|value1|value2|
+---+-----+------+------+------+
| 20|score|school|    14|    12|
| 21|score|school|    13|    13|
| 22| rate|school|    11|    14|
+---+-----+------+------+------+

When the column "code" is "rate", the row should appear twice: once as the original, and once as a copy with the new code "new_rate", as shown below.

Expected output:

+---+--------+------+------+------+
| id|    code|entity|value1|value2|
+---+--------+------+------+------+
| 20|   score|school|    14|    12|
| 21|   score|school|    13|    13|
| 22|    rate|school|    11|    14|
| 22|new_rate|school|    11|    14|
+---+--------+------+------+------+

How can I achieve this?


Answer 1:


Use when to check whether code === "rate". If it matches, replace the column value with array(lit("rate"), lit("new_rate")); otherwise wrap the existing value as array($"code"). Then explode the code column so that each array element becomes its own row.

See the code below.

scala> df.show(false)
+---+-----+------+------+------+
|id |code |entity|value1|value2|
+---+-----+------+------+------+
|20 |score|school|14    |12    |
|21 |score|school|13    |13    |
|22 |rate |school|11    |14    |
+---+-----+------+------+------+
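
// spark-shell imports these automatically; elsewhere add:
// import org.apache.spark.sql.functions._ and import spark.implicits._
val colExpr = explode(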
    when(
        $"code" === "rate",
        array(
            lit("rate"),
            lit("new_rate")
        )
    )
    .otherwise(array($"code"))
)
scala> df.withColumn("code",colExpr).show(false)
+---+--------+------+------+------+
|id |code    |entity|value1|value2|
+---+--------+------+------+------+
|20 |score   |school|14    |12    |
|21 |score   |school|13    |13    |
|22 |rate    |school|11    |14    |
|22 |new_rate|school|11    |14    |
+---+--------+------+------+------+
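
Since the question asks about spark-sql, the same idea can probably also be written as a SQL query. The following is a minimal sketch, assuming the DataFrame is registered as a temporary view. The view name "scores", the alias "new_code", and the helper "withRateCopiesSql" are hypothetical names chosen for illustration and do not come from the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RateCopiesSql {
  // Hypothetical helper: registers df as a temp view and duplicates "rate" rows in SQL.
  def withRateCopiesSql(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("scores") // view name is an assumption
    // "rate" rows get a two-element array, all other rows a one-element array;
    // LATERAL VIEW explode then emits one output row per array element.
    spark.sql(
      """SELECT id, new_code AS code, entity, value1, value2
        |FROM scores
        |LATERAL VIEW explode(
        |  CASE WHEN code = 'rate' THEN array('rate', 'new_rate') ELSE array(code) END
        |) t AS new_code""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("rate-copies-sql")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("20", "score", "school", 14, 12),
      ("21", "score", "school", 13, 13),
      ("22", "rate", "school", 11, 14)
    ).toDF("id", "code", "entity", "value1", "value2")

    withRateCopiesSql(spark, df).show(false)
    spark.stop()
  }
}
```

The CASE expression mirrors the when/otherwise expression above, so the two versions should behave the same way on this data.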



Answer 2:


Alternatively, filter the rows whose code is "rate", prefix their code with "new_", and union the result with the original DataFrame:

df.union(df.filter($"code"==="rate").withColumn("code",concat(lit("new_"), $"code"))).show()
/*
+---+--------+------+------+------+
| id|    code|entity|value1|value2|
+---+--------+------+------+------+
| 20|   score|school|    14|    12|
| 21|   score|school|    13|    13|
| 22|    rate|school|    11|    14|
| 22|new_rate|school|    11|    14|
+---+--------+------+------+------+
*/
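
Two things are worth keeping in mind with this approach: union matches columns by position rather than by name, and Spark does not promise any particular row order without an explicit sort. The following is a minimal sketch of a variant that uses unionByName (available since Spark 2.3) and sorts each copy directly after its original. The helper name "withRateCopiesUnion" and the temporary "is_copy" column are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit}

object RateCopiesUnion {
  // Hypothetical helper: appends a renamed copy of every "rate" row.
  def withRateCopiesUnion(df: DataFrame): DataFrame = {
    // Temporary marker column (an assumption) so copies can be sorted after originals.
    val originals = df.withColumn("is_copy", lit(false))
    val copies = df
      .filter(col("code") === "rate")
      .withColumn("code", concat(lit("new_"), col("code")))
      .withColumn("is_copy", lit(true))

    originals
      .unionByName(copies)                  // match columns by name, not position
      .orderBy(col("id"), col("is_copy"))   // id is a string here, so this sort is lexicographic
      .drop("is_copy")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("rate-copies-union")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("20", "score", "school", 14, 12),
      ("21", "score", "school", 13, 13),
      ("22", "rate", "school", 11, 14)
    ).toDF("id", "code", "entity", "value1", "value2")

    withRateCopiesUnion(df).show()
    spark.stop()
  }
}
```

Further columns of the copy can be changed by chaining more withColumn calls on the filtered DataFrame before the union. The explicit sort adds a shuffle, so it can be left out when row order does not matter.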


Source: https://stackoverflow.com/questions/63074569/copy-current-row-modify-it-and-add-a-new-row-in-spark
