Aggregate a Spark data frame using an array of column names, retaining the names

Submitted by ∥☆過路亽.° on 2020-04-13 17:21:51

Question


I would like to aggregate a Spark data frame using an array of column names as input, and at the same time retain the original names of the columns.

df.groupBy($"id").sum(colNames:_*)

This works but fails to preserve the names. Inspired by the answer found here, I unsuccessfully tried this:

df.groupBy($"id").agg(sum(colNames:_*).alias(colNames:_*))
error: no `: _*' annotation allowed here

Taking a single element works, like

df.groupBy($"id").agg(sum(colNames(2)).alias(colNames(2)))

How can I make this work for the entire array?


Answer 1:


Just provide a sequence of columns with aliases:

val colNames: Seq[String] = ???
val exprs = colNames.map(c => sum(c).alias(c))
df.groupBy($"id").agg(exprs.head, exprs.tail: _*)
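The head/tail split is needed because `agg` is overloaded as `agg(expr: Column, exprs: Column*)`, so a whole `Seq[Column]` cannot be expanded with `: _*` into that signature directly. A minimal sketch in plain Scala (no Spark dependency; `Column` and `agg` below are hypothetical stand-ins for illustration only) shows the pattern:

```scala
object AggSketch {
  // Hypothetical stand-in for Spark's Column type
  final case class Column(expr: String)

  // Mirrors the shape of RelationalGroupedDataset.agg(expr: Column, exprs: Column*)
  def agg(expr: Column, exprs: Column*): Seq[String] =
    (expr +: exprs).map(_.expr)

  def main(args: Array[String]): Unit = {
    val colNames = Seq("a", "b", "c")
    // Build one aliased sum expression per column, as in the answer above
    val exprs = colNames.map(c => Column(s"sum($c) AS $c"))
    // exprs: _* alone would not type-check; split into head + varargs tail
    val result = agg(exprs.head, exprs.tail: _*)
    println(result.mkString(", "))
  }
}
```

Because each expression carries its own `.alias(c)`, the aggregated columns come back with the original names instead of Spark's default `sum(colName)` labels.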


Source: https://stackoverflow.com/questions/39388307/aggregate-a-spark-data-frame-using-an-array-of-column-names-retaining-the-names
