Scala-Spark Dynamically call groupby and agg with parameter values

有刺的猬 2020-12-03 19:34

I want to write a custom grouping and aggregation function that takes user-specified column names and a user-specified aggregation map. I do not know the column names or the aggregation map in advance.

1 Answer
  • 2020-12-03 19:50

    Your code is almost correct, with two issues:

    1. The return type of your function is DataFrame, but its last line is aggregated.show(), which returns Unit. Remove the call to show so the function returns aggregated itself, or simply return the result of agg directly.

    2. DataFrame.groupBy expects its arguments as col1: String, cols: String*, so you need to pass them in matching form: the first column, then the remaining columns expanded as varargs. You can do that as follows: df.groupBy(cols.head, cols.tail: _*)

    Altogether, your function would be:

    def groupAndAggregate(df: DataFrame, aggregateFun: Map[String, String], cols: List[String]): DataFrame = {
      val grouped = df.groupBy(cols.head, cols.tail: _*)
      val aggregated = grouped.agg(aggregateFun)
      aggregated
    }
    

    Or, a similar shorter version:

    def groupAndAggregate(df: DataFrame, aggregateFun: Map[String, String], cols: List[String]): DataFrame = {
      df.groupBy(cols.head, cols.tail: _*).agg(aggregateFun)
    }
    

    If you do want to call show within your function:

    def groupAndAggregate(df: DataFrame, aggregateFun: Map[String, String], cols: List[String]): DataFrame = {
      val grouped = df.groupBy(cols.head, cols.tail: _*)
      val aggregated = grouped.agg(aggregateFun)
      aggregated.show()
      aggregated
    }
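
    As a usage sketch: the sample data, the column names dept/salary/bonus, and the local SparkSession setup below are assumptions for illustration, not part of the original question.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Local session for demonstration only (assumed setup).
    val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(
      ("sales", 100, 10),
      ("sales", 200, 20),
      ("hr",    150, 15)
    ).toDF("dept", "salary", "bonus")

    // Group by "dept"; aggregate using the Map form of agg.
    val result = groupAndAggregate(df, Map("salary" -> "sum", "bonus" -> "max"), List("dept"))
    // result has columns named like: dept, sum(salary), max(bonus)
    result.show()
    ```

    Note that the Map-based agg produces column names such as sum(salary); if you need cleaner names, follow up with withColumnRenamed or use the Column-based agg overload instead.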
    