How to pass column names to selectExpr through one or more string parameters in Spark using Scala?

I am using a script for a CDC merge in Spark streaming. I want to pass the column names used in selectExpr through a parameter, because the column names differ for each table. When I pass the c

1 answer

挽巷

2021-01-25 01:48

Something like this should work:

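import org.apache.spark.sql.{DataFrame, functions => sqlfun}

// keyCols: plain column names used for grouping.
// structCols: SQL expressions; one of them must be aliased as otherCols.
def foo(microBatchOutputDF: DataFrame)
       (keyCols: Seq[String], structCols: Seq[String]): DataFrame =
  microBatchOutputDF
    // `: _*` expands the Seq into the varargs that selectExpr expects
    .selectExpr((keyCols ++ structCols) : _*)
    // keep the greatest struct per key (structs compare field by field, first field first)
    .groupBy(keyCols.head, keyCols.tail : _*).agg(sqlfun.max("otherCols").as("latest"))
    // flatten the winning struct back into top-level columns
    .selectExpr((keyCols :+ "latest.*") : _*)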

You can then call it like this:

foo(microBatchOutputDF)(keyCols = Seq("col1", "col2"), structCols = Seq("struct(offset,KAFKA_TS) as otherCols"))
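If the column list reaches the job as a single comma-separated string rather than as a Seq (an assumption, since the question text is cut off), it has to be split before it can be expanded with `: _*`. A naive split(",") would break expressions such as struct(offset,KAFKA_TS), so a rough sketch could split on top-level commas only. ColumnParams and splitExprs are hypothetical names, not part of Spark:

```scala
// Hypothetical helper, not a Spark API: turns one comma-separated string
// into the Seq[String] that selectExpr(... : _*) needs.
object ColumnParams {
  // Splits on commas that are outside parentheses, so that
  // "struct(offset,KAFKA_TS) as otherCols" stays a single expression.
  def splitExprs(csv: String): Seq[String] = {
    val parts = Vector.newBuilder[String]
    val current = new StringBuilder
    var depth = 0 // current nesting level of parentheses
    for (ch <- csv) ch match {
      case '(' =>
        depth += 1
        current.append(ch)
      case ')' =>
        depth -= 1
        current.append(ch)
      case ',' if depth == 0 =>
        // top-level comma: finish the current expression
        parts += current.toString.trim
        current.clear()
      case _ =>
        current.append(ch)
    }
    parts += current.toString.trim
    // drop empty entries caused by blank input or trailing commas
    parts.result().filter(_.nonEmpty)
  }
}
```

With this, the call above could become foo(microBatchOutputDF)(ColumnParams.splitExprs(keyParam), ColumnParams.splitExprs(structParam)), where keyParam and structParam are the (hypothetical) per-table string parameters. The sketch does not handle commas inside quoted string literals.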