How to pass column names in selectExpr through one or more string parameters in Spark using Scala?

旧巷少年郎 2021-01-25 01:35

I am using a script for a CDC merge in Spark streaming. I want to pass the column expressions to selectExpr through a parameter, because the column names change for each table. When I pass the c

1 Answer
  • 2021-01-25 01:48

    Something like this should work:

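    import org.apache.spark.sql.{DataFrame, functions => sqlfun}
    
    // keyCols: grouping columns; structCols: expressions that must yield a column aliased "otherCols"
    def foo(microBatchOutputDF: DataFrame)
           (keyCols: Seq[String], structCols: Seq[String]): DataFrame =
      microBatchOutputDF
        // ": _*" expands the Seq[String] into selectExpr's String* varargs
        .selectExpr((keyCols ++ structCols) : _*)
        // Keep the greatest struct per key (struct fields are compared left to right)
        .groupBy(keyCols.head, keyCols.tail : _*).agg(sqlfun.max("otherCols").as("latest"))
        // Flatten the winning struct back into top-level columns
        .selectExpr((keyCols :+ "latest.*") : _*)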
    

    Which you can use like:

    foo(microBatchOutputDF)(keyCols = Seq("col1", "col2"), structCols = Seq("struct(offset,KAFKA_TS) as otherCols"))
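
If the struct columns also differ per table, the expression string itself can be built from plain column names instead of being typed by hand. The following is only a sketch in plain Scala (no Spark needed to run it); `SelectExprArgs`, `structExpr` and `selectArgs` are hypothetical names that are not part of Spark or of the answer above, and the alias `otherCols` is assumed because `foo` aggregates on that name.

```scala
object SelectExprArgs {
  // Hypothetical helper: turns Seq("offset", "KAFKA_TS") into
  // "struct(offset,KAFKA_TS) as otherCols" (alias passed by the caller).
  def structExpr(cols: Seq[String], alias: String): String = {
    require(cols.nonEmpty, "at least one struct column is required")
    s"struct(${cols.mkString(",")}) as $alias"
  }

  // Hypothetical helper: the full list of expressions for the first
  // selectExpr call, i.e. the key columns followed by the struct expression.
  // Assumes the alias "otherCols", which foo above expects.
  def selectArgs(keyCols: Seq[String], orderCols: Seq[String]): Seq[String] =
    keyCols :+ structExpr(orderCols, "otherCols")
}
```

With this, the call above could be written as `foo(microBatchOutputDF)(keyCols, Seq(SelectExprArgs.structExpr(Seq("offset", "KAFKA_TS"), "otherCols")))`. The order of the struct columns matters, since the first field decides which row counts as the latest.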
    