Row aggregations in Scala

Asked by 再見小時候 on 2021-01-28 07:24

I am looking for a way to add a new column to a DataFrame in Scala that calculates the min/max of the values in col1, col2, col3, and col4.

1 Answer
  • Answered 2021-01-28 08:13

    Porting this Python answer by user6910411:

    import org.apache.spark.sql.functions._
    // toDF and the $ column syntax require the implicits of an active SparkSession:
    // import spark.implicits._
    
    val df = Seq(
      (1, 3, 0, 9, "a", "b", "c")
    ).toDF("col1", "col2", "col3", "col4", "col5", "col6", "Col7")
    
    // Columns to aggregate across each row
    val cols = Seq("col1", "col2", "col3", "col4")
    
    // greatest/least take a varargs list of columns and compute the row-wise max/min
    val rowMax = greatest(cols.map(col): _*).alias("max")
    val rowMin = least(cols.map(col): _*).alias("min")
    
    df.select($"*", rowMin, rowMax).show
    
    // +----+----+----+----+----+----+----+---+---+
    // |col1|col2|col3|col4|col5|col6|Col7|min|max|
    // +----+----+----+----+----+----+----+---+---+
    // |   1|   3|   0|   9|   a|   b|   c|  0|  9|
    // +----+----+----+----+----+----+----+---+---+
    
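    If you are on Spark 2.4 or later, a possible alternative (a sketch, reusing the same df and cols from above) is to pack the target columns into an array column and use the built-in array_min/array_max functions instead of least/greatest:
    
    // Sketch assuming Spark 2.4+, where array_min/array_max are available.
    // Pack the target columns into a single array column, then reduce it per row.
    val asArray = array(cols.map(col): _*)
    
    df.select(
      $"*",
      array_min(asArray).alias("min"),
      array_max(asArray).alias("max")
    ).show
    
    Both approaches stay entirely in Spark's built-in column functions, so they avoid the serialization cost of a UDF.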