dataframe: how to groupBy/count then filter on count in Scala

野的像风 2021-02-03 18:29

Spark 1.4.1

I ran into a situation where grouping a DataFrame, then counting and filtering on the 'count' column, raises the exception below.

import         


        
3 Answers
  •  情深已故
    2021-02-03 18:59

    So, is that a behavior to expect, a bug?

    Truth be told, I am not sure. It looks like the parser is interpreting count not as a column name but as a function, and so expects parentheses to follow it. That looks like a bug, or at least a serious limitation of the parser.

    Is there a canonical way to work around it?

    Some options have already been mentioned by Herman and mattinbits, so here is a more SQL-ish approach from me:

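    import org.apache.spark.sql.functions.count
    // The $"..." column syntax needs the SQLContext implicits in scope,
    // e.g. import sqlContext.implicits._ (assuming the context is named sqlContext).

    // Alias the aggregate so the filter never refers to a column named "count".
    df.groupBy("x").agg(count("*").alias("cnt")).where($"cnt" > 2)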
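
If the explanation above is right and the problem lies in parsing the name count inside a string predicate, two other workarounds should behave the same way. The following is a minimal sketch, not a definitive implementation. It assumes the Spark 1.4-era `SQLContext` API, a local master, and a grouping column named `x`. The `Rec` case class, the sample rows, the object name `GroupCountFilter`, and the helper names `viaColumn` and `viaRename` are all hypothetical and exist only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col

// Hypothetical record type, used only to build sample data.
case class Rec(x: Int, y: Int)

object GroupCountFilter {
  // Option 1: keep the default "count" column and filter with a Column
  // expression, so no string predicate has to be parsed.
  // col("count") is equivalent to $"count" but needs no implicits.
  def viaColumn(df: DataFrame, min: Long): DataFrame =
    df.groupBy("x").count().filter(col("count") >= min)

  // Option 2: rename the column first, so that a string predicate
  // no longer mentions the name "count".
  def viaRename(df: DataFrame, min: Long): DataFrame =
    df.groupBy("x").count()
      .withColumnRenamed("count", "n")
      .filter(s"n >= $min")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("group-count-filter").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // provides toDF on RDDs of case classes

    // Assumed sample data: x = 1 appears three times, x = 2 once.
    val df = sc.parallelize(Seq(Rec(1, 2), Rec(1, 3), Rec(1, 4), Rec(2, 5))).toDF()

    viaColumn(df, 2).show()
    viaRename(df, 2).show()
    sc.stop()
  }
}
```

Both helpers keep only the groups whose row count is at least `min`. The aliasing approach in the answer above has the advantage of naming the column once, at the point where the aggregate is defined.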
    
