dataframe: how to groupBy/count then filter on count in Scala

野的像风 2021-02-03 18:29

Spark 1.4.1

I ran into a situation where grouping a DataFrame, counting, and then filtering on the 'count' column raises an exception.

3 Answers
  •  太阳男子
    2021-02-03 18:49

    When you pass a string to the filter function, the string is interpreted as SQL. Since count is a SQL keyword, using count as a column name confuses the parser. This is a small bug (you can file a JIRA ticket if you want to).

    You can easily avoid this by using a column expression instead of a String:

    import sqlContext.implicits._  // needed for the $"..." column syntax

    df.groupBy("x").count()
      .filter($"count" >= 2)
      .show()
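
    If you would rather keep the string form of filter, you can quote the column name with backticks so the SQL parser reads count as an identifier instead of a keyword. A minimal sketch, assuming the same DataFrame df with a grouping column "x" as in the question:

    // Sketch: assumes an existing DataFrame `df` with a column "x".
    // Backticks tell the SQL parser that `count` is a column name,
    // not the COUNT keyword.
    df.groupBy("x").count()
      .filter("`count` >= 2")
      .show()

    Another option that avoids the implicits import entirely is org.apache.spark.sql.functions.col, i.e. .filter(col("count") >= 2).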
    
