Spark 1.4.1
I have encountered a situation where grouping a DataFrame, then counting and filtering on the resulting 'count' column, raises an exception.
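The original snippet is not shown here, but a minimal sketch of the pattern that triggers the problem might look like the following (the sample data and variable names are my own assumptions, and sc is taken to be a SparkContext as in spark-shell):

import org.apache.spark.sql.SQLContext

// Hypothetical sample data; the real DataFrame is not shown in the question.
val sqlContext = new SQLContext(sc)
val df = sqlContext.createDataFrame(Seq(("a", 1), ("a", 2), ("b", 3))).toDF("x", "y")

// groupBy followed by count() produces a column literally named "count" ...
val counted = df.groupBy("x").count()

// ... but filtering on it with a string expression fails, because the
// expression parser reads "count" as the aggregate function, not the column.
counted.filter("count >= 2").show()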
So, is that behavior to be expected, or is it a bug?
Truth be told, I am not sure. It looks like the parser is interpreting count not as a column name but as a function, and expects parentheses to follow. It looks like a bug, or at least a serious limitation of the parser.
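If that diagnosis is right, one way to sidestep the issue might be to filter with a Column expression instead of a SQL string, so the string parser is never invoked (a sketch, assuming the grouped DataFrame from above and that sqlContext.implicits._ is imported for the $ syntax):

import sqlContext.implicits._

// $"count" builds a Column directly, so the SQL expression parser never runs
// and "count" is resolved as a plain column reference.
df.groupBy("x").count().filter($"count" >= 2)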
Is there a canonical way to work around it?
Some options have already been mentioned by Herman and mattinbits, so here is a more SQL-ish approach from me:
import org.apache.spark.sql.functions.count
import sqlContext.implicits._
// Aliasing the aggregate as "cnt" avoids any clash with the count function.
df.groupBy("x").agg(count("*").alias("cnt")).where($"cnt" > 2)
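Because the filter references cnt rather than count, and $"cnt" > 2 is a Column expression rather than a SQL string, neither the reserved name nor the string parser should come into play.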