Spark 1.4.1
I've run into a situation where grouping a DataFrame by a column, then counting and filtering on the resulting 'count' column raises an exception.
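For example, with a toy DataFrame (the data and column names here are illustrative; this assumes an existing SQLContext named sqlContext with its implicits imported):

import sqlContext.implicits._

val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("x", "y")

df.groupBy("x").count()
  .filter("count >= 2")  // passing the condition as a SQL string fails here
  .show()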
When you pass a string to the filter function, the string is interpreted as SQL. count is a SQL keyword, and using count as a column name confuses the parser. This is a small bug (you can file a JIRA ticket for it if you want to). You can easily avoid it by using a column expression instead of a string:
df.groupBy("x").count()
  .filter($"count" >= 2)
  .show()
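Note that the $"count" syntax requires import sqlContext.implicits._ to be in scope. If you'd rather not rely on the implicits, col from org.apache.spark.sql.functions should work the same way (a sketch of the equivalent call):

import org.apache.spark.sql.functions.col

df.groupBy("x").count()
  .filter(col("count") >= 2)  // col("count") builds a Column, so no SQL parsing is involved
  .show()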