I'm following the tutorial in this GitHub repo to run Spark against Cassandra using a Java Maven project: https://github.com/datastax/spark-cassandra-connector.
I've figured it out. The `where` method adds `ALLOW FILTERING` to your query under the covers. This is not a magic bullet, though, as it still doesn't support arbitrary fields as query predicates: in general, the field must either be indexed or be a clustering column. If that isn't practical for your data model, you can simply use the `filter` method on the RDD instead. The downside is that the filtering then takes place in Spark rather than in Cassandra.
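To make the two approaches concrete, here is a minimal sketch using the connector's Java API. The keyspace `test`, table `emp`, bean class `Emp`, and the assumption that `id` is a clustering or indexed column are all hypothetical, and the snippet needs a running Spark/Cassandra setup, so treat it as an illustration rather than a drop-in implementation:

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class WhereVsFilter {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("where-vs-filter")
                .set("spark.cassandra.connection.host", "127.0.0.1"); // assumption: local node
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Predicate pushed down to Cassandra; the connector appends ALLOW FILTERING
        // under the covers. Only valid if "id" is a clustering column or indexed.
        javaFunctions(sc)
                .cassandraTable("test", "emp", mapRowTo(Emp.class))
                .where("id = ?", 42)
                .collect();

        // Predicate evaluated in Spark: the whole table is read into the executors
        // first, then rows are filtered client-side.
        javaFunctions(sc)
                .cassandraTable("test", "emp", mapRowTo(Emp.class))
                .filter(emp -> "manager".equals(emp.getRole()))
                .collect();

        sc.stop();
    }
}
```

The trade-off is visible in the shape of the code: `where` narrows the data before it leaves Cassandra, while `filter` pays for a full table scan and discards rows afterwards.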
So the `id` field works because it's supported in a CQL `WHERE` clause, whereas I'm assuming `role` is just a regular column. Please note that I am NOT suggesting you index the field or change it to a clustering column, as I don't know your data model.
There is a limitation in the Spark Cassandra Connector: the `where` method does not work on partition keys. In your table `empByRole`, `role` is a partition key, hence the error. It should work correctly on clustering columns or on columns with a secondary index.
This is being tracked as issue 37 in the GitHub project, and work on it is ongoing.
On the Java API doc page, the examples use `.where("name=?", "Anna")`. I assume that `name` is not a partition key, but the example could be clearer about that.