I'm following the tutorial in this GitHub repo to run Spark against Cassandra using a Java Maven project: https://github.com/datastax/spark-cassandra-connector.
I've figured it out. The `where` method adds `ALLOW FILTERING` to your query under the covers. This is not a magic bullet, though, as it still doesn't support arbitrary fields as query predicates: in general, the field must either be indexed or be a clustering column. If that isn't practical for your data model, you can simply use the `filter` method on the RDD instead. The downside is that the filtering then takes place in Spark rather than in Cassandra.
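To make the two approaches concrete, here is a minimal sketch using the connector's Java API. The keyspace `test`, table `emp`, bean class `Emp`, and the assumption that `id` is a clustering or indexed column are all hypothetical, and the snippet needs a running Spark/Cassandra setup, so treat it as an illustration rather than a drop-in implementation:

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class WhereVsFilter {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("where-vs-filter")
                .set("spark.cassandra.connection.host", "127.0.0.1"); // assumption: local node
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Predicate pushed down to Cassandra; the connector appends ALLOW FILTERING
        // under the covers. Only valid if "id" is a clustering column or indexed.
        javaFunctions(sc)
                .cassandraTable("test", "emp", mapRowTo(Emp.class))
                .where("id = ?", 42)
                .collect();

        // Predicate evaluated in Spark: the whole table is read into the executors
        // first, then rows are filtered client-side.
        javaFunctions(sc)
                .cassandraTable("test", "emp", mapRowTo(Emp.class))
                .filter(emp -> "manager".equals(emp.getRole()))
                .collect();

        sc.stop();
    }
}
```

The trade-off is visible in the shape of the code: `where` narrows the data before it leaves Cassandra, while `filter` pays for a full table scan and discards rows afterwards.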
So the `id` field works because it's supported in a CQL `WHERE` clause, whereas I'm assuming `role` is just a regular column. Please note that I am NOT suggesting you index the field or change it to a clustering column, as I don't know your data model.
There is a limitation in the Spark Cassandra Connector: the `where` method does not work on partition keys. In your table `empByRole`, `role` is a partition key, hence the error. It should work correctly on clustering columns or on columns with a secondary index.
This is being tracked as issue 37 in the GitHub project, and work on it is ongoing.
On the Java API doc page, the examples use `.where("name=?", "Anna")`. I assume that `name` is not a partition key, but the example could be clearer about that.