Question
I am implementing a feature that requires looking up rows in Cassandra by a list of primary keys.
Below is some example data, where id is the primary key:
mytable
id column1
1 423
2 542
3 678
4 45534
5 435634
6 2435
7 678
8 4564
9 546
Most of my queries are lookups by a single id, but for some special cases I would like to get data for a list of ids. This is how I currently do it:
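import java.util.ArrayList;
import java.util.List;

// Existing single-key lookup (body not shown).
public Object fetchFromCassandraForId(int id);

int[] ids = {1, 3, 5, 7, 9};
List<Object> results = new ArrayList<>(); // must be initialized before add()
for (int id : ids) {
    results.add(fetchFromCassandraForId(id)); // one blocking network call per id
}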
This issues multiple network calls to Cassandra. Is it possible to batch this somehow? In other words, does Cassandra support a fast lookup by a list of ids, such as:
select column1 from mytable where id in (1, 3, 5, 7, 9);
Any help or pointers would be appreciated.
Answer 1:
If id is the full primary key, then Cassandra supports this, although it is not recommended from a performance point of view. The query is handled as follows:
- the request is sent to a coordinator node
- the coordinator node finds a replica for each of the id values and sends an individual request to each of them
- the coordinator waits for results from every node, collects them into a result set, and sends it back
As a result:
- all of your sub-queries have to wait for the slowest of the replicas
- you have an additional network hop from the coordinator to each replica
- you put more pressure on the coordinator node, as it needs to keep the results in memory
If you instead issue parallel, asynchronous requests from the application, one for each of the id values, then you:
- avoid the additional hop: if you're using prepared statements with token-aware load balancing, each query is sent directly to a replica
- can start processing results as you get them, without waiting for everything
So sending parallel asynchronous requests could be faster than sending one request with IN
...
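The parallel, asynchronous approach described above can be sketched roughly as follows. This is only an illustration of the fan-out pattern using the standard library, not the answer author's code: ParallelLookup, fetchAll and asyncFetch are hypothetical names, and asyncFetch stands in for whatever asynchronous call your driver provides. With the DataStax Java driver 4.x, for example, I would expect it to wrap CqlSession.executeAsync on a bound prepared statement and convert the returned CompletionStage with toCompletableFuture(), but treat that as an assumption to check against the driver documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class ParallelLookup {

    // Starts one asynchronous lookup per id, then waits for all of them.
    // asyncFetch is a hypothetical stand-in for a driver call returning a future.
    static <T> List<T> fetchAll(List<Integer> ids,
                                Function<Integer, CompletableFuture<T>> asyncFetch) {
        List<CompletableFuture<T>> futures = new ArrayList<>();
        for (int id : ids) {
            // Does not block: all requests are in flight at the same time.
            futures.add(asyncFetch.apply(id));
        }
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> future : futures) {
            // Blocks only until this particular lookup has completed.
            results.add(future.join());
        }
        return results; // same order as the input ids
    }

    public static void main(String[] args) {
        // Fake lookup used only for demonstration; no Cassandra involved.
        Function<Integer, CompletableFuture<Integer>> fakeFetch =
                id -> CompletableFuture.supplyAsync(() -> id * 10);
        System.out.println(fetchAll(List.of(1, 3, 5, 7, 9), fakeFetch));
    }
}
```

To process each result as soon as it arrives rather than collecting everything first, a callback such as thenAccept could be attached to each future instead of calling join() in a loop.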
Source: https://stackoverflow.com/questions/62643342/cassandra-lookup-by-list-of-primary-keys-in-java