How to filter a Cassandra query by a field in a user-defined type

Submitted by 回眸只為那壹抹淺笑 on 2019-11-28 13:47:06

Short answer: you can use a secondary index to query by the whole fullname UDT, but you cannot query by only a part of it.

// create table, type and index
create type fullname ( firstname text, lastname text );
create table people ( id UUID primary key, name frozen <fullname> );
create index fname_index on your_keyspace.people (name);

// insert some data into it
insert into people (id, name) values (now(), {firstname: 'foo', lastname: 'bar'});
insert into people (id, name) values (now(), {firstname: 'baz', lastname: 'qux'});

// query it by fullname
select * from people where name = { firstname: 'baz', lastname: 'qux' };

// the following will NOT work:
select * from people where name = { firstname: 'baz'};

The reason for this behaviour is the way C* secondary indexes are implemented. In general, an index is just another hidden table maintained by C*, in your case defined roughly as:

create table fname_index (name frozen <fullname> primary key, id uuid);

In effect, the indexed column and the primary key swap roles in this table. So your case reduces to a more general question, 'why can't I query by only a part of the PK?':

  • the whole PK value (firstname+lastname) is hashed, and the resulting number determines the partition that stores your row.
  • within that partition your row is appended to a memtable (and later flushed to disk as an SSTable, a file sorted by key).
  • when you query by only a part of the PK (for example by firstname only), C* is not able to guess which partition to look in: it cannot compute the hash of the whole fullname because lastname is unknown. Your match could be in any partition, so finding it would require a full-table scan. C* explicitly forbids these scans, so you have no choice :)
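
The same restriction can be seen on an ordinary table. The following is a rough sketch only: the people_by_name table is hypothetical and not part of the original schema. Its composite partition key behaves like the whole-UDT key described above. The exact error message for the last query depends on your Cassandra version.

```cql
// hypothetical table: (firstname, lastname) together form the partition key
create table people_by_name (
    firstname text,
    lastname text,
    id uuid,
    primary key ((firstname, lastname), id)
);

insert into people_by_name (firstname, lastname, id) values ('baz', 'qux', uuid());

// works: the full partition key is known, so its token can be computed
select * from people_by_name where firstname = 'baz' and lastname = 'qux';

// shows the token (hash) that C* derives from the whole partition key
select token(firstname, lastname), firstname, lastname from people_by_name;

// rejected by default: only a part of the partition key is restricted
select * from people_by_name where firstname = 'baz';
```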

Suggested solutions:

  • split your UDT into its essential parts, such as firstname and lastname columns, and create secondary indexes on them.
  • use Cassandra 3.0 with its materialized views feature (in effect forcing Cassandra to maintain a custom index for part of your UDT).
  • revisit your data model to be less strict (no one forces you to use UDTs where they are not helpful).
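
A minimal sketch of the first two suggestions, assuming you are free to change the schema. The names people_flat, firstname_index, lastname_index and people_by_firstname are made up for illustration. The materialized view part assumes Cassandra 3.0 or later and is built over plain columns, not over a field inside a frozen UDT.

```cql
// suggestion 1: plain columns instead of a frozen UDT, each with its own index
create table people_flat ( id uuid primary key, firstname text, lastname text );
create index firstname_index on people_flat (firstname);
create index lastname_index on people_flat (lastname);

insert into people_flat (id, firstname, lastname) values (uuid(), 'foo', 'bar');
insert into people_flat (id, firstname, lastname) values (uuid(), 'baz', 'qux');

// querying by a single part of the name is now possible
select * from people_flat where firstname = 'baz';

// suggestion 2: a materialized view keyed by firstname
create materialized view people_by_firstname as
    select * from people_flat
    where firstname is not null and id is not null
    primary key (firstname, id);

select * from people_by_firstname where firstname = 'baz';
```

Which of the two fits better depends on your read and write patterns. The view stores a second copy of the data partitioned by firstname, while the secondary indexes keep a single table.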