In cassandra, it\'s well known that secondary indexes should be used very sparingly.
If I have a table for example:
User(username, usertype, email, etc..
For this example, let's say I create a table to keep track of crew members by ship and id:
CREATE TABLE crewByShip (
ship text,
id int,
firstname text,
lastname text,
gender text,
PRIMARY KEY(ship,id));
And I'll create an index on gender
:
CREATE INDEX crewByShipG_idx ON crewByShip(gender);
After inserting some data, my table looks like this:
ship | id | firstname | gender | lastname
----------+----+-----------+--------+-----------
Serenity | 1 | Hoban | M | Washburne
Serenity | 2 | Zoey | F | Washburne
Serenity | 3 | Malcolm | M | Reynolds
Serenity | 4 | Kaylee | F | Frye
Serenity | 5 | Sheppard | M | Book
Serenity | 6 | Jayne | M | Cobb
Serenity | 7 | Simon | M | Tam
Serenity | 8 | River | F | Tam
Serenity | 9 | Inara | F | Serra
Now I'll turn tracing on, and query a distinct row with the PRIMARY KEY, but also restrict by our index on gender
.
aploetz@cqlsh:stackoverflow2> tracing on;
aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M';
ship | id | firstname | gender | lastname
----------+----+-----------+--------+----------
Serenity | 3 | Malcolm | M | Reynolds
(1 rows)
Tracing session: 34ea1840-e8e1-11e4-9cb7-21b264d4c94d
activity | timestamp | source | source_elapsed
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
Execute CQL3 query | 2015-04-22 06:17:48.102000 | 192.168.23.129 | 0
Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M'; [SharedPool-Worker-1] | 2015-04-22 06:17:48.114000 | 192.168.23.129 | 3715
Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:48.116000 | 192.168.23.129 | 4846
Executing single-partition query on users [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 | 5730
Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 | 5757
Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 | 5793
Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 | 5848
Seeking to partition beginning in data file [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 | 5856
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 | 7056
Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.121000 | 192.168.23.129 | 7080
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.122000 | 192.168.23.129 | 7143
Computing ranges to query [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 | 7578
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 | 7742
Submitting range requests on 1 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 | 7807
Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 | 7851
Executing indexed scan for [Serenity, Serenity] [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 | 10848
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 | 10936
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 | 11007
Executing single-partition query on crewbyship.crewbyshipg_idx [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 | 11130
Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 | 11139
Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 | 11155
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 | 11253
Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 | 11262
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.127000 | 192.168.23.129 | 11281
Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:48.130000 | 192.168.23.129 | 11369
Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 | 11375
Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 | 11383
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.133000 | 192.168.23.129 | 11409
Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.134000 | 192.168.23.129 | 11415
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 | 11430
Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 | 11490
Request complete | 2015-04-22 06:17:48.115679 | 192.168.23.129 | 13679
Now, I'll re-run the same query, but without the superfluous index on gender
.
aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3;
ship | id | firstname | gender | lastname
----------+----+-----------+--------+----------
Serenity | 3 | Malcolm | M | Reynolds
(1 rows)
Tracing session: 38d7f440-e8e1-11e4-9cb7-21b264d4c94d
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
Execute CQL3 query | 2015-04-22 06:17:54.692000 | 192.168.23.129 | 0
Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3; [SharedPool-Worker-1] | 2015-04-22 06:17:54.695000 | 192.168.23.129 | 87
Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:54.696000 | 192.168.23.129 | 246
Executing single-partition query on users [SharedPool-Worker-3] | 2015-04-22 06:17:54.697000 | 192.168.23.129 | 1185
Acquiring sstable references [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 | 1197
Merging memtable tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 | 1215
Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 | 1249
Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 | 1278
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 | 3309
Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 | 3333
Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-04-22 06:17:54.702000 | 192.168.23.129 | 3368
Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:54.702000 | 192.168.23.129 | 4607
Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 | 4633
Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 | 4643
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 | 4678
Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 | 4683
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:54.706000 | 192.168.23.129 | 4697
Request complete | 2015-04-22 06:17:54.697676 | 192.168.23.129 | 5676
As you can see, the "source_elapsed" for the query with the secondary index was more than twice what it was for the same query (which returned the same row) without the index.
I think we can definitely say that using a secondary index on a low-cardinality column in a wide-row table will not perform well. Now while I won't say that filtering client-side is a good idea, in this case, with a small result set, it probably would be the better option.
The good option here would be to create a composite primary key consisting of username and usertype with username being partition key and usertype a cluster key. You will not even need an index and the query will work.
CREATE TABLE users (
username text,
usertype text,
....
PRIMARY KEY ((username), usertype)
)