Validating row at client side better than secondary index with whole primary key?

◇◆丶佛笑我妖孽 提交于 2019-12-02 11:34:37

问题


In cassandra, it's well known that secondary indexes should be used very sparingly.

If I have a table for example:

User(username, usertype, email, etc..)

Here username is partition key. Now I want to support operation which returns a specific user(username will be given) if and only if usertype is a specific value X.

There are two ways I can do it:

One: Create a secondary index on usertype, possible values ('A', 'B', 'C') and username is partition key.

SELECT * FROM user WHERE username='something' AND usertype='A';

Two:

I can just fetch the row with username to client and then check if usertype is A.

Which approach is better? Please also consider a wide row(not so big, 10s) scenario where not all rows of a partition might have the given value (which requires some client side filtering).

What I'm not clear about secondary indexes is how data is looked up in a particular node.

Ex: SELECT * FROM user WHERE username='something' AND usertype='A'

For example usertype hidden CF has data 'A'-> 'jhon', 'miller', 'chris',...etc, 100 usernames

And the query with partition key is given along with usertype does it scan through all these 100 usernames to match with the username 'something' or does it just fetches by username first and sees the usertype column if it matches with 'A'? How exactly it does that searching? How does the query fares given the index is on low cardinality data and each one is mapped to many rows?

I'm using java as client if that matters.

Update: I understand that I can use clustering (usertype) key for this particular example but I wanted to know the trade off I've asked. My original tables are much more complex.


回答1:


For this example, let's say I create a table to keep track of crew members by ship and id:

CREATE TABLE crewByShip (
  ship text,
  id int,
  firstname text,
  lastname text,
  gender text,
  PRIMARY KEY(ship,id));

And I'll create an index on gender:

CREATE INDEX crewByShipG_idx ON crewByShip(gender);

After inserting some data, my table looks like this:

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+-----------
 Serenity |  1 |     Hoban |      M | Washburne
 Serenity |  2 |      Zoey |      F | Washburne
 Serenity |  3 |   Malcolm |      M |  Reynolds
 Serenity |  4 |    Kaylee |      F |      Frye
 Serenity |  5 |  Sheppard |      M |      Book
 Serenity |  6 |     Jayne |      M |      Cobb
 Serenity |  7 |     Simon |      M |       Tam
 Serenity |  8 |     River |      F |       Tam
 Serenity |  9 |     Inara |      F |     Serra

Now I'll turn tracing on, and query a distinct row with the PRIMARY KEY, but also restrict by our index on gender.

aploetz@cqlsh:stackoverflow2> tracing on;
aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M';

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+----------
 Serenity |  3 |   Malcolm |      M | Reynolds

(1 rows)

Tracing session: 34ea1840-e8e1-11e4-9cb7-21b264d4c94d

 activity                                                                                                                                                                                                                                                                                                       | timestamp                  | source         | source_elapsed
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                                                                                                                                                                                                                                             Execute CQL3 query | 2015-04-22 06:17:48.102000 | 192.168.23.129 |              0
                                                                                                                                                                                                          Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M'; [SharedPool-Worker-1] | 2015-04-22 06:17:48.114000 | 192.168.23.129 |           3715
                                                                                                                                                                                                                                                                      Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:48.116000 | 192.168.23.129 |           4846
                                                                                                                                                                                                                                                Executing single-partition query on users [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5730
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5757
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5793
                                                                                                                                                                                                                                                              Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5848
                                                                                                                                                                                                                                              Seeking to partition beginning in data file [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           5856
                                                                                                                                                                                                                Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           7056
                                                                                                                                                                                                                                               Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.121000 | 192.168.23.129 |           7080
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7143
                                                                                                                                                                                                                                                                Computing ranges to query [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7578
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7742
                                                                                                                                                                                              Submitting range requests on 1 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7807
                                                                                                                                                                                                                                  Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7851
                                                                                                                                                                                                                                          Executing indexed scan for [Serenity, Serenity] [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10848
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10936
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11007
                                                                                                                                                                                                                           Executing single-partition query on crewbyship.crewbyshipg_idx [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11130
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11139
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11155
                                                                                                                                                                                                                Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11253
                                                                                                                                                                                                                                               Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11262
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.127000 | 192.168.23.129 |          11281
                                                                                                                                                                                                                                           Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:48.130000 | 192.168.23.129 |          11369
                                                                                                                                                                                                                                                             Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11375
                                                                                                                                                                                                                                                              Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11383
                                                                                                                                                                                                                Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.133000 | 192.168.23.129 |          11409
                                                                                                                                                                                                                                               Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.134000 | 192.168.23.129 |          11415
                                                                                                                                                                                                                                                       Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11430
                                                                                                                                                                                                                                                             Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11490
                                                                                                                                                                                                                                                                                               Request complete | 2015-04-22 06:17:48.115679 | 192.168.23.129 |          13679

Now, I'll re-run the same query, but without the superfluous index on gender.

aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3;

 ship     | id | firstname | gender | lastname
----------+----+-----------+--------+----------
 Serenity |  3 |   Malcolm |      M | Reynolds

(1 rows)

Tracing session: 38d7f440-e8e1-11e4-9cb7-21b264d4c94d

 activity                                                                                        | timestamp                  | source         | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                              Execute CQL3 query | 2015-04-22 06:17:54.692000 | 192.168.23.129 |              0
          Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3; [SharedPool-Worker-1] | 2015-04-22 06:17:54.695000 | 192.168.23.129 |             87
                                                       Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:54.696000 | 192.168.23.129 |            246
                                 Executing single-partition query on users [SharedPool-Worker-3] | 2015-04-22 06:17:54.697000 | 192.168.23.129 |           1185
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1197
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1215
                                               Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1249
                               Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1278
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3309
                                Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3333
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           3368
                            Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           4607
                                              Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4633
                                               Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4643
 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4678
                                Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4683
                                        Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:54.706000 | 192.168.23.129 |           4697
                                                                                Request complete | 2015-04-22 06:17:54.697676 | 192.168.23.129 |           5676

As you can see, the "source_elapsed" for the query with the secondary index was more than twice what it was for the same query (which returned the same row) without the index.

I think we can definitely say that using a secondary index on a low-cardinality column in a wide-row table will not perform well. Now while I won't say that filtering client-side is a good idea, in this case, with a small result set, it probably would be the better option.




回答2:


The good option here would be to create a composite primary key consisting of username and usertype with username being partition key and usertype a cluster key. You will not even need an index and the query will work.

CREATE TABLE users (
  username text,
  usertype text,
   ....
  PRIMARY KEY ((username), usertype)
)


来源:https://stackoverflow.com/questions/29659564/validating-row-at-client-side-better-than-secondary-index-with-whole-primary-key

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!