Validating row at client side better than secondary index with whole primary key?

前端 未结 2 551
我寻月下人不归
我寻月下人不归 2021-01-23 22:55

In cassandra, it\'s well known that secondary indexes should be used very sparingly.

If I have a table for example:

User(username, usertype, email, etc..         


        
相关标签:
2条回答
  • 2021-01-23 23:37

    For this example, let's say I create a table to keep track of crew members by ship and id:

    CREATE TABLE crewByShip (
      ship text,
      id int,
      firstname text,
      lastname text,
      gender text,
      PRIMARY KEY(ship,id));
    

    And I'll create an index on gender:

    CREATE INDEX crewByShipG_idx ON crewByShip(gender);
    

    After inserting some data, my table looks like this:

     ship     | id | firstname | gender | lastname
    ----------+----+-----------+--------+-----------
     Serenity |  1 |     Hoban |      M | Washburne
     Serenity |  2 |      Zoey |      F | Washburne
     Serenity |  3 |   Malcolm |      M |  Reynolds
     Serenity |  4 |    Kaylee |      F |      Frye
     Serenity |  5 |  Sheppard |      M |      Book
     Serenity |  6 |     Jayne |      M |      Cobb
     Serenity |  7 |     Simon |      M |       Tam
     Serenity |  8 |     River |      F |       Tam
     Serenity |  9 |     Inara |      F |     Serra
    

    Now I'll turn tracing on, and query a distinct row with the PRIMARY KEY, but also restrict by our index on gender.

    aploetz@cqlsh:stackoverflow2> tracing on;
    aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M';
    
     ship     | id | firstname | gender | lastname
    ----------+----+-----------+--------+----------
     Serenity |  3 |   Malcolm |      M | Reynolds
    
    (1 rows)
    
    Tracing session: 34ea1840-e8e1-11e4-9cb7-21b264d4c94d
    
     activity                                                                                                                                                                                                                                                                                                       | timestamp                  | source         | source_elapsed
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                                                                                                                                                                                                                                                 Execute CQL3 query | 2015-04-22 06:17:48.102000 | 192.168.23.129 |              0
                                                                                                                                                                                                              Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3 AND gender='M'; [SharedPool-Worker-1] | 2015-04-22 06:17:48.114000 | 192.168.23.129 |           3715
                                                                                                                                                                                                                                                                          Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:48.116000 | 192.168.23.129 |           4846
                                                                                                                                                                                                                                                    Executing single-partition query on users [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5730
                                                                                                                                                                                                                                                                 Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.118000 | 192.168.23.129 |           5757
                                                                                                                                                                                                                                                                  Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5793
                                                                                                                                                                                                                                                                  Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.119000 | 192.168.23.129 |           5848
                                                                                                                                                                                                                                                  Seeking to partition beginning in data file [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           5856
                                                                                                                                                                                                                    Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.120000 | 192.168.23.129 |           7056
                                                                                                                                                                                                                                                   Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.121000 | 192.168.23.129 |           7080
                                                                                                                                                                                                                                                           Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7143
                                                                                                                                                                                                                                                                    Computing ranges to query [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7578
     Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7742
                                                                                                                                                                                                  Submitting range requests on 1 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7807
                                                                                                                                                                                                                                      Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-04-22 06:17:48.122000 | 192.168.23.129 |           7851
                                                                                                                                                                                                                                              Executing indexed scan for [Serenity, Serenity] [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10848
     Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          10936
     Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=gender, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=crewbyshipg_idx, indexType=COMPOSITES}]}:0. Scanning with crewbyship.crewbyshipg_idx. [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11007
                                                                                                                                                                                                                               Executing single-partition query on crewbyship.crewbyshipg_idx [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11130
                                                                                                                                                                                                                                                                 Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.123000 | 192.168.23.129 |          11139
                                                                                                                                                                                                                                                                  Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11155
                                                                                                                                                                                                                    Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11253
                                                                                                                                                                                                                                                   Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.124000 | 192.168.23.129 |          11262
                                                                                                                                                                                                                                                           Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.127000 | 192.168.23.129 |          11281
                                                                                                                                                                                                                                               Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:48.130000 | 192.168.23.129 |          11369
                                                                                                                                                                                                                                                                 Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11375
                                                                                                                                                                                                                                                                  Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.131000 | 192.168.23.129 |          11383
                                                                                                                                                                                                                    Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:48.133000 | 192.168.23.129 |          11409
                                                                                                                                                                                                                                                   Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:48.134000 | 192.168.23.129 |          11415
                                                                                                                                                                                                                                                           Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11430
                                                                                                                                                                                                                                                                 Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-04-22 06:17:48.138000 | 192.168.23.129 |          11490
                                                                                                                                                                                                                                                                                                   Request complete | 2015-04-22 06:17:48.115679 | 192.168.23.129 |          13679
    

    Now, I'll re-run the same query, but without the superfluous index on gender.

    aploetz@cqlsh:stackoverflow2> SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3;
    
     ship     | id | firstname | gender | lastname
    ----------+----+-----------+--------+----------
     Serenity |  3 |   Malcolm |      M | Reynolds
    
    (1 rows)
    
    Tracing session: 38d7f440-e8e1-11e4-9cb7-21b264d4c94d
    
     activity                                                                                        | timestamp                  | source         | source_elapsed
    -------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                                  Execute CQL3 query | 2015-04-22 06:17:54.692000 | 192.168.23.129 |              0
              Parsing SELECT * FROM crewByShip WHERE ship='Serenity' AND id=3; [SharedPool-Worker-1] | 2015-04-22 06:17:54.695000 | 192.168.23.129 |             87
                                                           Preparing statement [SharedPool-Worker-1] | 2015-04-22 06:17:54.696000 | 192.168.23.129 |            246
                                     Executing single-partition query on users [SharedPool-Worker-3] | 2015-04-22 06:17:54.697000 | 192.168.23.129 |           1185
                                                  Acquiring sstable references [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1197
                                                   Merging memtable tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.698000 | 192.168.23.129 |           1215
                                                   Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1249
                                   Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-04-22 06:17:54.700000 | 192.168.23.129 |           1278
     Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3309
                                    Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-04-22 06:17:54.701000 | 192.168.23.129 |           3333
                                            Read 1 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           3368
                                Executing single-partition query on crewbyship [SharedPool-Worker-2] | 2015-04-22 06:17:54.702000 | 192.168.23.129 |           4607
                                                  Acquiring sstable references [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4633
                                                   Merging memtable tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.704000 | 192.168.23.129 |           4643
     Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4678
                                    Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-04-22 06:17:54.705000 | 192.168.23.129 |           4683
                                            Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-04-22 06:17:54.706000 | 192.168.23.129 |           4697
                                                                                    Request complete | 2015-04-22 06:17:54.697676 | 192.168.23.129 |           5676
    

    As you can see, the "source_elapsed" for the query with the secondary index was more than twice what it was for the same query (which returned the same row) without the index.

    I think we can definitely say that using a secondary index on a low-cardinality column in a wide-row table will not perform well. Now while I won't say that filtering client-side is a good idea, in this case, with a small result set, it probably would be the better option.

    0 讨论(0)
  • 2021-01-23 23:40

    The good option here would be to create a composite primary key consisting of username and usertype with username being partition key and usertype a cluster key. You will not even need an index and the query will work.

    CREATE TABLE users (
      username text,
      usertype text,
       ....
      PRIMARY KEY ((username), usertype)
    )
    
    0 讨论(0)
提交回复
热议问题