Difference between local and global indexes in DynamoDB

轻奢々 2020-12-22 15:42

I'm curious about these two kinds of secondary indexes and the differences between them. It is hard to imagine what they look like, and I think an answer will help more people than just me.

7 Answers
  • 2020-12-22 16:13

    This documentation gives a pretty good explanation:

    https://aws.amazon.com/blogs/aws/now-available-global-secondary-indexes-for-amazon-dynamodb/

    I couldn't comment on the question, but which is better in terms of write and read performance:

    (a local index with a table read/write throughput of 100) or (a global index with a read/write throughput of 50, plus a table read/write throughput of 50)?

    I do not need a separate partition key for my use case, so a local index should be sufficient for the required functionality.

  • 2020-12-22 16:15

    Another way to explain it: an LSI lets you run additional queries on items with the same hash key. A GSI lets you run similar queries on items "across the table", which is very useful.

    If you have a user profile table (unique-id, name, email) and you need to make it queryable on name or email, the only way is to make those attributes GSI keys. An LSI won't help, because it still requires the unique-id as its hash key.
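    To make the user-profile example concrete, here is a toy in-memory model in Java (the language of the annotated getters elsewhere in this thread). It is not the DynamoDB API, and every class and method name is hypothetical: a map keyed by unique-id stands in for the base table, and a second map keyed by email stands in for a GSI, so a lookup by email never needs the id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model, NOT the DynamoDB API: a base table whose hash key is
// the unique id, plus a hypothetical "GSI" whose hash key is email.
class UserProfileGsiSketch {
    record User(String id, String name, String email) {}

    static class UserTable {
        private final Map<String, User> byId = new HashMap<>();          // base table
        private final Map<String, List<User>> byEmail = new HashMap<>(); // toy GSI

        void put(User user) {
            User old = byId.put(user.id(), user); // same hash key: overwrite
            if (old != null) {
                byEmail.get(old.email()).remove(old); // drop the stale index entry
            }
            // Real DynamoDB propagates this asynchronously; the toy does it inline.
            byEmail.computeIfAbsent(user.email(), k -> new ArrayList<>()).add(user);
        }

        User getById(String id) {
            return byId.get(id);
        }

        // GSI keys need not be unique, so an index lookup returns a list.
        List<User> queryByEmail(String email) {
            return List.copyOf(byEmail.getOrDefault(email, List.of()));
        }
    }

    public static void main(String[] args) {
        UserTable table = new UserTable();
        table.put(new User("u1", "Ann", "ann@example.com"));
        table.put(new User("u2", "Bob", "bob@example.com"));
        // Look a user up by email without knowing the id.
        System.out.println(table.queryByEmail("bob@example.com"));
    }
}
```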

  • 2020-12-22 16:15

    GSIs can't be used for strongly consistent reads; reads from a GSI are always eventually consistent.

    LSIs can be used for consistent reads, but they limit the size of each item collection (all items that share a partition key value, together with their LSI entries) to 10GB. Also, LSIs can only be created when the table is created.
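    The 10GB cap applies per hash-key value, not to the whole table. The sketch below is a hypothetical helper in Java, not anything in the AWS SDK: it sums made-up entry sizes per hash key to show how base-table items and their LSI entries add up against that cap. Treating GB as 2^30 bytes is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Arithmetic sketch: on a table with an LSI, all base items and LSI entries
// that share one hash-key value form an "item collection" capped at 10 GB.
// Assumption: GB is taken as 2^30 bytes; the entry sizes are made-up inputs.
class ItemCollectionLimitSketch {
    static final long LIMIT_BYTES = 10L * 1024 * 1024 * 1024;

    record Entry(String hashKey, long bytes) {}

    // Sums entry sizes per hash-key value.
    static Map<String, Long> collectionSizes(List<Entry> entries) {
        Map<String, Long> sizes = new HashMap<>();
        for (Entry e : entries) {
            sizes.merge(e.hashKey(), e.bytes(), Long::sum);
        }
        return sizes;
    }

    // A write that would push one collection past the cap is rejected.
    static boolean wouldExceed(long currentBytes, long writeBytes) {
        return currentBytes + writeBytes > LIMIT_BYTES;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        Map<String, Long> sizes = collectionSizes(List.of(
            new Entry("user-1", 6 * gb),   // base-table items for user-1
            new Entry("user-1", 3 * gb),   // LSI entries for user-1
            new Entry("user-2", 1 * gb))); // a separate collection
        System.out.println(sizes.get("user-1") / gb + " GB; 2 GB more fits? "
            + !wouldExceed(sizes.get("user-1"), 2 * gb));
    }
}
```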

  • 2020-12-22 16:26

    Local Secondary Indexes still rely on the original hash key. When you create a table with hash+range, think of the LSIs as hash+range1, hash+range2, ..., hash+range6: you get up to 5 more range attributes to query on. Also, there is only one provisioned throughput, shared by the table and its LSIs.

    Global Secondary Indexes define a new paradigm: different hash/range keys per index.
    This breaks the original model of one hash key per table. It is also why, when you define a GSI (in provisioned capacity mode), you must give each index its own provisioned throughput and pay for it.

    More detailed information about the differences can be found in the GSI announcement.
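    The LSI/GSI key rules in this answer can be written down as a small validator. This is a hypothetical Java sketch, not the SDK's key-schema classes: an LSI needs a hash+range table, must reuse the table's hash key, and adds a range key (at most 5 LSIs per table, matching the "5 more range attributes"), while a GSI may pick any hash key. Rejecting an LSI that repeats the table's own range key is an assumption made here, since such an index would add nothing.

```java
// Toy model of index key schemas, NOT the SDK's key-schema classes.
class IndexKeySchemaSketch {
    record KeySchema(String hashKey, String rangeKey) {} // rangeKey may be null

    static final int MAX_LSIS_PER_TABLE = 5;

    static boolean isValidLsi(KeySchema table, KeySchema lsi, int existingLsis) {
        return existingLsis < MAX_LSIS_PER_TABLE
            && table.rangeKey() != null                    // LSIs need a hash+range table
            && lsi.hashKey().equals(table.hashKey())       // same hash key as the table
            && lsi.rangeKey() != null
            && !lsi.rangeKey().equals(table.rangeKey());   // assumed: a different range key
    }

    // Any hash key works for a GSI, including attributes the table already uses.
    static boolean isValidGsi(KeySchema gsi) {
        return gsi.hashKey() != null;
    }

    public static void main(String[] args) {
        KeySchema table = new KeySchema("user", "timestamp");
        System.out.println(isValidLsi(table, new KeySchema("user", "score"), 0)); // same hash key
        System.out.println(isValidLsi(table, new KeySchema("team", "score"), 0)); // new hash key: needs a GSI
        System.out.println(isValidGsi(new KeySchema("team", "timestamp")));
    }
}
```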

  • 2020-12-22 16:26

    These are the possible searches by index:

    • By Hash
    • By Hash + Range
    • By Hash + Local Index
    • By Global index
    • By Global index + Range Index
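    The five search shapes in this list can be sketched with a toy in-memory table in Java. The schema is hypothetical (base key user + timestamp, an LSI on user + score, a GSI on team + timestamp), and the filters scan a list where DynamoDB would use key conditions on an index; only the shape of each query is meant to match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy in-memory table, NOT the DynamoDB API. Hypothetical schema:
// base (user, timestamp), LSI (user, score), GSI (team, timestamp).
class QueryShapesSketch {
    record Item(String user, String timestamp, int score, String team) {}

    private final List<Item> items = new ArrayList<>();

    void put(Item item) {
        // The base (hash, range) pair is unique, so an equal pair is overwritten.
        items.removeIf(x -> x.user().equals(item.user()) && x.timestamp().equals(item.timestamp()));
        items.add(item);
    }

    // By hash: every item in one partition, in range-key order.
    List<Item> byHash(String user) {
        return items.stream().filter(x -> x.user().equals(user))
            .sorted(Comparator.comparing(Item::timestamp)).toList();
    }

    // By hash + range: here a ">= fromTs" condition on the range key.
    List<Item> byHashAndRange(String user, String fromTs) {
        return byHash(user).stream().filter(x -> x.timestamp().compareTo(fromTs) >= 0).toList();
    }

    // By hash + local index: same partition, ordered by the LSI range key.
    List<Item> byHashAndLsi(String user, int minScore) {
        return items.stream().filter(x -> x.user().equals(user) && x.score() >= minScore)
            .sorted(Comparator.comparingInt(Item::score)).toList();
    }

    // By global index: a different hash key that spans all partitions.
    List<Item> byGsi(String team) {
        return items.stream().filter(x -> x.team().equals(team))
            .sorted(Comparator.comparing(Item::timestamp)).toList();
    }

    // By global index + range index.
    List<Item> byGsiAndRange(String team, String fromTs) {
        return byGsi(team).stream().filter(x -> x.timestamp().compareTo(fromTs) >= 0).toList();
    }

    public static void main(String[] args) {
        QueryShapesSketch t = new QueryShapesSketch();
        t.put(new Item("ann", "2020-12-01", 70, "red"));
        t.put(new Item("ann", "2020-12-05", 90, "blue"));
        t.put(new Item("bob", "2020-12-03", 80, "red"));
        System.out.println(t.byHashAndLsi("ann", 80)); // ann's items with score >= 80
        System.out.println(t.byGsi("red"));            // red-team items from both users
    }
}
```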

    Hash and range indexes of a table: these are the usual indexes, as supported by previous versions of the Amazon AWS SDK.

    Global and local indexes: these are 'additional' indexes created on a table, on top of the table's existing hash and range indexes. A global index behaves like a hash key, and its range index behaves like the range key used with the table's hash key. In your entity model, the getters must be annotated as follows:

    • For global indexes:

      @DynamoDBIndexHashKey(globalSecondaryIndexName = INDEX_GLOBAL_RANGE_US_TS)
      @DynamoDBAttribute(attributeName = PROPERTY_USER)
      public String getUser() {
          return user;
      }
      
    • For the range index associated with the global index:

      @DynamoDBIndexRangeKey(globalSecondaryIndexName = INDEX_GLOBAL_RANGE_US_TS)
      @DynamoDBAttribute(attributeName = PROPERTY_TIMESTAMP)
      public String getTimestamp() {
          return timestamp;
      }
      

    Also, if you read a table through a global index, the read must be eventually consistent (not strongly consistent):

    queryExpression.setConsistentRead(false);
    
  • 2020-12-22 16:26

    One way to put it is this:

    LSI - allows you to perform a query on a single Hash-Key while using multiple different attributes to "filter" or restrict the query.

    GSI - allows you to perform queries across multiple Hash-Keys in a table, but costs extra throughput as a result.

    A more extensive breakdown of the table types and how they work follows.

    Hash Only

    As you probably already know, a Hash-Key by itself must be unique: writing to a Hash-Key that already exists overwrites the existing data.

    Hash+Range

    A Hash-Key + Range-Key allows you to have multiple items with the same Hash-Key, as long as they have different Range-Keys. If you write to a Hash-Key that already exists but use a Range-Key not yet used with that Hash-Key, the write creates a new item; if an item with the same Hash+Range combination already exists, the write overwrites it.

    Another way to think of this is as files with formats. You can have a file with the same name (hash) as another in the same folder (table), as long as their formats (range) differ. Likewise, you can have multiple files of the same format as long as their names differ.
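    The overwrite rules for the two primary-key types, in the file-name analogy, fit in a toy Java model. It is not the DynamoDB API; the class names are hypothetical, and put() simply reports whether an existing item was replaced.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of primary-key write rules, NOT the DynamoDB API.
// put() returns true when it overwrote an existing item.
class PrimaryKeySketch {
    // Hash-only table: the hash key alone identifies an item.
    static class HashOnlyTable {
        private final Map<String, String> items = new HashMap<>();

        boolean put(String hash, String data) {
            return items.put(hash, data) != null;
        }
    }

    // Hash+range table: (hash, range) identifies an item. Items sharing a
    // hash key are kept sorted by range key, like one DynamoDB partition.
    static class HashRangeTable {
        private final Map<String, TreeMap<String, String>> partitions = new HashMap<>();

        boolean put(String hash, String range, String data) {
            return partitions.computeIfAbsent(hash, k -> new TreeMap<>()).put(range, data) != null;
        }

        int size() {
            return partitions.values().stream().mapToInt(TreeMap::size).sum();
        }
    }

    public static void main(String[] args) {
        HashOnlyTable files = new HashOnlyTable();
        files.put("PROJECT101", "v1");
        System.out.println(files.put("PROJECT101", "v2"));         // overwritten

        HashRangeTable named = new HashRangeTable();
        named.put("PROJECT101", "pdf", "v1");
        System.out.println(named.put("PROJECT101", "docx", "v1")); // new item
        System.out.println(named.put("PROJECT101", "pdf", "v2"));  // overwritten
        System.out.println(named.size());
    }
}
```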

    LSI

    An LSI is basically the same as a Hash-Key + Range-Key and follows the same rules when creating items. One difference: you don't have to provide a value for the LSI's range attribute; an item that lacks it is simply not written to that index.

    To say an LSI is "Range-Key 2" is not entirely correct, because you cannot have (using my file-and-format analogy from earlier) files named file.format.lsi and file.format.lsi2. You can, however, have file.format.lsi and file.format2.lsi, or file.format.lsi and file2.format.lsi.

    Basically, an LSI is just a "filter key", not an actual Range-Key: your base Hash and Range value combination must still be unique, while the LSI values do not have to be unique at all. An easier way to look at it may be to think of the LSI as data within the files. You could write code that finds all the files named "PROJECT101", regardless of their file format, then reads the data inside each one to decide what to include in the query and what to omit. This is basically how an LSI works, just without the extra overhead of opening each file to read its contents.
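    The point that an LSI value never makes an item unique on its own can be shown with a toy Java model (hypothetical names, not the DynamoDB API): items are keyed by (name, format), and owner plays the role of the LSI range attribute. Writing the same name and format with a different owner replaces the item instead of adding a second one, while several items may share one owner.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT the DynamoDB API: files keyed by (name, format), with a
// hypothetical LSI on "owner". The LSI value is not part of item identity.
class LsiSketch {
    record FileItem(String name, String format, String owner) {}
    record BaseKey(String name, String format) {}

    private final Map<BaseKey, FileItem> items = new HashMap<>();

    void put(FileItem f) {
        items.put(new BaseKey(f.name(), f.format()), f); // base key decides identity
    }

    int size() {
        return items.size();
    }

    // Query one hash key through the LSI. Owner values may repeat; results
    // are sorted by format only so that the toy's output order is deterministic.
    List<FileItem> queryByOwner(String name, String owner) {
        return items.values().stream()
            .filter(f -> f.name().equals(name) && f.owner().equals(owner))
            .sorted(Comparator.comparing(FileItem::format))
            .toList();
    }

    public static void main(String[] args) {
        LsiSketch t = new LsiSketch();
        t.put(new FileItem("PROJECT101", "pdf", "ann"));
        t.put(new FileItem("PROJECT101", "docx", "ann")); // LSI values may repeat
        t.put(new FileItem("PROJECT101", "pdf", "bob"));  // same base key: replaces the first item
        System.out.println(t.size());                      // 2 items, not 3
        System.out.println(t.queryByOwner("PROJECT101", "ann")); // only the docx file now
    }
}
```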

    GSI

    For GSI, you're essentially creating another table for each GSI, but without the hassle of maintaining multiple separate tables that mirror data between them; this is why they cost more throughput.

    So for a GSI, you could specify fileName as your base Hash-Key, and fileFormat as your base Range-Key. You can then specify a GSI that has a Hash-Key of fileName2 and a Range-Key of fileFormat2. You can then query on either fileName or fileName2 if you like, unlike LSI where you can only query on fileName.

    The main advantages are that you only have to maintain one table instead of two, and any time you write an item to the table, DynamoDB automatically updates the GSI(s) as well (asynchronously, which is why GSI reads are eventually consistent), so you can't "forget" to update the other table(s) as you can with a multi-table setup. Also, there's no chance of losing the connection after updating one table and before updating the other, as there is with a multi-table setup.

    Additionally, a GSI can "overlap" the base Hash/Range combination. So if you wanted to make a table with fileName and fileFormat as your base Hash/Range and filePriority and fileName as your GSI, you can.

    Lastly, a GSI Hash+Range combination does not have to be unique, while the base Hash+Range combination does. This is something that is not possible with a dual/multi-table setup, but is with a GSI. You must always provide values for the base Hash+Range when writing an item; the GSI key attributes, however, are optional, and an item that omits them is simply not written to that GSI.
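    The fileName/fileFormat example with a (filePriority, fileName) GSI can be sketched as a toy Java model, with hypothetical names and no DynamoDB API involved. The GSI view is derived from the base table on each read, so it can never disagree with it, and two items may share one GSI key even though their base keys are unique. Real GSIs are maintained by DynamoDB incrementally and asynchronously; rebuilding the view on every read is only a simplification for the toy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT the DynamoDB API. Base key: (fileName, fileFormat).
// Hypothetical GSI key: (filePriority, fileName), which overlaps the base key.
class GsiSketch {
    record FileItem(String fileName, String fileFormat, int filePriority) {}
    record BaseKey(String fileName, String fileFormat) {}
    record GsiKey(int filePriority, String fileName) {}

    private final Map<BaseKey, FileItem> base = new HashMap<>();

    // Writes go to the base table only; the base key stays unique.
    void put(FileItem f) {
        base.put(new BaseKey(f.fileName(), f.fileFormat()), f);
    }

    // Derives the GSI from the base table. GSI keys may repeat, so each
    // GSI key maps to a list of items.
    Map<GsiKey, List<FileItem>> gsiView() {
        Map<GsiKey, List<FileItem>> gsi = new HashMap<>();
        for (FileItem f : base.values()) {
            gsi.computeIfAbsent(new GsiKey(f.filePriority(), f.fileName()), k -> new ArrayList<>())
               .add(f);
        }
        return gsi;
    }

    // Queries the GSI by its hash key (filePriority) alone, across all base partitions.
    List<FileItem> queryByPriority(int priority) {
        return gsiView().entrySet().stream()
            .filter(e -> e.getKey().filePriority() == priority)
            .flatMap(e -> e.getValue().stream())
            .toList();
    }

    public static void main(String[] args) {
        GsiSketch table = new GsiSketch();
        table.put(new FileItem("report", "pdf", 1));
        table.put(new FileItem("report", "docx", 1)); // same GSI key (1, "report")
        table.put(new FileItem("notes", "txt", 2));
        System.out.println(table.gsiView().get(new GsiKey(1, "report")).size()); // two items, one GSI key
        System.out.println(table.queryByPriority(1).size());
    }
}
```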
