Retrieve distinct values from the hash key - DynamoDB

后端 未结 1 1063
轻奢々
轻奢々 2020-12-20 14:03

I have a dynamodb table to store email attribute information. I have a hash key on the email, range key on timestamp(number). The initial idea for using email as hash key is

相关标签:
1条回答
  • 2020-12-20 14:28

    Using a DynamoDB Scan, you would need to filter out duplicates on the client side (in your case, using boto). Even if you create a GSI with the reverse schema, you will still get duplicates. Given a H+R table of email_id+timestamp called stamped_emails, a list of all unique email_ids is a materialized view of the H+R stamped_emails table. You could enable a DynamoDB Stream on the stamped_emails table, subscribe a Lambda function to stamped_emails' Stream that does a PutItem (email_id) to a Hash-only table called emails_only. Then, you could Scan emails_only and you would get no duplicates.

    Finally, regarding your question about cost, Scan will read entire items even if you only request certain projected attributes from those items. Second, Scan has to read through every item, even if it is filtered out by a FilterExpression (Condition Expression). Third, Scan reads through items sequentially. That means that each scan call is treated as one big read for metering purposes. The cost implication of this is that if a Scan call reads 200 different items, it will not necessarily cost 100 RCU. If the size of each of those items is 100 bytes, that Scan call will cost ROUND_UP((20000 bytes / 1024 kb/byte) / 8 kb / EC RCU) = 3 RCU. Even if this call only returns 123 items, if the Scan had to read 200 items, you would incur 3 RCU in this situation.

    0 讨论(0)
提交回复
热议问题