Fastest way of querying for latest items in an Azure table?

无人及你 2020-12-09 11:03

I have an Azure table where customers post messages; there may be millions of messages in a single table. I want to find the fastest way of getting the messages posted within

4 answers
  • 2020-12-09 11:47

    The primary key of a table is the combination of PartitionKey and RowKey (which together form a clustered index).

    In your case, just query on RowKey instead of PartitionKey, and give PartitionKey a constant value.
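
    A minimal Python sketch of this layout, assuming a hypothetical constant PartitionKey of "messages" and a RowKey that holds the posting time as 19-digit, zero-padded .NET-style ticks (100-nanosecond intervals since 0001-01-01 UTC). Because every message shares one partition, "posted since X" becomes a single range scan over RowKey:

```python
from datetime import datetime, timedelta, timezone

# .NET DateTime ticks count 100-ns intervals from 0001-01-01 00:00 (UTC here).
DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)
MESSAGES_PARTITION = "messages"  # hypothetical constant PartitionKey


def to_ticks(dt: datetime) -> int:
    """Convert an aware datetime to .NET-style ticks (microsecond precision)."""
    return (dt - DOTNET_EPOCH) // timedelta(microseconds=1) * 10


def row_key_for(posted_at: datetime) -> str:
    # Zero-pad to 19 digits so lexicographic order matches chronological order.
    return f"{to_ticks(posted_at):019d}"


def posted_since_filter(since: datetime) -> str:
    # One fixed partition plus a RowKey range: no cross-partition scan needed.
    return (
        f"PartitionKey eq '{MESSAGES_PARTITION}' "
        f"and RowKey ge '{row_key_for(since)}'"
    )


since = datetime(2020, 12, 9, 11, 0, tzinfo=timezone.utc)
print(posted_since_filter(since))
# PartitionKey eq 'messages' and RowKey ge '0637431084000000000'
```

    The filter string uses the Table service's OData syntax; with the `azure-data-tables` Python package it could be passed as `query_filter` to `TableClient.query_entities`. The trade-off of a single constant partition is that every write lands in that one partition.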

    You can also follow the Diagnostics approach, for example creating a new PartitionKey for every ten minutes. But this approach is mainly suited to requirements such as archiving or purging.
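
    A minimal Python sketch of the ten-minute variant, assuming each PartitionKey is the start of its ten-minute interval expressed as 19-digit .NET-style ticks. The helper `purgeable_partitions` and the cutoff are hypothetical; they illustrate why this layout suits archiving or purging: a partition whose whole interval ends before the cutoff contains only old messages.

```python
from datetime import datetime, timedelta, timezone

DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)
BUCKET = timedelta(minutes=10)  # one partition per ten-minute interval
TICKS_PER_BUCKET = BUCKET // timedelta(microseconds=1) * 10


def to_ticks(dt: datetime) -> int:
    """.NET-style ticks (100-ns units since 0001-01-01 UTC) for an aware datetime."""
    return (dt - DOTNET_EPOCH) // timedelta(microseconds=1) * 10


def partition_key_for(dt: datetime) -> str:
    # Round down to the start of the enclosing ten-minute interval.
    start = dt - (dt - DOTNET_EPOCH) % BUCKET
    return f"{to_ticks(start):019d}"


def purgeable_partitions(keys, cutoff: datetime):
    # Keep only partitions whose entire interval ends at or before the cutoff.
    cutoff_ticks = to_ticks(cutoff)
    return [k for k in keys if int(k) + TICKS_PER_BUCKET <= cutoff_ticks]


utc = timezone.utc
times = [datetime(2020, 12, 9, 10, 52, tzinfo=utc), datetime(2020, 12, 9, 10, 58, tzinfo=utc),
         datetime(2020, 12, 9, 11, 7, tzinfo=utc), datetime(2020, 12, 9, 11, 14, tzinfo=utc)]
keys = sorted({partition_key_for(t) for t in times})  # buckets 10:50, 11:00, 11:10
old = purgeable_partitions(keys, datetime(2020, 12, 9, 11, 10, tzinfo=utc))  # 10:50, 11:00
```

    Deleting still happens entity by entity (or in batch transactions within one partition); the benefit is that each old partition can be found and cleared with a simple PartitionKey equality filter.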

  • 2020-12-09 11:53
    • From my understanding, querying the partition key with an exact equality ("=") will be much faster than querying it with less than ("<") or greater than (">").
    • Also, make the extra effort to see whether your condition can be expressed as a unique combination of partition key and row key.
    • Also, keep the number of distinct partition key values small to avoid creating more partitions.
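
    A minimal Python sketch of the three query shapes these points contrast, written as OData filter strings for the Table service. The key values and helper names are hypothetical; OData string literals escape a single quote by doubling it. An exact PartitionKey plus RowKey is a point query, the cheapest lookup; equality on PartitionKey with a RowKey range stays inside one partition; a PartitionKey range may have to touch many partitions.

```python
def quote(value: str) -> str:
    """Render an OData string literal; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def point_query(pk: str, rk: str) -> str:
    # Exact PartitionKey and RowKey: identifies at most one entity.
    return f"PartitionKey eq {quote(pk)} and RowKey eq {quote(rk)}"


def partition_range(pk: str, rk_from: str, rk_to: str) -> str:
    # Exact PartitionKey, ranged RowKey: the scan stays within a single partition.
    return (f"PartitionKey eq {quote(pk)} "
            f"and RowKey ge {quote(rk_from)} and RowKey lt {quote(rk_to)}")


def cross_partition_range(pk_from: str, pk_to: str) -> str:
    # Ranged PartitionKey: the service may have to visit many partitions.
    return f"PartitionKey ge {quote(pk_from)} and PartitionKey lt {quote(pk_to)}"


print(point_query("messages", "0637431084000000000"))
# PartitionKey eq 'messages' and RowKey eq '0637431084000000000'
```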
  • 2020-12-09 11:57

    I would suggest doing something similar to what the Diagnostics API does with WADPerformanceCountersTable. There, the PartitionKey groups a number of timestamps into a single key: all timestamps are rounded to the nearest few minutes (say, the nearest 5 minutes). This way you do not end up with an unlimited number of partition keys, yet you are still able to run range queries on them.

    So, for example, you can have a PartitionKey for each timestamp rounded to 00:00, 00:05, 00:10, 00:15, etc., and then converted to ticks.
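
    A minimal Python sketch of that scheme, assuming timestamps are rounded down to the start of their five-minute bucket, converted to .NET-style ticks, and zero-padded to 19 digits so the keys compare correctly as strings. `recent_filter` is a hypothetical helper that turns "the last N minutes" into a PartitionKey range:

```python
from datetime import datetime, timedelta, timezone

DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)
BUCKET = timedelta(minutes=5)


def to_ticks(dt: datetime) -> int:
    """.NET-style ticks (100-ns units since 0001-01-01 UTC) for an aware datetime."""
    return (dt - DOTNET_EPOCH) // timedelta(microseconds=1) * 10


def partition_key_for(dt: datetime) -> str:
    # 00:07 -> 00:05, 00:14 -> 00:10, and so on, then converted to ticks.
    start = dt - (dt - DOTNET_EPOCH) % BUCKET
    return f"{to_ticks(start):019d}"


def recent_filter(now: datetime, window: timedelta) -> str:
    # A bounded range over a handful of bucket keys, not one key per message.
    first = partition_key_for(now - window)
    last = partition_key_for(now)
    return f"PartitionKey ge '{first}' and PartitionKey le '{last}'"


now = datetime(2020, 12, 9, 11, 7, tzinfo=timezone.utc)
query = recent_filter(now, timedelta(minutes=15))  # covers buckets 10:50 through 11:05
```

    Because the first bucket can start before `now - window`, a query that needs an exact cutoff would also filter on a stored posting time.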

  • 2020-12-09 12:09

    I think you've got the right basic idea. The query you've designed should be about as efficient as you could hope for. But there are some improvements I could offer.

    Rather than using DateTime.Now, use DateTime.UtcNow. From what I understand, instances are set to use UTC as their base anyway, but this makes sure you're comparing apples with apples, and you can reliably convert the time back into whatever time zone you want when displaying it.
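
    The same idea as a minimal Python sketch, where `datetime.now(timezone.utc)` plays the role of `DateTime.UtcNow`. It shows that one instant written in two different time zones produces the same UTC-based ticks, and that stored ticks can be turned back into any zone for display. The helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)


def to_ticks(dt: datetime) -> int:
    """.NET-style ticks (100-ns units since 0001-01-01 UTC); dt must be timezone-aware."""
    return (dt - DOTNET_EPOCH) // timedelta(microseconds=1) * 10


def utc_ticks_now() -> int:
    # Analogue of DateTime.UtcNow.Ticks, at microsecond rather than 100-ns resolution.
    return to_ticks(datetime.now(timezone.utc))


def for_display(ticks: int, tz: timezone) -> datetime:
    # Rebuild the UTC instant from stored ticks, then convert to the viewer's zone.
    return (DOTNET_EPOCH + timedelta(microseconds=ticks // 10)).astimezone(tz)


utc_plus_8 = timezone(timedelta(hours=8))
in_utc = datetime(2020, 12, 9, 11, 3, tzinfo=timezone.utc)
in_utc_plus_8 = datetime(2020, 12, 9, 19, 3, tzinfo=utc_plus_8)
assert to_ticks(in_utc) == to_ticks(in_utc_plus_8)  # same instant, same ticks
print(for_display(to_ticks(in_utc), utc_plus_8))  # 2020-12-09 19:03:00+08:00
```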

    Rather than storing the time with .ToString("o"), turn the time into ticks and store that; you'll end up with fewer formatting problems (sometimes you'll get the time zone specifier at the end, sometimes not). Also, if you always want to see these messages sorted from most recent to oldest, you can subtract the number of ticks from the maximum number of ticks. Because Table storage returns entities in ascending PartitionKey and then RowKey order, the newest messages then come back first, e.g.:

    // Newer messages get smaller keys (so they sort first); "d19" zero-pads to 19 digits.
    var messagePartitionKey = (DateTime.MaxValue.Ticks - _contactDate.Ticks).ToString("d19");
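
    The same key as a minimal Python sketch, with Python's `datetime` standing in for `DateTime`. `DOTNET_MAX_TICKS` is the value of `DateTime.MaxValue.Ticks`, and the `:019d` format mirrors `ToString("d19")`. Sorting the keys as plain strings, which is how the Table service orders them, puts the newest message first; the helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)
DOTNET_MAX_TICKS = 3_155_378_975_999_999_999  # DateTime.MaxValue.Ticks


def to_ticks(dt: datetime) -> int:
    """.NET-style ticks (100-ns units since 0001-01-01 UTC) for an aware datetime."""
    return (dt - DOTNET_EPOCH) // timedelta(microseconds=1) * 10


def inverted_key(posted_at: datetime) -> str:
    # Later times give smaller numbers; 19-digit padding keeps string order numeric.
    return f"{DOTNET_MAX_TICKS - to_ticks(posted_at):019d}"


def posted_since_filter(since: datetime) -> str:
    # Anything posted at or after `since` has a key at or below inverted_key(since).
    return f"PartitionKey le '{inverted_key(since)}'"


utc = timezone.utc
posts = [datetime(2020, 12, 9, h, tzinfo=utc) for h in (9, 11, 10)]
newest_first = sorted(inverted_key(p) for p in posts)  # 11:00, 10:00, 09:00
```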
    

    It would also be a good idea to specify a row key. While it is highly unlikely that two messages will be posted at exactly the same time, it's not impossible. If you don't have an obvious row key, just set it to a Guid.
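
    A minimal Python sketch of an entity built this way, using `uuid.uuid4()` as the Guid. The entity is a plain dict with `PartitionKey` and `RowKey` entries, the shape the Python `azure-data-tables` client accepts for entities; the `Message` property and the helper names are hypothetical. Two messages stamped with the same instant share a PartitionKey but still get distinct keys overall:

```python
import uuid
from datetime import datetime, timedelta, timezone

DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)
DOTNET_MAX_TICKS = 3_155_378_975_999_999_999  # DateTime.MaxValue.Ticks


def inverted_key(posted_at: datetime) -> str:
    # Inverted, zero-padded .NET-style ticks, as in the C# line above.
    ticks = (posted_at - DOTNET_EPOCH) // timedelta(microseconds=1) * 10
    return f"{DOTNET_MAX_TICKS - ticks:019d}"


def new_message_entity(posted_at: datetime, text: str) -> dict:
    return {
        "PartitionKey": inverted_key(posted_at),
        # A random Guid keeps (PartitionKey, RowKey) unique even for identical times.
        "RowKey": str(uuid.uuid4()),
        "Message": text,  # hypothetical property name
    }


same_instant = datetime(2020, 12, 9, 11, 3, tzinfo=timezone.utc)
a = new_message_entity(same_instant, "first")
b = new_message_entity(same_instant, "second")
```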
