Question:
I would like to optimise my Azure Cosmos DB SQL API queries to reduce the request units (RUs) they consume, partly in order to reduce the frequency of 429 (Too Many Requests) responses.
Specifically, I thought that including the partition key in WHERE clauses would decrease consumed RUs (reading https://docs.microsoft.com/en-us/azure/cosmos-db/optimize-cost-queries and https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview led me to believe this).
However, when I run
SELECT TOP 1 *
FROM c
WHERE c.Field = "some value"
AND c.PartitionKeyField = "1234"
ORDER BY c.TimeStampField DESC
it consumes 6 RUs.
Whereas the same query without the partition key,
SELECT TOP 1 *
FROM c
WHERE c.Field = "some value"
ORDER BY c.TimeStampField DESC
consumes 5.76 RUs, i.e. it is cheaper.
(Whilst the exact figures vary depending on which document is selected, the second query is always the cheaper one, and I have tested against both the smallest and the largest partitions.)
My database currently has around 400,000 documents and 29 partitions (both are expected to grow). The largest partition has around 150,000 documents (unlikely to grow beyond this).
These results suggest that I should not pass the partition key in the WHERE clause for this query. Could someone please explain why this is, given that the documentation led me to expect the opposite?
Answer 1:
There might be a few reasons, and it depends on which index the query engine decides to use, or whether there is an index at all.
The first thing I can say is that there is likely not much data in this container, because queries without a partition key get progressively more expensive the larger the container is, especially when they span physical partitions.
The first query could be more expensive if there is no index on the partition key and the engine scanned for it after filtering by c.Field.
It could also be more expensive depending on whether there is a composite index and whether the query used it.
Really, though, you cannot take query metrics from small containers and extrapolate. The only way to measure is to put enough data into the container. Also, the RU amounts here are so small that they are not worth optimizing over. I would load this container with the amount of data you expect to have in production and re-run your queries.
Lastly, with regard to measuring and optimizing, the Pareto principle applies. You'll go nuts chasing down every optimization. Find your high-concurrency queries and focus on those.
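As a sketch of what such a composite index might look like for these queries, the Python below builds an indexing policy as a plain dict in the JSON shape Cosmos DB uses (`includedPaths`, `excludedPaths`, `compositeIndexes`, with a `path` and `order` per entry). The helper name `build_indexing_policy` is hypothetical, and the property names are taken from the question. The equality-filtered property comes first and the ORDER BY property follows with its direction. Whether the engine actually uses the index should be confirmed from the query metrics. The Cosmos DB indexing documentation also suggests adding the filtered property to the ORDER BY clause (e.g. `ORDER BY c.Field, c.TimeStampField DESC`) so that a composite index can serve a filter plus ORDER BY.

```python
import json

def build_indexing_policy(composite_paths):
    """Hypothetical helper: return an indexing policy with one composite index.

    composite_paths is a list of (path, order) tuples, where order is
    'ascending' or 'descending'. Composite indexes need at least two paths.
    """
    if len(composite_paths) < 2:
        raise ValueError("a composite index needs at least two paths")
    for _, order in composite_paths:
        if order not in ("ascending", "descending"):
            raise ValueError(f"invalid order: {order!r}")
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # Keep the default range index on every path...
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": '/"_etag"/?'}],
        # ...and add one composite index for the filter + ORDER BY combination.
        "compositeIndexes": [
            [{"path": path, "order": order} for path, order in composite_paths]
        ],
    }

# Equality filter property first, then the ORDER BY property with its direction.
policy = build_indexing_policy([
    ("/Field", "ascending"),
    ("/TimeStampField", "descending"),
])
print(json.dumps(policy, indent=2))
```

The printed JSON could be pasted into the container's indexing policy in the portal, or passed to an SDK call that accepts an indexing policy. Re-run both queries afterwards to compare their charges.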
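One way to get production-sized data into a test container is to generate it. The following is a minimal Python sketch, assuming documents shaped like the ones in the question (`id`, `PartitionKeyField`, `Field`, `TimeStampField`). The partition key values, field values, timestamp range and the helper names `partition_sizes` and `generate_documents` are hypothetical placeholders. Uploading the documents with your SDK of choice is left out.

```python
import random
from datetime import datetime, timedelta, timezone

def partition_sizes(total, partitions, largest):
    """Split `total` documents over `partitions` keys; the first key gets `largest`."""
    if partitions < 2 or not 0 < largest <= total:
        raise ValueError("need at least 2 partitions and 0 < largest <= total")
    rest, remainder = divmod(total - largest, partitions - 1)
    # Spread any remainder one document at a time over the first remaining keys.
    return [largest] + [rest + (1 if i < remainder else 0) for i in range(partitions - 1)]

def generate_documents(total, partitions, largest, field_values, seed=0):
    """Lazily yield test documents shaped like the ones in the question."""
    rng = random.Random(seed)
    start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    for key_index, size in enumerate(partition_sizes(total, partitions, largest)):
        pk = str(1000 + key_index)  # placeholder partition key values
        for n in range(size):
            yield {
                "id": f"{pk}-{n}",
                "PartitionKeyField": pk,
                "Field": rng.choice(field_values),
                # Random timestamp within roughly 115 days of `start`, as an ISO 8601 string.
                "TimeStampField": (start + timedelta(seconds=rng.randrange(10**7))).isoformat(),
            }

# Mirrors the figures in the question; raise them to the expected production volume.
docs = generate_documents(400_000, 29, 150_000, ["some value", "other value"])
```

Because `generate_documents` is a generator, the documents can be streamed to the upload step without holding them all in memory.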
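To illustrate the Pareto point, here is a hypothetical Python helper that ranks queries by total RU/s (charge per execution times executions per second). It keeps the smallest set of queries that covers 80% of the total. The query names and numbers are made up for illustration. In practice the per-execution charge would come from the request charge Cosmos DB reports for each query, and the rate would come from your own telemetry.

```python
def top_queries(stats, share=0.8):
    """Return the highest-total queries that together reach `share` of all RU/s.

    stats maps a query name to (ru_per_execution, executions_per_second).
    """
    totals = {name: ru * rate for name, (ru, rate) in stats.items()}
    grand_total = sum(totals.values())
    selected, running = [], 0.0
    # Walk from the most to the least expensive query overall.
    for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * grand_total:
            break
        selected.append((name, total))
        running += total
    return selected

# Illustrative numbers only, not measurements.
stats = {
    "latest_by_field": (6.0, 200),
    "report_scan": (500.0, 0.01),
    "point_read": (1.0, 1000),
    "search": (25.0, 2),
}
for name, total in top_queries(stats):
    print(f"{name}: {total:.1f} RU/s")
```

With these made-up numbers, the cheap but frequent queries (`latest_by_field` and `point_read`) account for most of the load. The 500 RU `report_scan` barely registers because it rarely runs.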
Hope this is helpful.
Source: https://stackoverflow.com/questions/62365691/why-does-including-partition-key-in-where-clause-to-cosmos-sql-api-query-increas