Why does including the partition key in the WHERE clause of a Cosmos DB SQL API query increase consumed RUs for some queries?

Submitted by 守給你的承諾、 on 2021-01-28 09:51:03

Question


I would like to optimise my Azure Cosmos DB SQL API queries for consumed RUs (in part to reduce the frequency of 429 "Too Many Requests" responses).

Specifically, I thought that including the partition key in WHERE clauses would decrease consumed RUs (reading https://docs.microsoft.com/en-us/azure/cosmos-db/optimize-cost-queries and https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview led me to think this).

However, when I run

SELECT TOP 1 * 
FROM c
WHERE c.Field = "some value"
AND c.PartitionKeyField = "1234"
ORDER BY c.TimeStampField DESC

It consumes 6 RUs.

Whereas the same query without the partition key:

SELECT TOP 1 * 
FROM c
WHERE c.Field = "some value"
ORDER BY c.TimeStampField DESC

consumes 5.76 RUs, i.e. it is cheaper.

(Whilst there is some variation in these numbers depending on the exact document selected, the second query is always cheaper, and I have tested against both the smallest and the largest partitions.)
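One way to repeat this comparison from code, rather than in the portal, is to run both variants and read the request charge the service reports. The sketch below assumes the `azure-cosmos` v4 Python SDK and that `container` is a `ContainerProxy`; the field names come from the question, while `measure_charge` and the `@field_value`/`@pk` parameter names are hypothetical. The charge is read from the `x-ms-request-charge` response header after each page. For the cross-partition variant the SDK may issue several backend requests per page, so treat that figure as approximate and cross-check it against the Query Stats tab in Data Explorer.

```python
from typing import Any, Optional

# Parameterized versions of the two queries from the question.
QUERY_WITH_PK = (
    "SELECT TOP 1 * FROM c "
    "WHERE c.Field = @field_value AND c.PartitionKeyField = @pk "
    "ORDER BY c.TimeStampField DESC"
)
QUERY_WITHOUT_PK = (
    "SELECT TOP 1 * FROM c "
    "WHERE c.Field = @field_value "
    "ORDER BY c.TimeStampField DESC"
)


def measure_charge(container: Any, field_value: str, pk: Optional[str] = None) -> float:
    """Run one query variant and sum the RU charge reported after each page.

    Assumes `container` is an azure.cosmos ContainerProxy (v4 SDK).
    """
    if pk is None:
        # No partition key: the query may fan out across physical partitions.
        items = container.query_items(
            query=QUERY_WITHOUT_PK,
            parameters=[{"name": "@field_value", "value": field_value}],
            enable_cross_partition_query=True,
        )
    else:
        # Partition key supplied: the SDK can route to a single partition.
        items = container.query_items(
            query=QUERY_WITH_PK,
            parameters=[
                {"name": "@field_value", "value": field_value},
                {"name": "@pk", "value": pk},
            ],
            partition_key=pk,
        )
    total = 0.0
    for page in items.by_page():
        list(page)  # materialize the page so its request has completed
        headers = container.client_connection.last_response_headers
        total += float(headers.get("x-ms-request-charge", 0))
    return total
```

Calling `measure_charge(container, "some value", "1234")` and `measure_charge(container, "some value")` against the same data gives the two numbers to compare.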

My database currently has around 400,000 documents and 29 partitions (both are expected to grow). The largest partition has around 150,000 documents (it is unlikely to grow beyond this).

The above results suggest that I should not pass the partition key in the WHERE clause for this query. Could someone please explain why, given that the documentation led me to expect the opposite?


Answer 1:


There might be a few reasons, and it depends on which index the query engine decides to use, or whether there is an index at all.

The first thing I can say is that there is likely not much data in this container, because queries without a partition key get progressively more expensive as the container grows, especially when they span physical partitions.

The first query could be more expensive if there is no index on the partition key, so that the engine has to scan on it after filtering by c.Field.

It could also be more expensive depending on whether there is a composite index and whether the query engine used it.
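If a composite index is the missing piece, it is declared in the container's indexing policy. The sketch below builds such a policy as a Python dict in the JSON shape Cosmos DB uses (`indexingMode`, `includedPaths`, `excludedPaths`, `compositeIndexes`). Pairing `/Field` ascending with `/TimeStampField` descending is an assumption based on the question's filter and ORDER BY, and `build_indexing_policy` is a hypothetical helper. Whether the engine actually uses the index depends on the query shape (the Cosmos DB indexing-policy docs suggest also adding the filtered property to the ORDER BY clause), so compare query metrics before and after the change.

```python
import json
from typing import Any, Dict, List


def build_indexing_policy() -> Dict[str, Any]:
    """Return an indexing policy with one composite index (hypothetical helper)."""
    composite: List[Dict[str, str]] = [
        # Equality-filtered property first, then the ORDER BY property.
        {"path": "/Field", "order": "ascending"},
        {"path": "/TimeStampField", "order": "descending"},
    ]
    return {
        "indexingMode": "consistent",
        # Keep range indexing on every path, including the partition key.
        "includedPaths": [{"path": "/*"}],
        # Mirrors the default exclusion of the system _etag property.
        "excludedPaths": [{"path": "/\"_etag\"/?"}],
        "compositeIndexes": [composite],
    }


if __name__ == "__main__":
    # Print the policy as the JSON you would paste into the portal.
    print(json.dumps(build_indexing_policy(), indent=2))
```

With the `azure-cosmos` v4 SDK the dict can be passed as the `indexing_policy` argument of `DatabaseProxy.create_container` or `DatabaseProxy.replace_container`; the matching query would then end with `ORDER BY c.Field ASC, c.TimeStampField DESC`.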

Really, though, you cannot take query metrics from a small container and extrapolate them. The only way to measure is to put enough data into the container. Also, the numbers here are so small that they are not worth optimizing over. I would load this container with the amount of data you expect to have in production and re-run your queries.
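Loading production-sized data can be scripted. The sketch below is a minimal, hypothetical loader: `synthetic_docs` produces documents shaped like the question's (`Field`, `PartitionKeyField`, `TimeStampField`) with a uniform spread over partition key values, and `load` upserts them one at a time, assuming `container` is an `azure-cosmos` v4 `ContainerProxy`. The value ranges and the uniform distribution are assumptions; adjust them to match the skew you expect (the question mentions one partition holding about 150,000 documents). Sequential upserts are slow and consume throughput, but they keep the sketch short.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, Iterator


def synthetic_docs(count: int, partitions: int, seed: int = 0) -> Iterator[Dict[str, Any]]:
    """Yield hypothetical test documents; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    for i in range(count):
        yield {
            "id": str(i),
            # Uniform over partition key values: an assumption, not real data.
            "PartitionKeyField": str(rng.randrange(partitions)),
            "Field": f"value-{rng.randrange(100)}",
            # One second apart so ORDER BY TimeStampField has a stable order.
            "TimeStampField": (start + timedelta(seconds=i)).isoformat(),
        }


def load(container: Any, docs: Iterable[Dict[str, Any]]) -> int:
    """Upsert every document and return how many were written."""
    written = 0
    for doc in docs:
        container.upsert_item(body=doc)
        written += 1
    return written
```

After `load(container, synthetic_docs(400_000, 29))`, re-running both query variants gives metrics that are closer to what production would see.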

Lastly, with regard to measuring and optimizing, the Pareto principle applies. You'll go nuts chasing down every optimization. Find your high-concurrency queries and focus on those.
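The advice to focus on high-concurrency queries comes down to simple arithmetic: a query's share of provisioned throughput is roughly its RU charge multiplied by how often it runs, so a cheap query issued constantly can matter more than an expensive one issued rarely. A minimal sketch of that ranking follows; `QueryStat` and `rank_by_ru_per_sec` are hypothetical names, and the inputs would come from your own measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryStat:
    name: str             # hypothetical label for a query shape
    ru_per_call: float    # request charge measured for one execution
    calls_per_sec: float  # observed or expected request rate


def ru_per_sec(stat: QueryStat) -> float:
    """Approximate throughput a query consumes: charge times rate."""
    return stat.ru_per_call * stat.calls_per_sec


def rank_by_ru_per_sec(stats: List[QueryStat]) -> List[QueryStat]:
    """Order queries by the RU/s they consume, highest first."""
    return sorted(stats, key=ru_per_sec, reverse=True)
```

For instance, the 0.24 RU gap from the question, at a hypothetical 10 calls per second, amounts to 2.4 RU/s.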

Hope this is helpful.



Source: https://stackoverflow.com/questions/62365691/why-does-including-partition-key-in-where-clause-to-cosmos-sql-api-query-increas
