Question
We are using Azure Data Factory (ADF) for ETL to push materialized views to our Cosmos DB instance, making sure our production Azure Cosmos DB (SQL API) account contains everything our users need.
The Cosmos DB account is under constant load, since data also flows in via the speed layer. This is expected and is currently handled with an autoscale RU setting.
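For context, this is roughly how the autoscale throughput is provisioned (a minimal sketch using the azure-cosmos Python SDK; the endpoint, key, database, container, partition key, and RU ceiling below are placeholders, not our real values):

```python
from azure.cosmos import CosmosClient, PartitionKey, ThroughputProperties

# Placeholders -- not our real account details.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.create_database_if_not_exists("analytics")

# Container with autoscale throughput: Cosmos DB scales RU/s between 10% of the
# ceiling and the ceiling itself, depending on load.
container = database.create_container_if_not_exists(
    id="materialized_views",
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=ThroughputProperties(auto_scale_max_throughput=10_000),
)
```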
We have tried these options:
- An ADF pipeline with a Copy activity to upsert data from Azure Data Lake Storage (Gen2) (source) into the collection in Cosmos DB (sink). A sketch of roughly how this is configured is shown after this list.
- An ADF pipeline using a Data Flow with a Cosmos DB sink, the Write throughput budget set to an acceptable level, and Allow upsert enabled, using the same source as above.
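For the first option, the Copy activity looks roughly like this (expressed as a Python dict mirroring the pipeline JSON; the dataset names, source format, and numeric values are illustrative placeholders rather than our exact settings):

```python
# Rough shape of our Copy activity, as it would appear in the pipeline JSON.
copy_activity = {
    "name": "CopyViewsToCosmos",
    "type": "Copy",
    "inputs": [{"referenceName": "AdlsGen2MaterializedViews", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "CosmosDbViewsCollection", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "ParquetSource"},  # placeholder; depends on the file format in the lake
        "sink": {
            "type": "CosmosDbSqlApiSink",
            "writeBehavior": "upsert",        # the upsert behaviour described above
            "writeBatchSize": 10000,          # illustrative value
            "maxConcurrentConnections": 4,    # illustrative value
        },
        # Copy-activity-level parallelism settings (parallelCopies, dataIntegrationUnits) not shown.
    },
}
```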
Nonetheless, we see a lot of 429s, our Cosmos DB instance is overwhelmed, and our users are left with a poor experience, timeouts, and slow response times.
Since the Copy activity tries to upsert all data as fast and efficiently as possible, it consumes all available RUs in a greedy way. The result is a lot of 429s, an overwhelmed Cosmos DB instance, an impacted speed layer, and, again, users suffering from a poor experience, timeouts, and slow response times.
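To illustrate what we mean by "greedy": a hand-rolled client would normally back off when Cosmos DB answers 429, roughly like the sketch below (azure-cosmos Python SDK; the client/container setup and names are placeholders), whereas the Copy activity keeps pushing at whatever rate it can sustain.

```python
import time

from azure.cosmos import CosmosClient
from azure.cosmos.exceptions import CosmosHttpResponseError

# Placeholders -- not our real endpoint, key, or names.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("analytics").get_container_client("materialized_views")


def upsert_with_backoff(item: dict, max_attempts: int = 5) -> None:
    """Upsert one document, waiting out 429s instead of hammering the account."""
    for _ in range(max_attempts):
        try:
            container.upsert_item(item)
            return
        except CosmosHttpResponseError as err:
            if err.status_code != 429:
                raise
            # Honour the retry hint Cosmos DB sends back (fall back to 1s if absent).
            headers = getattr(err, "headers", None) or {}
            retry_ms = float(headers.get("x-ms-retry-after-ms", 1000))
            time.sleep(retry_ms / 1000)
    raise RuntimeError(f"Still throttled after {max_attempts} attempts")
```

(As far as we understand, the SDK also retries 429s internally, but that does not help when the writer keeps saturating the provisioned RUs.)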
We hoped the second option, with the write throughput budget set, would solve the issue, but it has not. Are we doing something wrong?
Does anyone have any suggestions on how to solve this? Please advise.
Source: https://stackoverflow.com/questions/61954653/azure-data-factory-copy-activity-data-flow-consumes-all-rus-in-cosmosdb