Question
I noticed an unusually large amount of data written to Cosmos DB by a Stream Analytics job on one particular day. The job was not supposed to write that many documents in a single day, so I need to check whether documents were duplicated on that day.
Is there any query/any way to find out duplicate records in cosmos DB?
Answer 1:
Is there any query/any way to find out duplicate records in cosmos DB?
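The short answer is yes. Use the DISTINCT keyword in your Cosmos DB SQL query and filter on _ts
(the system-generated Unix timestamp of the item's last update, in seconds: https://docs.microsoft.com/en-us/azure/cosmos-db/databases-containers-items#properties-of-an-item).
DISTINCT returns each combination of values only once, so if it returns fewer rows than the same query without DISTINCT, the time window contains duplicates.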
Something like:
SELECT DISTINCT c.X, c.Y, c.Z, .... (all properties you want to check) FROM c WHERE c._ts >= <start of the particular day, in epoch seconds> AND c._ts < <start of the next day, in epoch seconds>
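Because _ts holds epoch seconds, "a particular day" has to be expressed as a half-open range of two integers. The sketch below is one possible way to compute that range and to build the query text in Python. The helper names (`day_bounds_epoch`, `build_distinct_query`, `find_duplicates`), the `@start`/`@end` parameter names, and the assumption that the day is a UTC calendar day are all illustrative and not part of the original answer. `find_duplicates` is a client-side alternative that counts repeated field combinations in documents you have already fetched.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone


def day_bounds_epoch(day):
    """Return (start, end) of a UTC calendar day as Unix epoch seconds.

    The range is half-open: start <= _ts < end. UTC is an assumption.
    """
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def build_distinct_query(fields):
    """Build a DISTINCT query over the given property names.

    @start and @end are hypothetical parameter names for the _ts bounds.
    """
    projection = ", ".join("c." + name for name in fields)
    return (
        "SELECT DISTINCT " + projection + " FROM c "
        "WHERE c._ts >= @start AND c._ts < @end"
    )


def find_duplicates(docs, fields):
    """Count each combination of field values and keep those seen more than once."""
    counts = Counter(tuple(doc.get(f) for f in fields) for doc in docs)
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    start, end = day_bounds_epoch(date(2019, 12, 6))
    print(build_distinct_query(["X", "Y", "Z"]))
    print(start, end)
```

The query text and the two bounds would then be passed to whichever Cosmos DB SDK you use, with `@start` and `@end` supplied as query parameters. Running the same filter once with and once without DISTINCT and comparing the row counts tells you whether duplicates exist. `find_duplicates` tells you which field combinations are repeated.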
You could then delete the duplicate data using this bulk delete library: https://github.com/Azure/azure-cosmosdb-bulkexecutor-dotnet-getting-started/tree/master/BulkDeleteSample
Source: https://stackoverflow.com/questions/59213815/how-to-find-duplicate-documents-in-cosmos-db