How to find Duplicate documents in Cosmos DB

Submitted by 元气小坏坏 on 2020-01-25 04:19:12

Question


I noticed a huge amount of data written to Cosmos DB from a Stream Analytics job on a particular day. The job was not supposed to write that many documents in one day, so I need to check whether documents were duplicated on that day.

Is there any query, or any other way, to find duplicate records in Cosmos DB?


Answer 1:


Is there any query, or any other way, to find duplicate records in Cosmos DB?

Quick answer: yes. Use the DISTINCT keyword in the Cosmos DB SQL query, and filter on _ts (the system-generated Unix timestamp: https://docs.microsoft.com/en-us/azure/cosmos-db/databases-containers-items#properties-of-an-item). If the query returns fewer rows than the number of documents stored for that day, some of those documents are duplicates.

Something like:

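SELECT DISTINCT c.X, c.Y, c.Z, ... (all columns you want to check) FROM c WHERE c._ts >= <start of the particular day> AND c._ts < <start of the next day>

(_ts is expressed in seconds since the Unix epoch, so the day has to be given as a range rather than a single value.)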
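To turn "a particular day" into values that can be compared with _ts, something along these lines may help. This is only a sketch: it assumes the day is meant in UTC, the helper name day_bounds_utc and the parameter names @start and @end are made up for illustration, and X, Y, Z are the placeholder fields from the query above.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Tuple


def day_bounds_utc(day: date) -> Tuple[int, int]:
    """Return (start, end) in epoch seconds for `day` in UTC; `end` is exclusive."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


# Parameterized query text; X, Y, Z stand in for the fields that define a duplicate.
QUERY = (
    "SELECT DISTINCT c.X, c.Y, c.Z FROM c "
    "WHERE c._ts >= @start AND c._ts < @end"
)

# Example day chosen arbitrarily for illustration.
start, end = day_bounds_utc(date(2019, 12, 6))
parameters = [
    {"name": "@start", "value": start},
    {"name": "@end", "value": end},
]
```

If you use the azure-cosmos Python SDK, QUERY and parameters should be usable with the container's query_items method, but check the SDK version you have installed before relying on that.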

Then you could delete the duplicate data using this bulk delete library: https://github.com/Azure/azure-cosmosdb-bulkexecutor-dotnet-getting-started/tree/master/BulkDeleteSample.
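Before deleting anything you need to know which documents are the extra copies. One possible way, sketched below, is to pull that day's documents to the client and group them by the fields that define a duplicate. The function names find_duplicates and ids_to_delete and the sample data are hypothetical; the sketch assumes each document has an id and that the compared field values are hashable (strings, numbers, and so on).

```python
from collections import defaultdict


def find_duplicates(items, key_fields):
    """Group items by the values of key_fields; return only groups with more than one id."""
    groups = defaultdict(list)
    for item in items:
        # The tuple of compared values identifies a group (values must be hashable).
        key = tuple(item.get(field) for field in key_fields)
        groups[key].append(item["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def ids_to_delete(duplicates):
    """Keep the first id of every group and return the remaining ids."""
    return [doc_id for ids in duplicates.values() for doc_id in ids[1:]]


# Made-up sample documents, standing in for the results of a query.
sample = [
    {"id": "1", "X": 1, "Y": "a"},
    {"id": "2", "X": 1, "Y": "a"},
    {"id": "3", "X": 2, "Y": "b"},
]
dups = find_duplicates(sample, ["X", "Y"])
extra = ids_to_delete(dups)
```

Which copy to keep (the first one here) is an arbitrary choice; you may prefer to keep the one with the smallest _ts instead.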



Source: https://stackoverflow.com/questions/59213815/how-to-find-duplicate-documents-in-cosmos-db
