How to remove duplicates based on a key in MongoDB?

伪装坚强ぢ 2020-11-30 20:56

I have a collection in MongoDB with around 3 million records. A sample record looks like this:

 { \"_id\" = ObjectId(\"50731xxxxxxxxxxxxxxxxxx         


        
8 Answers
  •  有刺的猬
    2020-11-30 21:31

    This answer is obsolete: the dropDups option was removed in MongoDB 3.0, so a different approach will be required in most cases. For example, you could use aggregation, as suggested in the question "MongoDB duplicate documents even after adding unique key".

    If you are certain that the source_references.key identifies duplicate records, you can ensure a unique index with the dropDups:true index creation option in MongoDB 2.6 or older:

    db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true})
    

    This will keep the first unique document for each source_references.key value, and drop any subsequent documents that would otherwise cause a duplicate key violation.

    Important Note: Any documents missing the source_references.key field will be considered as having a null value, so subsequent documents missing the key field will be deleted. You can add the sparse:true index creation option so the index only applies to documents with a source_references.key field.
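To make the sparse variant concrete, here is a small sketch that combines it with the index call from this answer. It only applies to MongoDB 2.6 or older, since dropDups was removed afterwards. The wrapper name ensureSparseUniqueIndex is made up for illustration; in the shell you would pass db.things as the collection.

```javascript
// Hypothetical helper (not part of MongoDB): wraps the ensureIndex call
// from this answer and adds sparse:true.
// Assumes a MongoDB 2.6 or older shell, where dropDups still exists.
function ensureSparseUniqueIndex(collection) {
  return collection.ensureIndex(
    { "source_references.key": 1 },
    {
      unique: true,   // reject duplicate key values
      dropDups: true, // delete later documents that collide with an earlier one
      sparse: true    // leave documents without source_references.key out of the index
    }
  );
}

// In the mongo shell: ensureSparseUniqueIndex(db.things)
```

With sparse:true, documents that have no source_references.key are not indexed at all, so they are not dropped as duplicates of each other.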

    Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.
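For MongoDB 3.0 and later, where dropDups is gone, the aggregation approach mentioned at the top of this answer could be sketched roughly as below. This is not taken from the linked question, and the helper names (duplicatePipeline, idsToRemove, removeDuplicates) are made up for illustration. It assumes a shell that has deleteMany (3.2 or later) and that source_references.key resolves to a single value per document; if source_references is an array, the grouping would probably need adapting. Which document survives in each group is simply the first one the $group stage happened to collect.

```javascript
// Sketch only. Groups documents by source_references.key and keeps
// the groups that contain more than one document.
const duplicatePipeline = [
  // Ignore documents without the key so they are not grouped together as duplicates.
  { $match: { "source_references.key": { $exists: true } } },
  // Collect the _id of every document sharing the same key value.
  { $group: { _id: "$source_references.key", ids: { $push: "$_id" }, count: { $sum: 1 } } },
  // Only groups with at least two documents contain duplicates.
  { $match: { count: { $gt: 1 } } }
];

// Hypothetical helper: every _id in the group except the first one collected.
function idsToRemove(group) {
  return group.ids.slice(1);
}

// Hypothetical helper: deletes the extra documents and returns how many were removed.
function removeDuplicates(collection) {
  let removed = 0;
  // allowDiskUse lets the $group stage spill to disk on large collections.
  collection.aggregate(duplicatePipeline, { allowDiskUse: true }).forEach(function (group) {
    const result = collection.deleteMany({ _id: { $in: idsToRemove(group) } });
    removed += result.deletedCount;
  });
  return removed;
}

// In the mongo shell: removeDuplicates(db.things)
```

Once the duplicates are gone, a plain unique index on source_references.key can be created to stop new ones from appearing. The same caution applies: take a backup and try it on a staging copy first.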
