How to Ignore Duplicate Key Errors Safely Using insert_many

Asked 2020-11-27 20:24

I need to ignore duplicate inserts when using insert_many with pymongo, where the duplicates are based on a unique index. I've seen this question asked on Stack Overflow, but I haven't found a usable answer.

2 Answers
  • 2020-11-27 21:09

    You can deal with this by inspecting the errors carried by the BulkWriteError exception. The exception is an object with several properties; the interesting parts are in its details attribute:

    import pymongo
    from pymongo import MongoClient

    client = MongoClient()
    db = client.test

    collection = db.duptest

    docs = [{ '_id': 1 }, { '_id': 1 }, { '_id': 2 }]

    try:
        # ordered=False attempts every document instead of
        # stopping at the first error
        result = collection.insert_many(docs, ordered=False)
    except pymongo.errors.BulkWriteError as e:
        print(e.details['writeErrors'])
    

    On a first run, this will give the list of errors under e.details['writeErrors']:

    [
      {
        "index": 1,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
        "op": {"_id": 1}
      }
    ]
    

    On a second run, you see three errors because all items existed:

    [
      {
        "index": 0,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
        "op": {"_id": 1}
      },
      {
        "index": 1,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
        "op": {"_id": 1}
      },
      {
        "index": 2,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 2 }",
        "op": {"_id": 2}
      }
    ]
    

    So all you need to do is filter the array for entries with "code": 11000, and only "panic" when something else is in there:

    panic = [error for error in e.details['writeErrors'] if error['code'] != 11000]

    if len(panic) > 0:
        print("really panic")
    

    That gives you a mechanism for ignoring duplicate key errors while still paying attention to anything that is actually a problem.
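
    Wrapped up as a reusable helper, the same pattern might look like the sketch below (the function name insert_many_ignore_dups and the return convention are my own, not part of pymongo):

    import pymongo

    DUP_KEY_ERROR = 11000  # MongoDB duplicate key error code

    def insert_many_ignore_dups(collection, docs):
        """Insert docs, silently skipping duplicate key failures.

        Re-raises if the batch hit any other kind of write error;
        returns the number of documents actually inserted.
        """
        try:
            result = collection.insert_many(docs, ordered=False)
            return len(result.inserted_ids)
        except pymongo.errors.BulkWriteError as e:
            # Anything that is not a duplicate key error is a real problem
            panic = [err for err in e.details['writeErrors']
                     if err['code'] != DUP_KEY_ERROR]
            if panic:
                raise
            return e.details['nInserted']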

  • 2020-11-27 21:23

    Adding more to Neil's solution.

    Passing ordered=False lets the remaining pending inserts proceed even after a duplicate key exception, and bypass_document_validation=True additionally skips any schema validation configured on the collection.

    from pymongo import MongoClient, errors
    
    DB_CLIENT = MongoClient()
    MY_DB = DB_CLIENT['my_db']
    TEST_COLL = MY_DB.dup_test_coll
    
    doc_list = [
        {
            "_id": "82aced0eeab2467c93d04a9f72bf91e1",
            "name": "shakeel"
        },
        {
            "_id": "82aced0eeab2467c93d04a9f72bf91e1",  # duplicate error: 11000
            "name": "shakeel"
        },
        {
            "_id": "fab9816677774ca6ab6d86fc7b40dc62",  # this new doc gets inserted
            "name": "abc"
        }
    ]
    
    try:
        # inserts new documents even on error
        TEST_COLL.insert_many(doc_list, ordered=False, bypass_document_validation=True)
    except errors.BulkWriteError as e:
        print(f"Articles bulk insertion error {e}")
    
        panic_list = list(filter(lambda x: x['code'] != 11000, e.details['writeErrors']))
        if len(panic_list) > 0:
            print(f"these are not duplicate errors {panic_list}")
    

    And since we are talking about duplicates, it's worth checking this solution as well.
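
    If you would rather have the new data win instead of being dropped, one common alternative (a sketch of my own, not taken from either answer) is a bulk upsert with ReplaceOne, which never raises on duplicates at all:

    from pymongo import ReplaceOne

    # Reusing TEST_COLL and doc_list from the snippet above:
    # replace the existing document when _id already exists,
    # insert it when it does not (upsert=True).
    requests = [ReplaceOne({'_id': doc['_id']}, doc, upsert=True)
                for doc in doc_list]
    result = TEST_COLL.bulk_write(requests, ordered=False)
    print(result.upserted_count, result.modified_count)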
