Solution to Bulk FindAndModify in MongoDB

借酒劲吻你 2021-02-07 10:36

My use case is as follows: I have a collection of documents in MongoDB that I have to send for analysis. The format of the documents is as follows:

{ _id:ObjectId(\"

1 answer
  • 2021-02-07 11:06

    As you mention, there is currently no clean way to do what you want. The best approach at this time for operations like the one you need is this:

    1. Reader selects X documents with an appropriate limit and sort order.
    2. Reader marks the documents returned by 1) with its own unique reader ID (e.g. update({_id:{$in:[<result set ids>]}, state:"available", $isolated:1}, {$set:{readerId:<your reader's ID>, state:"processing"}}, false, true)).
    3. Reader selects all documents marked as processing and carrying its own reader ID. At this point it is guaranteed that you have exclusive access to the resulting set of documents.
    4. Offer the result set from 3) for your processing.
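
The four steps above can be sketched roughly as follows, written against the mongo shell's synchronous collection API. This is only an illustration under assumptions: the function name `reserveBatch`, its parameters, the sort on `_id`, and the `reservedAt` field are not from the original post. It also leaves out `$isolated`, which MongoDB 4.0 and later no longer support.

```javascript
// Sketch only: reserveBatch, batchSize and reservedAt are hypothetical names.
function reserveBatch(coll, readerId, batchSize) {
  // Step 1: select candidate documents with a limit and a sort order.
  const ids = coll
    .find({ state: "available" }, { _id: 1 })
    .sort({ _id: 1 })
    .limit(batchSize)
    .toArray()
    .map(function (doc) { return doc._id; });

  if (ids.length === 0) {
    return [];
  }

  // Step 2: mark only those candidates that are still available.
  // A candidate claimed by another reader in the meantime no longer
  // matches state: "available" and is therefore skipped.
  coll.updateMany(
    { _id: { $in: ids }, state: "available" },
    { $set: { readerId: readerId, state: "processing", reservedAt: new Date() } }
  );

  // Steps 3 and 4: read back everything this reader currently owns
  // and hand that result set to the caller for processing.
  return coll.find({ readerId: readerId, state: "processing" }).toArray();
}
```

In a shell session this might be called as `reserveBatch(db.jobs, "reader-1", 100)`, where `db.jobs` and `"reader-1"` are placeholder names. The reader may get back fewer documents than it asked for when other readers win some of the candidates.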

    Note that this works even in highly concurrent situations, because a reader can never reserve documents that are already reserved by another reader: step 2 can only reserve documents that are currently available, and writes to a single document are atomic. I would also add a timestamp with the reservation time if you want to be able to time out reservations (for example, in scenarios where readers might crash or fail).
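
A reservation timeout along those lines could look something like the sketch below. It assumes that the reserving update also stores a timestamp in a hypothetical `reservedAt` field; the function name `releaseStaleReservations` and its parameters are likewise made up for illustration.

```javascript
// Sketch only: reservedAt is an assumed field set at reservation time.
function releaseStaleReservations(coll, timeoutMs, now) {
  // Anything reserved before this instant is treated as abandoned.
  const cutoff = new Date(now.getTime() - timeoutMs);

  // Put stale documents back into the pool and clear the reservation data.
  const result = coll.updateMany(
    { state: "processing", reservedAt: { $lt: cutoff } },
    { $set: { state: "available" }, $unset: { readerId: "", reservedAt: "" } }
  );
  return result.modifiedCount;
}
```

Running this periodically, for example as `releaseStaleReservations(db.jobs, 10 * 60 * 1000, new Date())` with a placeholder collection name, would make documents held by a crashed reader available again after ten minutes. The timeout has to be longer than the slowest legitimate processing run, otherwise a live reader can lose its reservation.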

    EDIT: More details:

    All write operations can occasionally yield to pending operations if the write takes a relatively long time. This means that step 3) might not see all documents marked by step 2) unless you take the following steps:

    • Use an appropriate "w" (write concern) value, meaning 1 or higher. This ensures that the connection on which the write operation is invoked waits for it to complete, even if it yields.
    • Make sure you do the read in step 3 on the same connection (only relevant for replica sets with slaveOk-enabled reads) or thread as the write, so that the two are guaranteed to be sequential. The former can be done in most drivers with the "requestStart" and "requestDone" methods or similar (Java documentation here).
    • Add the $isolated flag to your multi-updates to ensure they cannot be interleaved with other write operations.
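
As a rough illustration of the write concern point, the marking update can pass an explicit `writeConcern` in its options document, which the shell's `updateMany` accepts as a third argument. The function name `markReserved` and its parameters are hypothetical, and the value `w: 1` is simply the minimum the answer suggests.

```javascript
// Sketch only: markReserved is a hypothetical helper for the marking step.
function markReserved(coll, ids, readerId) {
  const result = coll.updateMany(
    { _id: { $in: ids }, state: "available" },
    { $set: { readerId: readerId, state: "processing" } },
    // Wait for the write to be acknowledged before returning,
    // so that a following read on this connection runs after it.
    { writeConcern: { w: 1 } }
  );
  // Number of documents this reader actually managed to claim.
  return result.modifiedCount;
}
```

Comparing the returned count with `ids.length` tells the reader how many of its candidates were taken by someone else in the meantime.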

    Also see comments for discussion regarding atomicity/isolation. I incorrectly assumed multi-updates were isolated. They are not, or at least not by default.
