How to store an ordered set of documents in MongoDB without using a capped collection

囚心锁ツ 2020-12-28 23:11

What's a good way to store a set of documents in MongoDB where order is important? I need to easily insert documents at an arbitrary position and possibly reorder them later.

4 Answers
  • 2020-12-28 23:26

    Here is a link to some general sorting database answers that may be relevant:

    https://softwareengineering.stackexchange.com/questions/195308/storing-a-re-orderable-list-in-a-database/369754

    I suggest going with the floating-point solution, which adds a position column:

    Use a floating-point number for the position column. You can then reorder the list by changing only the position column of the "moved" row. If your user wants to position "red" after "blue" but before "yellow", you just need to calculate

    red.position = ((yellow.position - blue.position) / 2) + blue.position

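    After enough re-positions into the same gap you will get floating-point numbers so close together that there is no "between" left. With 64-bit doubles this can happen after as few as about 50 consecutive inserts into one gap, so be prepared to renumber the positions when it does.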

    When retrieving the documents you can simply call col.find().sort({position: 1}) to get them in order, with no need for any client-side code (unlike the linked-list solution).
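
    The midpoint rule above can be sketched in plain JavaScript. The helper name positionBetween and the sample positions are illustrative assumptions, not part of the linked answer. The helper also reports when two neighbours are too close to fit a new value, which would be the signal to renumber the list.

```javascript
// Hypothetical helper: returns a position strictly between two neighbours,
// or null when floating-point precision leaves no room between them.
function positionBetween(before, after) {
  const mid = before + (after - before) / 2;
  if (mid <= before || mid >= after) {
    return null; // no representable value in between; renumber the list
  }
  return mid;
}

// Example: place "red" after "blue" (1) and before "yellow" (2).
const blue = { name: "blue", position: 1 };
const yellow = { name: "yellow", position: 2 };
const red = { name: "red", position: positionBetween(blue.position, yellow.position) };

// Sorting by position is what sort({position: 1}) would do on the server.
const ordered = [yellow, red, blue].sort((a, b) => a.position - b.position);
console.log(ordered.map((d) => d.name).join(", "));
```

    Only the moved document's position changes, so a reorder costs a single update.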

  • 2020-12-28 23:29

    Based on your requirement, one approach is to design your schema so that each document can hold more than one document and itself acts as a capped container.

    {
      "_id":Number,
      "doc":Array
    }
    

    Each document in the collection acts as a capped container, and the documents are stored as an array in the doc field. Because doc is an array, it maintains the order of insertion. You can limit the number of documents per container to n. The _id of each container document then increases in steps of n, where n is the number of documents a container can hold.

    By doing this you avoid adding extra fields to the documents, extra indices, and unnecessary sorts.

    Inserting the very first record

    That is, when the collection is empty.

    var record = {"name" : "first"};
    db.col.insert({"_id":0,"doc":[record]});
    

    Inserting subsequent records

    • Identify the last container document's _id, and the number of documents it holds.
    • If the number of documents it holds is less than n, then update the container document with the new document, else create a new container document.

    Say that each container document can hold at most 5 documents, and we want to insert a new document.

    var record = {"name" : "newlyAdded"};
    
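    // using aggregation, get the _id of the last container and the
    // number of records it currently holds. Sort by _id first so that
    // $last reliably refers to the container with the highest _id.
    db.col.aggregate( [ {
        $sort : { "_id" : 1 }
    }, {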
        $group : {
            "_id" : null,
            "max" : {
                $max : "$_id"
            },
            "lastDocSize" : {
                $last : "$doc"
            }
        }
    }, {
        $project : {
            "currentMaxId" : "$max",
            "capSize" : {
                $size : "$lastDocSize"
            },
            "_id" : 0
        }
    // once obtained, check if you need to update the last container or 
    // create a new container and insert the document in it.
    } ]).forEach( function(check) {
        if (check.capSize < 5) {
            print("updating");
            // UPDATE
            db.col.update( {
                "_id" : check.currentMaxId
            }, {
                $push : {
                    "doc" : record
                }
            });
        } else {
            print("inserting");
            //insert
            db.col.insert( {
                "_id" : check.currentMaxId + 5,
                "doc" : [ record ]
            });
        }
    })
    

    Note that the aggregation runs on the server side, so only the small result is sent to the client. Also note that in versions prior to 2.6 the aggregation returns a document rather than a cursor, so there you would need to modify the above code to read from that single document rather than iterating a cursor.

    Inserting a new document in between documents

    Now, if you would like to insert a new document between documents 1 and 2, we know that the document should fall inside the container with _id=0 and should be placed in the second position of that container's doc array.

    So we make use of the $each and $position operators to insert at a specific position.

    var record = {"name" : "insertInMiddle"};
    
    db.col.update(
    {
        "_id" : 0
    }, {
        $push : {
            "doc" : {
                $each : [record],
                $position : 1
            }
        }
    }
    );
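
    Finding the container and array slot for a given overall position can be sketched as below. This assumes every container before the target is filled to its capacity n (5 in this answer) and that container _ids increase in steps of n. locate is a hypothetical helper name.

```javascript
// Hypothetical helper: maps a zero-based overall position to the container
// _id and the $position inside that container's "doc" array.
// Assumes all earlier containers hold exactly `capacity` documents.
function locate(index, capacity) {
  const offset = index % capacity;
  return { containerId: index - offset, position: offset };
}

// Inserting between the 1st and 2nd documents means overall index 1.
const target = locate(1, 5);
console.log(target);

// The result would feed the update shown above, for example:
// db.col.update({ "_id": target.containerId },
//   { $push: { "doc": { $each: [record], $position: target.position } } });
```

    If containers are allowed to be partially filled, this arithmetic no longer holds and the container would have to be found by counting array lengths instead.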
    

    Handling Overflow

    Now we need to take care of documents overflowing a container. Say we insert a new document in the middle of the container with _id=0. If that container already holds 5 documents, we need to move its last document to the next container, and repeat this until every container is within its capacity. If required, we finally create a new container to hold the overflowing document.

    This complex operation should be done on the server side. To handle it, we can create a script such as the one below and register it with MongoDB.

    db.system.js.save( {
        "_id" : "handleOverFlow",
        "value" : function handleOverFlow(id) {
            var currDocArr = db.col.find( {
                "_id" : id
            })[0].doc;
            var nextColId = id + 5;
            // nothing to do if the container is within its capacity
            if (currDocArr.length <= 5)
                return;
            // take the last doc and push it to the front of the next
            // container's array
            print("updating container: " + id);
            var record = currDocArr.splice(currDocArr.length - 1, 1);
            // update the next container, creating it if it does not exist
            db.col.update( {
                "_id" : nextColId
            }, {
                $push : {
                    "doc" : {
                        $each : record,
                        $position : 0
                    }
                }
            }, {
                upsert : true
            });
            // remove the moved doc from the original container
            db.col.update( {
                "_id" : id
            }, {
                $set : {
                    "doc" : currDocArr
                }
            });
            // check overflow for the subsequent containers, recursively.
            handleOverFlow(nextColId);
        }
    });
    

    After every insertion into the middle of a container, we can then invoke this function with the container id: handleOverFlow(containerId).

    Fetching all the records in order

    Just use the $unwind operator in the aggregate pipeline.

    db.col.aggregate([{$sort:{"_id":1}},{$unwind:"$doc"},{$project:{"_id":0,"doc":1}}]);
    

    Re-Ordering Documents

    Give each document stored inside a container its own "_id" field:

    .."doc":[{"_id":0,"name":"xyz",...}..]..
    

    Get hold of the "doc" array of the container whose items you want to reorder.

    var docArray = db.col.find({"_id":0})[0].doc;
    

    Update the _ids of the items so that sorting by _id yields the new order.

    Sort the array based on their _ids.

    docArray.sort( function(a, b) {
        return a._id - b._id;
    });
    

    Then write the new doc array back to the container.
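
    As an alternative sketch of the reordering step, the array can be rearranged directly in memory instead of renumbering _ids and sorting. moveItem is a hypothetical helper, and the write-back call in the closing comment assumes the container layout used in this answer.

```javascript
// Hypothetical helper: returns a copy of `arr` with the element at index
// `from` moved to index `to`; the input array is left unchanged.
function moveItem(arr, from, to) {
  const copy = arr.slice();
  const [item] = copy.splice(from, 1);
  copy.splice(to, 0, item);
  return copy;
}

// Stand-in for the "doc" array fetched from the container with _id 0.
const docArray = [{ name: "a" }, { name: "b" }, { name: "c" }];
const reordered = moveItem(docArray, 2, 0);
console.log(reordered.map((d) => d.name).join(", "));

// Writing the new order back to the container might look like:
// db.col.update({ "_id": 0 }, { $set: { "doc": reordered } });
```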

    But then again, everything boils down to which approach is feasible and suits your requirement best.

    Coming to your questions:

    What's a good way to store a set of documents in MongoDB where order is important? I need to easily insert documents at an arbitrary position and possibly reorder them later.

    Documents as Arrays.

    Say I want to insert something between an element with a sequence of 5 and an element with a sequence of 6?

    Use the $each and $position operators in the db.collection.update() function, as shown in my answer.

    My limited understanding of Database Administration tells me that a query like that would be slow and generally a bad idea, but I'm happy to be corrected.

    Yes. It would hurt performance unless the collection holds very little data.

    I could use a capped collection, which has a guaranteed order, but then I'd run into issues if I needed to grow the collection. (Yet again, I might be wrong about that one too.)

    Yes. With Capped Collections, you may lose data.

  • 2020-12-28 23:34

    For arbitrary sorting of any collection, you'll need a field to sort on. I call mine "sequence".

    schema:
    {
     _id: ObjectID,
     sequence: Number,
     ...
    }
    
    db.items.createIndex({sequence:1});
    
    db.items.find().sort({sequence:1})
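
    One way to leave room for later inserts with a numeric sequence field is to space the values out and renumber only when a gap is used up. This is a sketch under assumptions: the step of 1024 and the helper names are hypothetical, and the array stands in for documents already sorted by sequence.

```javascript
const STEP = 1024; // assumed spacing between neighbouring sequence values

// Reassigns evenly spaced sequence values to items that are already in order.
function renumber(items) {
  return items.map((item, i) => ({ ...item, sequence: (i + 1) * STEP }));
}

// Returns an integer sequence strictly between two neighbours, or null if
// the gap is exhausted and the list should be renumbered first.
function sequenceBetween(prev, next) {
  const mid = Math.floor((prev + next) / 2);
  return mid > prev && mid < next ? mid : null;
}

const items = renumber([{ name: "a" }, { name: "b" }]);
const seq = sequenceBetween(items[0].sequence, items[1].sequence);
items.push({ name: "x", sequence: seq });
items.sort((p, q) => p.sequence - q.sequence);
console.log(items.map((d) => d.name + ":" + d.sequence).join(", "));
```

    With adjacent values such as 5 and 6, sequenceBetween returns null, which is the case where a renumbering pass (or a fractional sequence) is needed.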
    
  • 2020-12-28 23:39

    An _id field in MongoDB is a unique, indexed key similar to a primary key in relational databases. If there is an inherent order in your documents, ideally you should be able to associate a unique key with each document, with the key value reflecting the order. So while preparing your document for insertion, explicitly add an _id field as this key (if you do not, MongoDB creates one automatically as a BSON ObjectId).

    As far as retrieving the results is concerned, MongoDB does not guarantee the order of returned documents unless you explicitly use .sort(). If you do not use .sort(), the results are usually returned in natural order (often the order of insertion). Again, there is no guarantee of this behavior.

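    I'd advise you to override _id with your order key while inserting, and to use a sort while retrieving. Since _id is mandatory and automatically indexed, you will not be spending any extra space on a separate sort key and its index. Keep in mind that _id cannot be changed after insertion, so moving a document to a new position means re-inserting it under a new _id.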
