Many to many relationships with MongoDB at large scale

臣服心动 2021-02-03 12:11

I've seen many posts on how to do many-to-many relationships with MongoDB, but none of them mentions scale. For example, these posts:

MongoDB Many-to-Many Association

2 answers
  •  小蘑菇
    小蘑菇 (original poster)
    2021-02-03 12:57

    This is a good question which illustrates the problems with over-embedding and how to deal with them.

    Example: Post likes

    Let's stick with the simple example of users liking posts. The other relations can be handled the same way.

    You are absolutely right that storing the likes inside the post would sooner or later cause very popular posts to hit the 16 MB document size limit.
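To get a feel for when an embedded array would hit that limit, here is a rough estimate. It assumes the likes are embedded as a plain array of ObjectId values (an assumption, the question may embed something larger) and ignores every other field of the post, so the real ceiling is lower. The function names are hypothetical.

```javascript
// BSON documents are capped at 16 MiB.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

// One array element holding an ObjectId:
// 1 type byte + the index as a NUL-terminated string key + 12 bytes of ObjectId.
function objectIdArrayElementSize(index) {
  return 1 + String(index).length + 1 + 12;
}

// Encoded size of an array of `count` ObjectIds:
// 4-byte length prefix + elements + 1 terminator byte.
function objectIdArraySize(count) {
  let size = 4 + 1;
  for (let i = 0; i < count; i++) {
    size += objectIdArrayElementSize(i);
  }
  return size;
}

// Largest number of embedded likes whose array alone still fits in the limit.
function maxEmbeddedLikes(limitBytes = MAX_BSON_DOCUMENT_BYTES) {
  let size = 4 + 1;
  let count = 0;
  while (size + objectIdArrayElementSize(count) <= limitBytes) {
    size += objectIdArrayElementSize(count);
    count++;
  }
  return count;
}

console.log(maxEmbeddedLikes());
```

Under these assumptions the ceiling is below one million likes, which a very popular post can plausibly reach.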

    So you correctly fell back to creating a post_likes collection. Why do I call this correct? Because it fits your use cases and your functional and non-functional requirements!

    • It scales indefinitely (well, there is a theoretical limit, but it is humongous)
    • It is easy to maintain (create a unique index over post_id and liked_user_id) and use (both the user and the post are known, so adding a like is a simple insert or more likely an upsert)
    • You can easily find out which users like a given post and which posts a given user likes
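The unique index and the upsert mentioned in the list could look roughly like this. This is a minimal sketch: the helper `buildLikeUpsert` and the `liked_at` field are hypothetical, and the helper only builds the arguments that would be handed to the shell, so it can be read (and run) without a database.

```javascript
// Arguments for db.post_likes.createIndex(keys, options).
// The unique constraint guarantees at most one like per (post, user) pair.
const likeIndex = {
  keys: { post_id: 1, liked_user_id: 1 },
  options: { unique: true },
};

// Arguments for db.post_likes.updateOne(filter, update, options).
// With upsert: true, a missing like is inserted and an existing one is left untouched,
// so liking the same post twice is harmless.
function buildLikeUpsert(postId, userId, likedAt) {
  return {
    filter: { post_id: postId, liked_user_id: userId },
    update: { $setOnInsert: { liked_at: likedAt } }, // liked_at is a hypothetical field
    options: { upsert: true },
  };
}

// In the shell this would be used as:
//   db.post_likes.createIndex(likeIndex.keys, likeIndex.options)
//   const op = buildLikeUpsert(somePostId, someUserId, new Date())
//   db.post_likes.updateOne(op.filter, op.update, op.options)
```

On an upsert, the equality fields of the filter (`post_id` and `liked_user_id`) become part of the inserted document, so they do not need to be repeated in the update.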

    However, I would expand the collection a bit to avoid extra queries in frequent use cases.

    Let's assume for now that post titles and usernames can't be changed. In that case, the following data model could make more sense:

    {
      _id: new ObjectId(),
      "post_id": someValue,
      "post_title": "Cool thing",
      "liked_user_id": someUserId,
      "user_name": "JoeCool"
    }
    

    Now let's assume you want to display the username of all users that liked a post. With the model above, that would be a single, rather fast query:

    db.post_likes.find(
      {"post_id":someValue},
      {_id:0,user_name:1}
    )
    

    With only the IDs stored, this rather common task would need at least two queries and, given that the number of likers for a post is unbounded, potentially huge memory consumption (you'd need to hold the user IDs in RAM).
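For contrast, the IDs-only variant could be sketched as follows. It assumes a separate `users` collection keyed by `_id` with a `user_name` field (an assumption, the question does not show it), and the helper names are hypothetical. As before, the helpers only build query arguments.

```javascript
// Query 1: fetch the IDs of everyone who liked the post.
//   const likes = db.post_likes.find(q1.filter, q1.projection).toArray()
function likerIdsQuery(postId) {
  return {
    filter: { post_id: postId },
    projection: { _id: 0, liked_user_id: 1 },
  };
}

// Query 2: resolve those IDs to names in the (assumed) users collection.
//   db.users.find(q2.filter, q2.projection)
function userNamesQuery(likeDocs) {
  // The whole ID list has to sit in application memory before query 2 can be sent.
  const userIds = likeDocs.map((doc) => doc.liked_user_id);
  return {
    filter: { _id: { $in: userIds } },
    projection: { _id: 0, user_name: 1 },
  };
}
```

The `$in` list grows with the number of likers, which is where the memory cost comes from.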

    Granted, this leads to some redundancy, but even when a million people like a post, the duplicated fields add up to only tens of megabytes of relatively cheap (and easy to scale) disk space, while you gain a lot of performance in terms of user experience.
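A rough estimate of that redundancy, using the example values from the model above. It counts only the raw BSON size of the two duplicated string fields and ignores storage compression, which usually shrinks repeated values considerably. The function name is hypothetical.

```javascript
const utf8 = new TextEncoder();

// A BSON string element takes:
// 1 type byte + field name + NUL + 4-byte length + UTF-8 value + NUL.
function bsonStringElementSize(fieldName, value) {
  return 1 + utf8.encode(fieldName).length + 1 + 4 + utf8.encode(value).length + 1;
}

// Extra bytes per like document caused by the two denormalized fields.
const extraBytesPerLike =
  bsonStringElementSize("post_title", "Cool thing") + // 27
  bsonStringElementSize("user_name", "JoeCool");      // 23

const likes = 1000000;
console.log(extraBytesPerLike);         // 50
console.log(extraBytesPerLike * likes); // 50000000 bytes, about 50 MB per million likes
```

Longer titles and names raise the figure proportionally, but it stays small next to the cost of a second query on every page view.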

    Now here comes the thing: even if the usernames and post titles are subject to change, you would only have to do a multi-document update:

    // updateMany is the current form of update(..., { multi: true })
    db.post_likes.updateMany(
      {"post_id":someId},
      { $set:{ "post_title":newTitle} }
    )
    

    You are trading slower execution of rather rare operations, like changing a username or a post title, for extreme speed in use cases which happen extremely often.
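The username case works the same way as the post title case. A minimal sketch, with a hypothetical helper that builds the arguments for `updateMany`:

```javascript
// Arguments for db.post_likes.updateMany(filter, update):
// rewrite the denormalized name on every like the user has ever given.
function buildUserRename(userId, newName) {
  return {
    filter: { liked_user_id: userId },
    update: { $set: { user_name: newName } },
  };
}

// In the shell:
//   const op = buildUserRename(someUserId, "JoeCooler")
//   db.post_likes.updateMany(op.filter, op.update)
```

One design note: a compound index over post_id and liked_user_id, in that order, does not efficiently serve a filter on liked_user_id alone, so a separate index on liked_user_id is probably worth having if renames (or "posts liked by this user" queries) matter.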

    Bottom line

    Keep in mind that MongoDB is a document-oriented database. So document the events you are interested in, with the values you need for future queries, and model your data accordingly.
