I've seen many posts on how to do many-to-many relationships with MongoDB, but none of them mention scale. For example, these posts:
MongoDB Many-to-Many Association
This is a good question that illustrates the problems with overembedding and how to deal with them.
Let's stick with the example of users liking posts, which is a simple example. The other relations would have to be handled accordingly.
You are absolutely right that storing the likes inside the post would sooner or later cause very popular posts to hit the document size limit.
So you correctly fell back to creating a post_likes collection. Why do I call this correct? Because it fits your use cases and your functional and non-functional requirements!
post_id and liked_user_id) and use (both the user and the post are known, so adding a like is a simple insert or, more likely, an upsert).
However, I would expand the collection a bit to avoid unneeded queries for certain frequent use cases.
Let's assume for now that post titles and usernames can't be changed. In that case, the following data model could make more sense:
{
  _id: new ObjectId(),
  "post_id": someValue,
  "post_title": "Cool thing",
  "liked_user_id": someUserId,
  "user_name": "JoeCool"
}
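Adding a like under this model can be sketched as an upsert. The snippet below only builds the arguments, so it runs without a database; the function name buildLikeUpsert, the shape of the post and user objects, and the unique index in the comment are assumptions for illustration, not something the question prescribes.

```javascript
// Sketch: build the arguments for an idempotent "like" upsert.
// Field names follow the document above; buildLikeUpsert and the
// { id, title } / { id, name } input shapes are hypothetical.
function buildLikeUpsert(post, user) {
  // One like per (post, user) pair: the filter identifies that pair.
  const filter = { post_id: post.id, liked_user_id: user.id };
  // $setOnInsert writes the denormalized fields only when the like is new.
  const update = {
    $setOnInsert: { post_title: post.title, user_name: user.name },
  };
  return [filter, update, { upsert: true }];
}

// Possible usage in the mongo shell (not executed here):
//   db.post_likes.createIndex({ post_id: 1, liked_user_id: 1 }, { unique: true })
//   db.post_likes.updateOne(...buildLikeUpsert(post, user))
const args = buildLikeUpsert(
  { id: 42, title: "Cool thing" },
  { id: 7, name: "JoeCool" }
);
console.log(JSON.stringify(args));
```

With the assumed unique index in place, liking the same post twice leaves the existing document untouched instead of creating a duplicate.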
Now let's assume you want to display the usernames of all users who liked a post. With the model above, that would be a single, rather fast query:
db.post_likes.find(
  {"post_id":someValue},
  {_id:0,user_name:1}
)
With only the IDs stored, this rather common task would need at least two queries and, given that the number of likers for a post is unbounded, potentially huge memory consumption (you'd need to hold the user IDs in RAM).
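For comparison, the second step of the IDs-only variant might look roughly like this. It assumes a hypothetical users collection whose documents carry a user_name field, and the helper name buildUserNameLookup is made up for the sketch. Note that every liker's ID has to be collected in memory before the second query can even be sent.

```javascript
// Sketch of the IDs-only variant: the like documents hold no names,
// so the IDs from the first query feed a second query.
// buildUserNameLookup and the users.user_name field are assumptions.
function buildUserNameLookup(likeDocs) {
  // All user IDs are held in memory at once; this grows with the likes.
  const ids = likeDocs.map((like) => like.liked_user_id);
  const filter = { _id: { $in: ids } };
  const projection = { _id: 0, user_name: 1 };
  return [filter, projection];
}

// Possible usage in the mongo shell (not executed here):
//   const likes = db.post_likes.find({ post_id: someValue }).toArray()
//   db.users.find(...buildUserNameLookup(likes))
const likes = [
  { post_id: 42, liked_user_id: 7 },
  { post_id: 42, liked_user_id: 9 },
];
const lookup = buildUserNameLookup(likes);
console.log(JSON.stringify(lookup));
```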
Granted, this leads to some redundancy, but even when millions of people like a post, we are only talking about a few megabytes of relatively cheap (and easy to scale) disk space, while gaining a lot of performance in terms of user experience.
Now here comes the thing: even if the usernames and post titles are subject to change, you would only have to do a multi update:
db.post_likes.update(
  {"post_id":someId},
  { $set:{ "post_title":newTitle} },
  { multi: true}
)
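The same pattern covers a username change; only the filter key and the updated field differ. A minimal sketch, assuming updateMany is available in your shell or driver version (it is the multi-document counterpart of the update call above); the helper buildRenameUpdate and the RENAME_TARGETS table are hypothetical names:

```javascript
// Sketch: build filter and update for propagating a rename into post_likes.
// RENAME_TARGETS and buildRenameUpdate are illustrative names only.
const RENAME_TARGETS = {
  post: { key: "post_id", field: "post_title" },
  user: { key: "liked_user_id", field: "user_name" },
};

function buildRenameUpdate(kind, id, newValue) {
  if (!Object.prototype.hasOwnProperty.call(RENAME_TARGETS, kind)) {
    throw new Error("unknown kind: " + kind);
  }
  const target = RENAME_TARGETS[kind];
  // Match every like that references the renamed post or user.
  const filter = { [target.key]: id };
  // Overwrite only the denormalized copy of the name.
  const update = { $set: { [target.field]: newValue } };
  return [filter, update];
}

// Possible usage in the mongo shell (not executed here):
//   db.post_likes.updateMany(...buildRenameUpdate("user", someUserId, newName))
const renameArgs = buildRenameUpdate("user", 7, "JoeCooler");
console.log(JSON.stringify(renameArgs));
```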
You are making a trade: rare operations such as changing a username or a post title take a while, and in exchange the use cases that happen extremely often become extremely fast.
Keep in mind that MongoDB is a document-oriented database. So document the events you are interested in, with the values you need for future queries, and model your data accordingly.
If you're just storing the IDs of the relationships inside arrays in each collection, you shouldn't have much of a problem within a single document. GridFS can be used, but that's usually more for media like files, music, videos, etc. Using GridFS would also make doing updates a pain.
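A rough sketch of that array-of-IDs approach, for comparison with the post_likes collection. The field names liked_by and liked_posts and the helper buildTwoSidedLike are assumptions; the answer does not name them.

```javascript
// Sketch: keep the relationship as ID arrays on both sides.
// liked_by, liked_posts and buildTwoSidedLike are hypothetical names.
function buildTwoSidedLike(postId, userId) {
  return {
    // $addToSet leaves the array unchanged if the ID is already present.
    posts: [{ _id: postId }, { $addToSet: { liked_by: userId } }],
    users: [{ _id: userId }, { $addToSet: { liked_posts: postId } }],
  };
}

// Possible usage in the mongo shell (not executed here):
//   const ops = buildTwoSidedLike(somePostId, someUserId)
//   db.posts.updateOne(...ops.posts)
//   db.users.updateOne(...ops.users)
const ops = buildTwoSidedLike(42, 7);
console.log(JSON.stringify(ops));
```

These are two separate writes, so one can succeed while the other fails, and each array still counts toward the 16 MB limit of its document.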