I\'m working on some posting forum projects and trying to figure out the ideal Firestore database structure. I read that documents have a max size of 1 mg but what are the p
You can likely store many posts in a single document, and depending on your application, there may be good reasons for doing so. Just keep a few things in mind:
My guidelines when modeling data in any NoSQL database is:
model application screens in your database
I tend to model the data in my database after the screens that I have in my application. So if you typically show a list of headlines of recent articles when a user starts the app, I might actually create a document that contains just the headlines of recent articles. That way the app only has to read a single document with just the headlines, instead of having to read each individual post. This reduces not only the number of documents the app needs to read, but also the bandwidth it consumes.
don't be afraid to duplicate data
This goes hand-in-hand with the previous guideline, and is very normal across all NoSQL databases, but goes against the core of what many of us have learned from relational databases. It is sometimes also referred to as denormalizing, as it counters the database normalization of relations database models.
Continuing the example from before: you'll probably have a separate document for each post, just to make sure that each post has its own single point of definition. But you'll store parts of that post in many other places, such as in the document-of-recent-headlines that we had before. This means that we'll have to duplicate the data for each new post into that document, and possibly multiple other places. This process is known as fan-out, and there are some common strategies for updating this denormalized data.
I find that this duplication leads to no concerns, as long as it is clear what the main point of definition for each entity is. So in our example: if there ever is a difference between the headline of a post in the post-document itself, and the document-of-recent-headlines, I know that I should update the document-of-recent-headlines, since the post-document itself is my point-of-definition for the post.
The result of all this is that I often see my database as part actual data storage, part prerendered fragments of application screens. As long as the points of definition are clear, that works quite well and allows me to define data models that scale efficiently both for users of the applications that consume the data and for the cost to operate them.
To learn more about NoSQL data modeling: