What would be the best way of designing a database to store blog posts and comments? I am currently thinking one table for posts, and another for comments, each with a post ID.<
It seems to me, however, trawling through a large table of comments
All the database vendors agree with you.
They offer "indexes" to limit this.
Every database system you would be using to implement your blog will use indexing. What this means is that, rather than "trawling through a large table", your database system maintains a seperate list of comments and which posts they are associated with, much like the index at the back of a book. This allows the database system to load comments associated with a post extremely quickly, and I don't see any problems with your proposed design for a blog of any size.
Indexes are routinely used to associate tables with millions of rows with other tables with millions of rows - you would have to have an exceptionally large blog to require denormalization of comments, and even still, caching would probably serve you far better than denormalizing the database.
You will need to define an index on your comments table, and associate it with whatever column holds the Post ID. How that's done is dependent on what database system you are using.
Okay, let's see.
trawling through a large table of comments to find those for the relevant post would be expensive
Why do you think it'd be expensive? Because you possibly believe that a linear search will be done every time taking O(n) time. For a billion comments, a billion iterations will be done.
Now suppose a binary search tree is constructed for comment_ID. To look up any comment, you need log(n) time [base 2]. So for even 1 billion comments, only around 32 iterations will be needed.
Now consider a slightly modified BST, where each node contains k elements instead of 1 (in a list) and has k+1 children nodes. The same properties of BST are followed in this data structure as well. What we've got here is called a B-tree. More reading : GeeksForGeeks - B Tree Introduction
For a B-Tree, the lookup time is log(n) [base k]. Hence, if k=10, for 1 billion entries, only 9 iterations will be needed.
All databases save indexes for primary keys in B-Trees. Hence, the stated task would not be expensive, and you should go ahead and design the database the way it seemed obvious.
PS: You can build an index on any column of the table. By default primary key indexes are already stored. But be careful, do not make unnecessary indexes as they take up disk space.
trawling through a large table of comments to find those for the relevant post would be expensive,
An index is always there to rescue you! First index on postId
and another of commentdate
(desc)
try something like this:
Blog
BlogID int auto number PK
BlogName string
...
BlogPost
BlogPostID int auto number PK
BlogID int FK to Blog.BlogID, index
BlogContent string
....
Comment
CommentID int auto number PK
BlogPostID int FK to BlogPost.BlogPostID, index
ReplyToCommentID int FK to Comment.CommentID <<for comments on comments
...