There is a big database, 1,000,000,000 rows, called threads (these threads actually exist, I'm not making things harder just because I enjoy it). Threads has only a few
EDIT: Your one-column indices are not enough. You would need to, at least, cover the three involved columns.
More advanced solution: replace `replycount > 1` with `hasreplies = 1` by creating a new `hasreplies` field that equals 1 when `replycount > 1`. Once this is done, create an index on the three columns, in that order: `INDEX(forumid, hasreplies, dateline)`. Make sure it's a BTREE index to support ordering.
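As a rough sketch of the schema change described above, the script below adds the flag column, backfills it, and creates the three-column index. It uses Python's standard `sqlite3` module as a stand-in for MySQL so that it runs anywhere; the table layout, sample rows and index name are assumptions for illustration, and only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, minimal version of the threads table: only the columns
# named in the answer are included.
conn.execute("""
    CREATE TABLE threads (
        threadid   INTEGER PRIMARY KEY,
        forumid    INTEGER NOT NULL,
        replycount INTEGER NOT NULL,
        dateline   INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO threads (forumid, replycount, dateline) VALUES (?, ?, ?)",
    [(1, 0, 100), (1, 1, 200), (1, 5, 300), (2, 9, 400)],
)

# Add the flag column and backfill it from replycount.
conn.execute(
    "ALTER TABLE threads ADD COLUMN hasreplies INTEGER NOT NULL DEFAULT 0"
)
conn.execute("UPDATE threads SET hasreplies = (replycount > 1)")

# Composite index in the order given above: the two equality columns
# first, then the column used for sorting.
conn.execute(
    "CREATE INDEX idx_forum_hasreplies_dateline "
    "ON threads (forumid, hasreplies, dateline)"
)

flags = conn.execute(
    "SELECT replycount, hasreplies FROM threads ORDER BY threadid"
).fetchall()
```

Note that `hasreplies` is a denormalized copy of information already in `replycount`, so in a real MySQL setup something (the application, or a trigger) would have to keep it in sync whenever `replycount` changes.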
You're selecting based on `forumid`, `hasreplies` and `dateline`.
Once you do this, your query execution will involve three steps. First, finding the part of the index that matches `forumid = X`. This is a logarithmic operation (duration: log(number of forums)). Second, within that part, finding the entries that match `hasreplies = 1` (while still matching `forumid = X`). This is a constant-time operation, because `hasreplies` is only 0 or 1. Third, reading those entries in `dateline` order straight from the index, so no separate sort is needed.

My earlier suggestion to index on `replycount` was incorrect, because it would have been a range query and thus prevented the use of `dateline` in the index to sort the results (so you would have selected the threads with replies very fast, but the resulting million-line list would have had to be sorted completely before looking for the 100 elements you needed).
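The difference between the equality flag and the range condition can be checked from a query plan. The sketch below again uses Python's `sqlite3` as a stand-in, so it shows SQLite's planner, not MySQL's; the helper names `plan_details` and `needs_sort` are made up for this example. In MySQL, the equivalent check would be to run `EXPLAIN` on the query and look for "Using filesort" in the Extra column.

```python
import sqlite3


def plan_details(index_columns, where_clause):
    """Return SQLite's query-plan steps for the paginated query,
    given a single composite index on the listed columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE threads (
            threadid   INTEGER PRIMARY KEY,
            forumid    INTEGER NOT NULL,
            replycount INTEGER NOT NULL,
            hasreplies INTEGER NOT NULL,
            dateline   INTEGER NOT NULL
        )
    """)
    conn.execute(f"CREATE INDEX idx ON threads ({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT threadid FROM threads "
        f"WHERE {where_clause} ORDER BY dateline LIMIT 100"
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each row.
    return [row[-1] for row in rows]


def needs_sort(plan):
    """True when SQLite reports an extra sorting pass for ORDER BY."""
    return any("TEMP B-TREE" in step for step in plan)


# Equality on both leading columns: the index already yields dateline order.
flag_plan = plan_details(
    "forumid, hasreplies, dateline", "forumid = 1 AND hasreplies = 1"
)

# Range on the middle column: entries come back ordered by replycount
# first, so the rows must be re-sorted by dateline.
range_plan = plan_details(
    "forumid, replycount, dateline", "forumid = 1 AND replycount > 1"
)

print("flag index needs sort: ", needs_sort(flag_plan))
print("range index needs sort:", needs_sort(range_plan))
```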
IMPORTANT: while this improves performance in all cases, your huge OFFSET value (10000!) is going to decrease performance, because MySQL does not seem to be able to skip ahead despite reading straight through a BTREE. So, the larger your OFFSET is, the slower the request will become.
I'm afraid the OFFSET problem is not automagically solved by spreading the computation over several machines (how do you skip an offset in parallel, anyway?) or moving to NoSQL. All solutions (including NoSQL ones) will boil down to simulating OFFSET based on `dateline` (basically saying `dateline > Y LIMIT 100` instead of `LIMIT Z, 100`, where `Y` is the date of the item at offset `Z`). This works, and eliminates any performance issues related to the offset, but prevents going directly to page 100 out of 200.
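A minimal sketch of that `dateline`-based paging, compared against plain OFFSET paging, might look as follows. It uses Python's `sqlite3` as a runnable stand-in for MySQL; the sample data, the function names `page_by_offset` and `page_after`, and the ascending sort order are assumptions. It also assumes `dateline` values are unique within a forum; with ties you would need a tiebreaker column such as `threadid` in both the ORDER BY and the comparison.

```python
import sqlite3

PAGE_SIZE = 100

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE threads (
        threadid   INTEGER PRIMARY KEY,
        forumid    INTEGER NOT NULL,
        hasreplies INTEGER NOT NULL,
        dateline   INTEGER NOT NULL
    )
""")
conn.execute(
    "CREATE INDEX idx_forum_hasreplies_dateline "
    "ON threads (forumid, hasreplies, dateline)"
)
# 1,000 hypothetical threads in forum 1 with unique, increasing datelines.
conn.executemany(
    "INSERT INTO threads (forumid, hasreplies, dateline) VALUES (1, 1, ?)",
    [(1000 + i,) for i in range(1000)],
)


def page_by_offset(offset):
    """Classic paging: the engine walks past `offset` rows first."""
    return conn.execute(
        "SELECT threadid, dateline FROM threads "
        "WHERE forumid = 1 AND hasreplies = 1 "
        "ORDER BY dateline LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


def page_after(last_dateline):
    """Dateline-based paging: start right after the last row already seen."""
    return conn.execute(
        "SELECT threadid, dateline FROM threads "
        "WHERE forumid = 1 AND hasreplies = 1 AND dateline > ? "
        "ORDER BY dateline LIMIT ?",
        (last_dateline, PAGE_SIZE),
    ).fetchall()


first_page = page_by_offset(0)
last_seen = first_page[-1][1]  # Y: dateline of the last row on page 1

second_page_keyset = page_after(last_seen)
second_page_offset = page_by_offset(PAGE_SIZE)
```

Both calls return the same second page, but `page_after` needs the last `dateline` of the previous page as input, which is exactly why jumping straight to an arbitrary page is no longer possible.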