I know it's generally a bad idea to do queries like this:
SELECT * FROM `group_relations`
But when I just want the count, should I go for SELECT COUNT(*) instead?
As you've seen, when tables grow large, COUNT queries get slow. I think the most important thing is to consider the nature of the problem you're trying to solve. For example, many developers use COUNT queries when generating pagination for large sets of records in order to determine the total number of pages in the result set.

Knowing that COUNT queries will grow slow, you could consider an alternative way to display pagination controls that simply allows you to side-step the slow query. Google's pagination is an excellent example.
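One way to side-step it (a rough sketch, not part of the original answer): fetch one row more than the page size, so you only need to know whether a next page exists, rather than the exact total. The page size, offset and ordering column below are made up for illustration.

-- Hypothetical page of 25 rows starting at offset 50; asking for 26 rows
-- tells us whether to render a "Next" link without any COUNT(*) at all.
SELECT * FROM `group_relations` ORDER BY `group_id` LIMIT 26 OFFSET 50;
-- If 26 rows come back, show the first 25 and a "Next" control.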
If you absolutely must know the number of records matching a specific query, consider the classic technique of data denormalization. Instead of counting the number of rows at lookup time, consider incrementing a counter on record insertion, and decrementing that counter on record deletion.
If you decide to do this, consider using idempotent, transactional operations to keep those denormalized values in sync.
-- MySQL syntax; keep the insert and the counter update in one transaction
START TRANSACTION;
INSERT INTO `group_relations` (`group_id`) VALUES (1);
-- Bump the denormalized row count kept in a separate table
UPDATE `group_relations_count` SET `count` = `count` + 1;
COMMIT;
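The delete path would mirror it; here is a sketch assuming a single-row delete keyed on a primary key (the `relation_id` column is hypothetical):

START TRANSACTION;
-- Remove exactly one relation row, identified by its (assumed) primary key
DELETE FROM `group_relations` WHERE `relation_id` = 42;
-- and decrement the denormalized counter to match
UPDATE `group_relations_count` SET `count` = `count` - 1;
COMMIT;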
Alternatively, you could use database triggers if your RDBMS supports them.
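In MySQL that could look roughly like this (a sketch; the trigger names and the single-row `group_relations_count` layout are assumed):

-- Keep the denormalized counter in step automatically on insert and delete
CREATE TRIGGER `group_relations_ai` AFTER INSERT ON `group_relations`
FOR EACH ROW UPDATE `group_relations_count` SET `count` = `count` + 1;

CREATE TRIGGER `group_relations_ad` AFTER DELETE ON `group_relations`
FOR EACH ROW UPDATE `group_relations_count` SET `count` = `count` - 1;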
Depending on your architecture, it might make sense to use a caching layer like memcached to store, increment and decrement the denormalized value, and simply fall through to the slow COUNT query when the cache key is missing. This can reduce overall write-contention if you have very volatile data, though in cases like this, you'll want to consider solutions to the dog-pile effect.