I'm pretty new to caching strategies and implementations. I'm working on a project that will be database-intensive, but will also have information updated and changed very regularly.
Use MySQL's built-in query cache instead of trying to maintain it yourself. It automatically invalidates cached queries for a table whenever that table is written to. Plus, it works in memory, so it should be very efficient...
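If you want to sanity-check that it's actually on and being hit, something like this works on MySQL 5.x (the query cache was removed in 8.0); the connection details here are placeholders:

```python
# Quick check of the query cache's settings and hit counters
# (MySQL 5.x only -- the query cache was removed in MySQL 8.0).
import pymysql

conn = pymysql.connect(host="localhost", user="app", password="...", db="app")
with conn.cursor() as cur:
    cur.execute("SHOW VARIABLES LIKE 'query_cache%'")
    print(cur.fetchall())  # query_cache_type, query_cache_size, ...
    cur.execute("SHOW STATUS LIKE 'Qcache%'")
    print(cur.fetchall())  # compare Qcache_hits against Qcache_inserts
```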
Also, don't just cache queries. Try to cache entire segments of the application at different stages of the rendering cycle. So you let MySQL cache the queries, then you cache each rendered view, each individual block, and each full page. Then, you can choose whether or not to pull from cache based on the request.
For example, a non-logged-in user may get the full page straight from cache. But a logged-in user may not (the page includes their username, etc.). For them, you may still be able to render half the views on the page from cache (the ones that don't depend on the user object). You still get the benefit of caching, but it's tiered based on need.
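A minimal sketch of that tiered fragment cache (the render_* helpers are made-up names standing in for whatever your framework provides):

```python
# Tiered fragment caching: whole page for anonymous users,
# per-block caching for logged-in users. render_* are hypothetical.
import time

_fragments = {}   # key -> (expires_at, html)
TTL = 300         # seconds

def cached(key, render):
    """Return a rendered fragment, caching it under `key`."""
    entry = _fragments.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                      # cache hit
    html = render()
    _fragments[key] = (time.time() + TTL, html)
    return html

def render_page(user=None):
    if user is None:
        # Anonymous users: serve the entire page from cache.
        return cached("page:home", render_full_home_page)
    # Logged-in users: fresh header, cached user-independent blocks.
    return render_header(user) + cached("block:sidebar", render_sidebar)
```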
If you're really expecting a lot of traffic, it's definitely worth looking into Memcached. Let MySQL store your queries for you, and then store all user-land cache items in memcache...
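A rough sketch of that split, using the pymemcache client (the key scheme and loader function are made up for illustration):

```python
# User-land caching in Memcached; MySQL's query cache handles the
# raw queries. The key names and load_from_db() are hypothetical.
import json
from pymemcache.client.base import Client

mc = Client(("localhost", 11211))

def get_user_profile(user_id, load_from_db):
    key = f"user:{user_id}:profile"
    cached = mc.get(key)
    if cached is not None:
        return json.loads(cached)
    profile = load_from_db(user_id)               # falls through to MySQL
    mc.set(key, json.dumps(profile), expire=300)  # 5-minute TTL
    return profile
```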
Edit: To answer your edit:
Filesystems can become slow if a single directory grows too big. As long as you're "namespacing" by directory (so each directory holds only a small portion of the cache files), you should be fine from that standpoint. As for the exact threshold, it really will depend on your hardware and filesystem more than anything else. I know ext3 gets quite slow when there are a huge number of files in a single directory (I have directories with literally hundreds of thousands of files, and it can take up to half a second to simply stat() one of the files, let alone do any kind of directory listing)...
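For what it's worth, the usual way to do that namespacing is to bucket by a hash prefix; a sketch (the cache root path is just an example):

```python
# Spread cache files across subdirectories so no single directory
# grows large. CACHE_ROOT is an example path.
import hashlib
import os

CACHE_ROOT = "/var/cache/app"

def cache_path(key: str) -> str:
    digest = hashlib.md5(key.encode()).hexdigest()
    # Two levels of 2-hex-char buckets -> 65,536 directories,
    # so each stays small even with millions of cache entries.
    subdir = os.path.join(CACHE_ROOT, digest[:2], digest[2:4])
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, digest)
```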
But realize that if you add another server, you'll either end up with duplicated caches (which is not a good thing) or have to rewrite your entire cache layer. Is there a reason not to go with Memcached right from the start?
Edit 2: To answer your latest edit:
It's still too tough to call. I have an application with a database of around 1.5 billion rows (growing by around 500k per day). We don't use any caching on it at all because we don't have concurrency issues. And even if we did, we'd be better off throwing more MySQL servers at it than adding caching, since any form of cache would have such a low hit rate that it wouldn't be worth the development time to add it.
And that's the reason I am so adamant about not caching for speed. There will always be an object that is not in cache, so if you hit a page with one of those objects, it still needs to be fast. As a rule of thumb, I try to cache anything that will be accessed again in the next few minutes (I keep a time-to-live of about 5 minutes in production on other applications anyway). So if items aren't getting more than a few hits in that time span, or the hit rate is very low (less than 90%), I don't bother caching that item...
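If you want to make that call with data rather than by gut, tracking hits and misses per item class is cheap; a sketch using the 90% threshold from above:

```python
# Track hit rate to decide whether an item class is worth caching.
# The 0.90 threshold mirrors the rule of thumb above.
class CacheStats:
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

stats = CacheStats()
# ... call stats.record(found) on every cache lookup ...
worth_caching = stats.hit_rate >= 0.90
```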
This is both hardware- and application-dependent. You need to benchmark to find the threshold at which the filesystem's directory-lookup overhead exceeds the time to store and retrieve the data directly (at both the MySQL level and the cached-file-access level). And you also need to compare that against the acceptable (very subjective) threshold for your audience.
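A bare-bones way to run that comparison (the two fetchers are placeholders for your real cached-file read and MySQL query):

```python
# Compare average cached-file read time against the MySQL query time.
# read_cache_file() and run_query() are hypothetical stand-ins.
import time

def avg_ms(fn, n=1000):
    """Time n calls to fn and return the average in milliseconds."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) * 1000 / n

# file_ms  = avg_ms(lambda: read_cache_file("some_key"))
# query_ms = avg_ms(lambda: run_query("SELECT ..."))
# The file cache only pays off while file_ms stays below query_ms.
```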
The general rule is: don't cache until it's necessary, and cache only the things that need to be cached.