I've recently implemented memcache on my site, which has been under heavy MySQL load (MySQL was as optimized as I could make it). It solved all my load issues, and the site is running smoothly.
A couple of simple things you can do:
First, if you really want to use the query string as a cache key, make it more deterministic and predictable. I'd do this by sorting the query string, e.g., ?zed=7&alpha=1 is transformed to ?alpha=1&zed=7. Also strip out any variables that aren't relevant to the cache key.
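A minimal PHP sketch of that normalization (the $relevant whitelist here is a placeholder for whatever parameters actually affect your results):

// parameters that actually affect the query results
$relevant = array('keyword', 'page', 'category');

parse_str($_SERVER['QUERY_STRING'], $params);
// strip variables that don't matter for caching
$params = array_intersect_key($params, array_flip($relevant));
// sort so ?zed=7&alpha=1 and ?alpha=1&zed=7 produce the same key
ksort($params);
$cache_key = md5(http_build_query($params));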
To handle the problem of the ?page parameter, and items not showing up because the cache hasn't refreshed, I've got a couple of ideas:
Folke's idea of adding a 'version' to the cache key would work well. The same trick is used to easily make links look unvisited.
Another approach would be to store the number of pages in the cache value, and then, when the database is updated, iterate through the cache keys.
cache.put("keyword,page=3", array(num_pages=7, value=...))
...later...
update_entry()
num_pages, value = cache.get("keyword,page=3")
for i in num_pages:
cache.flush("keyword,page="+i)
Whether this is a good idea or not depends on how many pages there are, and the chance of updates coming in while the loop is running.
A third idea is to cache the entire result set instead of just that page of results. This may or may not be an option depending on the size of the result set. When that result set is updated, you just flush the cache for that keyword.
cache.put("keyword", array(0="bla", 1=foo", ...)
...later...
cache.get("keyword")[page_num]
A fourth idea is to change your caching backend and use something built to handle this situation. I dunno what other cache servers are out there, so you'll have to look around.
Finally, to supplement all this, you can try to be smarter about the expire time on cache entries: e.g., use the mean time between updates, or the number of queries per second for the keyword, etc.
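A rough sketch of scaling the TTL with update frequency (get_update_rate() is a hypothetical stat you'd have to track yourself):

// rarely-updated keywords cache for up to an hour,
// frequently-updated ones for as little as a minute
$updates_per_hour = get_update_rate($keyword); // hypothetical
$ttl = max(60, min(3600, (int) (3600 / (1 + $updates_per_hour))));
$memcache->set($cache_key, $results, 0, $ttl);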
Cache invalidation is a big problem:
"There are only two hard problems in Computer Science: cache invalidation and naming things."
I'll give you a few ideas that should lead you to a full solution, as there is no general solution for all use cases.
You may benefit from a simpler naming scheme for your memcached keys - so they are easier to delete. Seems like with the MD5 solution, you might be creating too many keys for things which generally show the same data.
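For instance, a structured naming scheme (just an illustration; adapt the parts to your own request parameters):

// one readable key per keyword+page, instead of an md5 of the raw query string
$key = "search:" . $keyword . ":page:" . $page;
$memcache->set($key, $results, 0, 3600);
// later, deleting the cache for a specific page is straightforward
$memcache->delete("search:" . $keyword . ":page:" . $page);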
You might also consider a shorter cache time, like 20 minutes?
Also - how many items per page are you retrieving for each of these search result pages? If you have a paginated search - getting 50 items from the server shouldn't be too intensive.
You may have tuned the mysql server, but have you tuned the queries (improving them by examining the EXPLAIN output), or table structures (by adding useful indexes)?
I'm also wondering how intense the queries on those pages are. Do you join several tables? You may benefit from doing a simpler query - or a few queries (outlined below).
Alternatively - For each row in the result, do you run another query - or several? You may benefit from a slightly more complex search query that avoids you having to do the nested queries. Or, are you being bitten by an ORM library which does the same thing, runs a search, then queries for sub items on each iteration?
The 'a few simpler queries' solution - say, for example, you've got an item and want to know its category in the result set...
Instead of this:
SELECT i.id, i.name, c.category
FROM items AS i
INNER JOIN categories AS c ON i.category_id = c.id;
This is a simple example - but say there were categories, types, and several other JOINs involved.
You might go this route:
// run this query
$result = mysql_query("SELECT id, category FROM categories");

// then in PHP create an array keyed by the id
$categories = array();
while ( false !== ( $row = mysql_fetch_assoc( $result ) ) )
{
    $categories[ $row['id'] ] = $row['category'];
}

// and so on for the other lookup tables
$types = array(); // ...
// etc.
Then do your search, but without all of the JOINs - just from the items table with your WHERE clauses - and in the output say...
<?php foreach($items as $item): ?>
<h4><?php echo $item['name']; ?></h4>
<p>Category: <?php echo $categories[ $item['category_id'] ]; ?></p>
<p>Type: <?php echo $types[ $item['type_id'] ]; ?></p>
<!-- and so on -->
<?php endforeach; ?>
It's a little ghetto, but maybe this - and the other suggestions - will help.
Memcached::set has an expire parameter. Perhaps you can let this default to an hour, but for pages that return search results - or your forum pages - you can set it to a shorter period of time.
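For example (the keys here are placeholders; 1200 seconds is 20 minutes):

// ordinary pages: cache for an hour
$memcached->set("page:about", $html, 3600);
// search results: expire sooner
$memcached->set("search:" . $keyword . ":page:" . $page, $html, 1200);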
What you could do to make sure that your cache is always up to date, without making lots of changes to your code, is work with a "version cache". This does increase the number of memcache requests you make, but it might be a solution for you.
Another good thing about this solution is that you can set the expiration time to never expire.
The idea is basically to have a version number stored in memcache for, in your case, each keyword (per keyword, not per combination). How to use this?
When someone submits a new item:
if (!$memcache->increment("version_" . $keyword)) {
    $memcache->set("version_" . $keyword, 1);
}
When someone executes a query:
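Something like this minimal sketch - $memcache, run_search(), and the key format are placeholders for your own code:

$version = (int) $memcache->get("version_" . $keyword);
// the version is part of the key, so bumping it invalidates every page for the keyword
$key = md5($keyword . "|page=" . $page . "|v=" . $version);
$results = $memcache->get($key);
if ($results === false) {
    $results = run_search($keyword, $page); // your existing database query
    $memcache->set($key, $results, 0, 0);   // expire = 0: never expires
}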
This ensures that as soon as a keyword has new results (or fewer, after deleting), the version is bumped, and with it the cache keys of all related memcache entries.
The cache is always up to date, and entries can potentially stay in the cache longer than one hour.
Since you are caching entire pages in memcached, your pages can't share cached data from the database with each other. Say I have page1.php and page2.php, with page1 and page2 as keys in memcached. Both pages display items. I add a new item. Now I have to expire page1 and page2.
Instead, I could have an items key in memcached that page1.php and page2.php both use to display items. When I add a new item, I expire the items key (or better, update its value), and both page1.php and page2.php are up-to-date.
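A minimal sketch of that shared key (fetch_items_from_db() and insert_item_into_db() stand in for your own code):

// shared by page1.php and page2.php
$items = $memcache->get("items");
if ($items === false) {
    $items = fetch_items_from_db(); // your existing query
    $memcache->set("items", $items);
}

// when an item is added, refresh the shared key once
insert_item_into_db($item);
$memcache->set("items", fetch_items_from_db());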
If you still want to cache the entire page, you could add information to your keys that will change when data being cached changes (this wouldn't make sense if the data changes too often). For instance:
"page1:[timestamp of newest item]"
This way you can look up the timestamp of the newest item (an inexpensive query) and build your cache key with it. Once a newer item is added, the cache key changes, automatically expiring the old entry. The downside is that you still have to hit the database for the newest item's timestamp every time.
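A sketch of that lookup (assuming an indexed created_at column on items; render_page() is a placeholder):

// cheap if created_at is indexed
$result = mysql_query("SELECT MAX(created_at) FROM items");
$newest = mysql_result($result, 0);
$key = "page1:" . $newest;
$html = $memcache->get($key);
if ($html === false) {
    $html = render_page(); // your existing page-building code
    $memcache->set($key, $html);
}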