I\'m coding a website in PHP/MySQL and I\'d like to implement a similar to stackoverflow tagging engine. I have 3 relevant tables in DB: 1. Items 2. Tags 3. ItemTagMap (maps tag
You'll want to try to minimize the number of DB calls, putting the heavy work into PHP.
First, select all your Items from the DB:
select * from items where (conditions);
Then, create an array of all id's from the result set.
$ids = array();
foreach ($items as $item) {
$ids[] = $item['id'];
}
$ids = implode(',' $ids);
Then select all ItemTagMaps and associated tag data for the Item ID's you previously retrieved.
select map.item_id, t.id, t.name from tags t, item_tag_maps map where t.id = map.tag_id and map.item_id in ($ids);
Now when you loop through your $items array, you can locate all matching tags from the 2nd SQL query you performed as long as it has a matching item_id value.
Before we go into premature optimization mode, it may be useful to look into the following query template. If nothing else this could be used as a baseline against which the effectiveness of possible optimizations can be measured.
SELECT T.Tagid, TagInfo.TagName, COUNT(*)
FROM Items I
JOIN Tags TagInfo ON TagInfo.TagId = T.TagId
JOIN ItemTagMap T ON I.ItemId = T.ItemId
--JOIN ItemTagMap T1 ON I.ItemId = T1.ItemId
WHERE I.ItemId IN
(
SELECT ItemId
FROM Items
WHERE -- Some typical initial search criteria
Title LIKE 'Bug Report%' -- Or some fulltext filter instead...
AND ItemDate > '02/22/2008'
AND Status = 'C'
)
--AND T1.TagId = 'MySql'
GROUP BY T.TagId, TagInfo.TagName
ORDER BY COUNT(*) DESC
The subquery is the "driving query", i.e. the one corresponding to the end-user's initial criteria. (see below for details on how this query, required multiple times may fit in an overall optimized flow) Commented is the JOIN on T1 (and possibly T2, T3, when several tags are selected), and, with the WHERE clause, the associated criteria. These are needed when the user selects a particular tag, whether as part of the initial search or by refinement. (It may be more efficient to place these joins and where clauses within the sub-query; more on these below)
Discussion... The "driving query", or a variation thereof is needed for two distinct purposes:
Note that the complete list doesn't need to be sorted (or it may benefit from sorting in a different order), whereby the second list needs to be sorted based on the user's choice (say by Date, descending or by Title, alphabetically ascending). Also note that if there is any sort order required, the cost of the query will imply dealing with the complete list (shy of odd optimization by SQL itself, and/or some denormalization, SQL needs to "see" the last records on that list, in case they belong to the top, sort-wise).
This latter fact, is in favor of having the very same query for both purposes, the corresponding list can be stored in a temporary table. The general flow would be to quickly lookup the top N Item records with their details and returns this to the application at once. The application can then obtain ajax-fashion the list of Tags for refinements. This list would be produce with a query akin the one above, where the subquery is replaced by a "select * from temporaryTable." The odds are good that the SQL optimizer will decide to sort this list (in some cases), let's let it do that, rather than second guessing it and sorting it explicitly.
One other point to consider is to maybe bring the join(s) on ItemTagMap table inside the "driving query" rather that as shown above. It is probably best to do so, both for performance, and because it will produce the right list for the #2 purpose (display of a page of items).
The query/flow described above will likely scale rather well, even on relatively modest hardware; tentatively into the 1/2 Million+ Items, with sustained user searches maybe up to 10 per second. One of the key factor would be the selectivity of the initial search criteria.
Optimization ideas
-- 'nough said! --
Appropriate architecture and optimizations should be selected in light of the actual requirements and of the effective data statistical profile...
Assuming:
then:
SELECT t.name
FROM Tag t
WHERE EXISTS (SELECT 1 FROM ItemTag WHERE item_id = 1234)
ORDER BY t.name
Nothing intensive about that. This is similar but my guess is it would be slower:
SELECT t.name
FROM Tag t
WHERE t.id IN (SELECT tag_id FROM ItemTag WHERE item_id = 1234)
ORDER BY t.name
This can be done as a join as well:
SELECT DISTINCT t.name
FROM Tag t
JOIN ItemTag i WHERE i.tag_id = t.id
WHERE i.item_id = 1234
ORDER BY t.name
I think the first one will be faster but as is always the case with SQL, it's worth testing (on a sufficiently sized data set).
The above have been done to list the tags for a single item. You want a composite set of tags for search results. That's not difficult from the above but it depends on how you get your search results.