Suppose I have a \"tags\" table with two columns: tagid and contentid. Each row represents a tag assigned to a piece of content. I want a
SELECT contentID
FROM tags
WHERE tagID in (334, 338, 342)
GROUP BY contentID
HAVING COUNT(DISTINCT tagID) = 3
--In general
SELECT contentID
FROM tags
WHERE tagID in (...) --taglist
GROUP BY contentID
HAVING COUNT(DISTINCT tagID) = ... --tagcount
I don't know if this is better but it might be more maintainable
select contentid from tags where tagid = 334
intersect
select contentid from tags where tagid = 338
intersect
select contentid from tags where tagid = 342
You'd have to build it dynamically which wouldn't be as bad as your original solution.
What type of SQL? MS SQL Server, Oracle, MySQL?
In SQL Server doesn't this equate to:
select contentid from tags where tagid IN (334,338,342)
The only alternative way i can think of is:
select a.contentid from tags a
inner join tags b on a.contentid = b.contentid and b.tagid=334
inner join tags c on a.contentid = c.contentid and c.tagid=342
where a.tagid=338
Here's a solution that has worked much faster than the for me on a very large database of objects and tags. This is an example for a three-tag intersection. It just chains many joins on the object-tag table (objtags
) to indicate the same object and stipulates the tag IDs in the WHERE
clause:
SELECT w0.objid
FROM objtags t0
INNER JOIN objtags t1 ON t1.objid=t0.objid
INNER JOIN objtags t2 ON t2.objid=t1.objid
WHERE t0.tagid=512
AND t1.tagid=256
AND t2.tagid=128
I have no idea why this runs faster. It was inspired by the search code in the MusicBrainz server. Doing this in Postgres, I usually get a ~8-10x speedup over the HAVING COUNT(...)
solution.