问题
I have a post/tag database, with the usual post, tag, and tag_post tables. The tag_post table contains tagid and postid fields.
I need to query posts. When I want to fetch posts that have a certain tag, I have to use a join:
... INNER JOIN tag_post ON post.id = tag_post.postid
WHERE tag_post.tagid = {required_tagid}`
When I want to fetch posts that have tagIdA and tagIdB, I have to use two joins (which I kind of came to terms with eventually).
Now, I need to query posts that do not have a certain tag. Without much thought, I just changed the =
to !=
:
... INNER JOIN tag_post ON post.id = tag_post.postid
WHERE tag_post.tagid != {certain_tagid}`
Boom! Wrong logic!
I did come up with this - just writing the logic here:
... INNER JOIN tag_post ON post.id = tag_post.postid
WHERE tag_post.postid NOT IN
(SELECT postid from tag_post where tagid = {certain_tagid})
I know this will work, but due to the way I've been brought up, I feel guilty (justified or not) whenever I write a query with a subquery.
Suggest a better way to do this?
回答1:
You can think of it as "find all rows in posts that do not have a match in tags (for a specific tag)"
This is the textbook use case for a LEFT JOIN.
LEFT JOIN tag_post ON post.id = tag_post.postid AND tag_post.tagid = {required_tagid}
WHERE tag_post.tag_id IS NULL
Note that you have to have the tag id in the ON clause of the join.
For a reference on join types, see here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
回答2:
In addition to Gavin Towey's good answer, you can use a not exists
subquery:
where not exists
(
select *
from tag_post
where post.id = tag_post.postid
and tag_post.tagid = {required_tagid}
)
The database typically executes both variants in the same way. I personally find the not exists
approach easier to read.
回答3:
When I want to fetch posts that have tagIdA and tagIdB, I have to use two joins (which I kind of came to terms with eventually).
There are other ways.
One can obtain all the
id
of all posts that are tagged with bothtagid
123 and 456 by grouping filteringtag_post
for only those tags, grouping by post and then dropping any groups that contain fewer tags than expected; then one can use the result to filter theposts
table:SELECT * FROM posts WHERE id IN ( SELECT postid FROM tag_post WHERE tagid IN (123,456) GROUP BY postid HAVING COUNT(*) = 2 )
If a post can be tagged with the same
tagid
multiple times, you will need to replaceCOUNT(*)
with the less performantCOUNT(DISTINCT tagid)
.Now, I need to query posts that do not have a certain tag.
This is known as an anti-join. The easiest way is to replace
IN
from the query above withNOT IN
, as you proposed. I wouldn't feel too guilty about it. The alternative is to use an outer join, as proposed in @GavinTowey's answer.
来源:https://stackoverflow.com/questions/12439820/select-a-post-that-does-not-have-a-particular-tag