Given a many-to-many relation between items and categories, I have three tables:
A JOIN is more efficient, generally speaking.
However, one thing to be aware of is that joins can produce duplicate rows in your output. For example, if item id 123 was in categories 1 and 3, the first JOIN would result in two rows for id 123. If item id 999 was in categories 1, 3, 7, 8, 12, and 66, you would get eight rows for 999 in your results (2*2*2).
Duplicate rows are something you need to be aware of and handle. In this case, you could just use select distinct id... Eliminating duplicates can get more complicated with a complex query, though.
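To make the duplicate-row effect concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (items, items_categories, item_id, category_id) are assumptions for illustration, since the original table definitions are not shown, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical schema: the real table definitions are not shown in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    CREATE TABLE items_categories (item_id INTEGER, category_id INTEGER);
    INSERT INTO items (id) VALUES (123), (999);
    INSERT INTO items_categories (item_id, category_id) VALUES
        (123, 1), (123, 3),   -- item 123 matches two of the wanted categories
        (999, 1);             -- item 999 matches only one
""")


def item_ids(distinct):
    """Return item ids in categories 1 or 3, optionally de-duplicated."""
    keyword = "DISTINCT" if distinct else ""
    sql = f"""
        SELECT {keyword} i.id
        FROM items AS i
        JOIN items_categories AS ic ON ic.item_id = i.id
        WHERE ic.category_id IN (1, 3)
        ORDER BY i.id
    """
    return [row[0] for row in conn.execute(sql)]


plain_rows = item_ids(distinct=False)    # one row per matching category
distinct_rows = item_ids(distinct=True)  # one row per item

print(plain_rows)     # [123, 123, 999]
print(distinct_rows)  # [123, 999]
```

Item 123 appears twice in the plain join because it matches two category rows; DISTINCT collapses it back to one row per item.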
You are using a JOIN in Option A and a sub-query in Option B. The difference is:
In most cases JOINs are faster than sub-queries, and it is very rare for a sub-query to be faster.
With JOINs, the RDBMS can build an execution plan suited to your query and predict which data needs to be loaded for processing, which saves time. A sub-query, by contrast, may be run as a separate query whose whole result is loaded before any processing happens.
The good thing about sub-queries is that they are more readable than JOINs, which is why most people new to SQL prefer them; it is the easy way. When it comes to performance, though, JOINs are better in most cases, and they are not hard to read either.
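Rather than taking the rule of thumb on faith, you can ask the database what plan it picks for each form. The sketch below does that with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the schema, index name, and data are made up for illustration, and the plan text differs between engines and versions (MySQL uses EXPLAIN instead), so treat it as a way to look, not as evidence for either side.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items_categories (item_id INTEGER, category_id INTEGER);
    CREATE INDEX idx_ic_category ON items_categories (category_id, item_id);
    INSERT INTO items VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO items_categories VALUES (1, 1), (2, 3), (3, 7);
""")

join_sql = """
    SELECT DISTINCT i.id FROM items AS i
    JOIN items_categories AS ic ON ic.item_id = i.id
    WHERE ic.category_id IN (1, 3)
    ORDER BY i.id
"""
subquery_sql = """
    SELECT i.id FROM items AS i
    WHERE i.id IN (SELECT item_id FROM items_categories
                   WHERE category_id IN (1, 3))
    ORDER BY i.id
"""


def plan(sql):
    # The last column of each plan row holds the human-readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Both forms should return the same ids; only the chosen plan may differ.
join_ids = [row[0] for row in conn.execute(join_sql)]
subquery_ids = [row[0] for row in conn.execute(subquery_sql)]

for label, sql in (("JOIN", join_sql), ("sub-query", subquery_sql)):
    print(label)
    for step in plan(sql):
        print("   ", step)
```

If the two plans turn out to be the same or equally cheap on your engine, readability is a reasonable tie-breaker.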
OPTION A
JOIN has an advantage over EXISTS, because it will use the indices more efficiently, especially in the case of large tables.
-- Query 1: EXISTS
select distinct `user_posts_id` from `user_posts_boxes`
where `user_id` = 5
and exists (select * from `box` where `user_posts_boxes`.`box_id` = `box`.`id`
and `status` in ("A","F"))
order by `user_posts_id` desc limit 200;

-- Query 2: INNER JOIN
select distinct `user_posts_id` from `user_posts_boxes`
inner join `box` on `box`.`id` = `user_posts_boxes`.`box_id` and `box`.`status` in ("A","F")
and `box`.`user_id` = 5
order by `user_posts_id` desc limit 200;
I tried both queries, and the first one (the one using exists) works faster for me. Both tables hold large datasets: "user_posts_boxes" has almost 4 million rows and "box" has 1.5 million.
The first query took 0.147 ms; the second took roughly 0.5 to 0.9 ms.
Note that my tables are InnoDB and have physical relationships (foreign keys) defined.
So I should go for exists, but it also depends on how your database is structured.
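Since the outcome depends on your own schema and data, it is worth timing both forms yourself. Below is a rough sketch of such a comparison using Python's sqlite3 and timeit modules. Everything in it is an assumption for illustration: the column layout, the index, the synthetic data (a few thousand rows rather than millions), and the choice to filter user_id on user_posts_boxes in both queries so that they are equivalent. SQLite is only a stand-in here, so the numbers say nothing about MySQL/InnoDB.

```python
import sqlite3
import timeit

# Hypothetical, scaled-down version of the two tables from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE box (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE user_posts_boxes (
        user_posts_id INTEGER, user_id INTEGER, box_id INTEGER
    );
    CREATE INDEX idx_upb_user ON user_posts_boxes (user_id, box_id);
""")
statuses = ["A", "F", "X"]
conn.executemany(
    "INSERT INTO box VALUES (?, ?)",
    [(i, statuses[i % 3]) for i in range(1, 1501)],
)
conn.executemany(
    "INSERT INTO user_posts_boxes VALUES (?, ?, ?)",
    [(i, i % 10, i % 1500 + 1) for i in range(1, 4001)],
)

exists_sql = """
    SELECT DISTINCT user_posts_id FROM user_posts_boxes
    WHERE user_id = 5
      AND EXISTS (SELECT 1 FROM box
                  WHERE box.id = user_posts_boxes.box_id
                    AND box.status IN ('A', 'F'))
    ORDER BY user_posts_id DESC LIMIT 200
"""
join_sql = """
    SELECT DISTINCT user_posts_id FROM user_posts_boxes
    INNER JOIN box ON box.id = user_posts_boxes.box_id
                  AND box.status IN ('A', 'F')
    WHERE user_posts_boxes.user_id = 5
    ORDER BY user_posts_id DESC LIMIT 200
"""


def run(sql):
    return conn.execute(sql).fetchall()


# Check that both forms agree before comparing their speed.
exists_rows = run(exists_sql)
join_rows = run(join_sql)

for label, sql in (("EXISTS", exists_sql), ("JOIN", join_sql)):
    seconds = timeit.timeit(lambda: run(sql), number=50)
    print(f"{label}: {seconds / 50 * 1000:.3f} ms per run")
```

Run each query many times and compare averages rather than single runs, because caching can make the first execution look much slower than the rest.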