I am wondering whether there is any performance difference between the following:
SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4)
SELECT ... FROM ..
I think one explanation for sunseeker's observation is that MySQL actually sorts the values in the IN list if they are all static values and uses binary search, which is more efficient than the plain OR alternative. I can't remember where I read that, but sunseeker's result seems to support it.
The accepted answer doesn't explain the reason.
The following is quoted from High Performance MySQL, 3rd Edition:
In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
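As a rough illustration of the complexity argument in that quote, here is a small Python sketch that counts comparisons for a left-to-right OR chain versus a binary search over a presorted list. It is not MySQL's implementation; the function names `or_chain_matches`, `prepare_in_list`, and `in_list_matches` are hypothetical and exist only for this example.

```python
def or_chain_matches(value, constants):
    """Evaluate `value = c1 OR value = c2 OR ...` left to right.

    Returns (matched, number_of_comparisons).
    """
    comparisons = 0
    for c in constants:
        comparisons += 1
        if value == c:
            return True, comparisons
    return False, comparisons


def prepare_in_list(constants):
    """Sort the constants once, before any row is tested (assumed one-time cost)."""
    return sorted(set(constants))


def in_list_matches(value, sorted_constants):
    """Binary search in the presorted list.

    Returns (matched, number_of_probes).
    """
    lo, hi = 0, len(sorted_constants)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_constants[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    matched = lo < len(sorted_constants) and sorted_constants[lo] == value
    return matched, probes


if __name__ == "__main__":
    constants = list(range(0, 2000, 2))      # 1000 even numbers
    sorted_constants = prepare_in_list(constants)
    # A value that is not in the list is the worst case for the OR chain.
    print(or_chain_matches(1999, constants))
    print(in_list_matches(1999, sorted_constants))
```

With 1000 constants, a non-matching value costs the OR chain 1000 comparisons, while the binary search needs at most 10 probes. The sort is paid once per statement, so the gap matters most when many rows are tested against a long list.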
Just when you thought it was safe...
What is your value of eq_range_index_dive_limit? In particular, do you have more or fewer items in the IN clause?
This will not include a Benchmark, but will peer into the inner workings a little. Let's use a tool to see what is going on -- Optimizer Trace.
The query: SELECT * FROM canada WHERE id ...
With an OR of 3 values, part of the trace looks like:
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(multiple equal(296172, `canada`.`id`) or multiple equal(295093, `canada`.`id`) or multiple equal(293626, `canada`.`id`))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"296172 <= id <= 296172"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note how ICP is being given ORs. This implies that OR is not turned into IN, and InnoDB will be performing a bunch of = tests through ICP. (I do not feel it is worth considering MyISAM.)
(This is Percona's 5.6.22-71.0-log; id is a secondary index.)
Now for IN() with a few values
eq_range_index_dive_limit = 10; there are 8 values.
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"293626 <= id <= 293626",
"295093 <= id <= 295093",
"295573 <= id <= 295573",
"295588 <= id <= 295588",
"295810 <= id <= 295810",
"296127 <= id <= 296127",
"296172 <= id <= 296172",
"297148 <= id <= 297148"
],
"index_dives_for_eq_ranges": true,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
"table_condition_attached": null,
"access_type": "range"
}
]
Note that the IN does not seem to be turned into OR.
A side note: Notice that the constant values were sorted. This can be beneficial in two ways:
Finally, IN() with a lot of values
{
"condition_processing": {
"condition": "WHERE",
"original_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))"
},
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "id",
"ranges": [
"291752 <= id <= 291752",
"291839 <= id <= 291839",
...
"297196 <= id <= 297196",
"297201 <= id <= 297201"
],
"index_dives_for_eq_ranges": false,
"rows": 111,
"chosen": true
...
"refine_plan": [
{
"table": "`canada`",
"pushed_index_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
"table_condition_attached": null,
"access_type": "range"
}
]
Side note: I needed this due to the bulkiness of the trace:
SET @@global.optimizer_trace_max_mem_size = 32222;
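For readers who want to capture a trace like these themselves, here is a minimal Python sketch. It assumes a DB-API 2.0 style cursor (for example from a MySQL driver) that is connected to MySQL 5.6 or later and returns rows as tuples. The helper name `capture_optimizer_trace` and its `max_mem_size` parameter are hypothetical; the SQL statements are the standard session-level optimizer trace commands.

```python
def capture_optimizer_trace(cursor, query, max_mem_size=None):
    """Run `query` with optimizer tracing on and return the raw trace text.

    `cursor` is assumed to be a DB-API 2.0 cursor on a MySQL connection.
    The trace is per session, so every statement must use the same connection.
    """
    if max_mem_size is not None:
        # Large IN lists produce bulky traces; raise the session limit first.
        cursor.execute(
            "SET optimizer_trace_max_mem_size = %d" % int(max_mem_size)
        )
    cursor.execute("SET optimizer_trace = 'enabled=on'")
    try:
        cursor.execute(query)
        cursor.fetchall()  # drain the result set before reading the trace
        cursor.execute("SELECT TRACE FROM information_schema.OPTIMIZER_TRACE")
        row = cursor.fetchone()
    finally:
        # Always switch tracing off again, even if the query failed.
        cursor.execute("SET optimizer_trace = 'enabled=off'")
    return row[0] if row else None
```

The function returns the trace as text rather than parsed JSON, because a trace that exceeds the memory limit is truncated and may no longer parse.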
I needed to know this for sure, so I benchmarked both methods. I consistently found IN to be much faster than using OR.
Do not believe people who give their "opinion"; science is all about testing and evidence.
I ran a loop of 1000x the equivalent queries (for consistency, I used sql_no_cache):
IN: 2.34969592094s
OR: 5.83781504631s
Update:
(I don't have the source code for the original test, as it was 6 years ago, though it returned a result in the same range as this test.)
In response to a request for sample code to test this, here is the simplest possible use case. It uses Eloquent for syntax simplicity; the raw SQL equivalent executes the same.
// Time 10,000 runs of the OR version.
$t = microtime(true);
for ($i = 0; $i < 10000; $i++):
    $q = DB::table('users')->where('id', 1)
        ->orWhere('id', 2)
        ->orWhere('id', 3)
        ->orWhere('id', 4)
        ->orWhere('id', 5)
        ->orWhere('id', 6)
        ->orWhere('id', 7)
        ->orWhere('id', 8)
        ->orWhere('id', 9)
        ->orWhere('id', 10)
        ->orWhere('id', 11)
        ->orWhere('id', 12)
        ->orWhere('id', 13)
        ->orWhere('id', 14)
        ->orWhere('id', 15)
        ->orWhere('id', 16)
        ->orWhere('id', 17)
        ->orWhere('id', 18)
        ->orWhere('id', 19)
        ->orWhere('id', 20)->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080514.3635
1482080517.3713
3.0078368186951
// Time 10,000 runs of the IN version.
$t = microtime(true);
for ($i = 0; $i < 10000; $i++):
    $q = DB::table('users')->whereIn('id', [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])->get();
endfor;
$t2 = microtime(true);
echo $t."\n".$t2."\n".($t2-$t)."\n";
1482080534.0185
1482080536.178
2.1595389842987
I'll bet they are the same. You can run a test by doing the following:
Loop over the "in (1,2,3,4)" version 500 times and see how long it takes. Then loop over the "=1 or =2 or =3..." version 500 times and see how long it runs.
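A minimal Python sketch of such a timing loop follows. It assumes a hypothetical `run_query` callable that sends one SQL string to your server (for example a thin wrapper around your driver's cursor); the helper names `build_in_query`, `build_or_query`, and `time_loop` are made up for this example. The builders only accept integer values, so nothing untrusted ends up in the SQL text.

```python
import time


def build_in_query(table, field, values):
    """Build `SELECT * FROM table WHERE field IN (v1, v2, ...)` for integer values."""
    items = ", ".join(str(int(v)) for v in values)
    return f"SELECT * FROM {table} WHERE {field} IN ({items})"


def build_or_query(table, field, values):
    """Build the equivalent `field = v1 OR field = v2 ...` query."""
    terms = " OR ".join(f"{field} = {int(v)}" for v in values)
    return f"SELECT * FROM {table} WHERE {terms}"


def time_loop(run_query, sql, repeats=500):
    """Run `sql` through `run_query` `repeats` times; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_query(sql)
    return time.perf_counter() - start
```

Run both variants several times and in alternating order, so that caching and warm-up effects do not favour whichever query happens to run second.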
You could also try a join. If someField is indexed and your table is big, it could be faster...
SELECT ...
FROM ...
INNER JOIN (SELECT 1 as newField UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) dt ON someFIELD =newField
I tried the join method above on my SQL Server and it is nearly the same as the in (1,2,3,4), and they both result in a clustered index seek. I'm not sure how MySQL will handle them.
I know that, as long as you have an index on Field, the BETWEEN will use it to quickly find one end, then traverse to the other. This is most efficient.
Every EXPLAIN I've seen shows "IN ( ... )" and " ... OR ..." to be interchangeable and equally (in)efficient. That is what you would expect, since the optimizer has no way of knowing whether or not they comprise an interval. It's also equivalent to a UNION ALL SELECT on the individual values.