MySQL OR vs IN performance

一向 2020-11-22 04:12

I am wondering if there is any performance difference between the following:

SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4)

SELECT ... FROM ..         


        
14 answers
  • 2020-11-22 04:34

    I think one explanation for sunseeker's observation is that MySQL actually sorts the values in the IN list, if they are all static values, and then uses binary search, which is more efficient than the plain OR alternative. I can't remember where I read that, but sunseeker's result seems to support it.

  • 2020-11-22 04:37

    The accepted answer doesn't explain the reason.

    The following is quoted from High Performance MySQL, 3rd Edition:

    In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
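
    To make the complexity claim concrete, here is a small Python sketch contrasting the two lookup strategies. It is not MySQL's actual implementation; the function names `matches_or_chain` and `matches_sorted_in` are hypothetical and exist only for illustration.

```python
from bisect import bisect_left

def matches_or_chain(value, candidates):
    """Mimic a chain of OR'ed equality tests: compare one by one, O(n)."""
    for candidate in candidates:
        if value == candidate:
            return True
    return False

def matches_sorted_in(value, sorted_candidates):
    """Mimic a lookup in a pre-sorted IN() list via binary search, O(log n)."""
    pos = bisect_left(sorted_candidates, value)
    return pos < len(sorted_candidates) and sorted_candidates[pos] == value

# The list is sorted once up front, then each value is checked against it.
in_list = sorted([296172, 295093, 293626, 295573])
print(matches_or_chain(295093, in_list), matches_sorted_in(295093, in_list))
print(matches_or_chain(1, in_list), matches_sorted_in(1, in_list))
```

    Both functions return the same answers; the difference is only in how many comparisons each lookup needs, which matters when the list is long and the test runs against many rows.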

  • 2020-11-22 04:38

    Just when you thought it was safe...

    What is your value of eq_range_index_dive_limit? In particular, do you have more or fewer items in the IN clause?

    This will not include a benchmark, but it will peer into the inner workings a little. Let's use a tool to see what is going on -- Optimizer Trace.

    The query: SELECT * FROM canada WHERE id ...

    With an OR of 3 values, part of the trace looks like:

           "condition_processing": {
              "condition": "WHERE",
              "original_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(multiple equal(296172, `canada`.`id`) or multiple equal(295093, `canada`.`id`) or multiple equal(293626, `canada`.`id`))"
                },
    

    ...

                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "id",
                        "ranges": [
                          "293626 <= id <= 293626",
                          "295093 <= id <= 295093",
                          "296172 <= id <= 296172"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "chosen": true
    

    ...

            "refine_plan": [
              {
                "table": "`canada`",
                "pushed_index_condition": "((`canada`.`id` = 296172) or (`canada`.`id` = 295093) or (`canada`.`id` = 293626))",
                "table_condition_attached": null,
                "access_type": "range"
              }
            ]
    

    Note how ICP (Index Condition Pushdown) is being given ORs. This implies that OR is not turned into IN, and InnoDB will be performing a bunch of = tests through ICP. (I do not feel it is worth considering MyISAM.)

    (This is Percona's 5.6.22-71.0-log; id is a secondary index.)

    Now for IN() with a few values

    eq_range_index_dive_limit = 10; there are 8 values.

            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))"
                },
    

    ...

                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "id",
                        "ranges": [
                          "293626 <= id <= 293626",
                          "295093 <= id <= 295093",
                          "295573 <= id <= 295573",
                          "295588 <= id <= 295588",
                          "295810 <= id <= 295810",
                          "296127 <= id <= 296127",
                          "296172 <= id <= 296172",
                          "297148 <= id <= 297148"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "chosen": true
    

    ...

            "refine_plan": [
              {
                "table": "`canada`",
                "pushed_index_condition": "(`canada`.`id` in (296172,295093,293626,295573,297148,296127,295588,295810))",
                "table_condition_attached": null,
                "access_type": "range"
              }
            ]
    

    Note that the IN does not seem to be turned into OR.

    A side note: Notice that the constant values were sorted. This can be beneficial in two ways:

    • By jumping around less, there may be better caching, less I/O to get to all the values.
    • If two similar queries are coming from separate connections, and they are in transactions, there is a better chance of getting a delay instead of a deadlock due to overlapping lists.

    Finally, IN() with a lot of values

          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))"
                },
    

    ...

                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "id",
                        "ranges": [
                          "291752 <= id <= 291752",
                          "291839 <= id <= 291839",
                          ...
                          "297196 <= id <= 297196",
                          "297201 <= id <= 297201"
                        ],
                        "index_dives_for_eq_ranges": false,
                        "rows": 111,
                        "chosen": true
    

    ...

            "refine_plan": [
              {
                "table": "`canada`",
                "pushed_index_condition": "(`canada`.`id` in (293831,292259,292881,293440,292558,295792,292293,292593,294337,295430,295034,297060,293811,295587,294651,295559,293213,295742,292605,296018,294529,296711,293919,294732,294689,295540,293000,296916,294433,297112,293815,292522,296816,293320,293232,295369,291894,293700,291839,293049,292738,294895,294473,294023,294173,293019,291976,294923,294797,296958,294075,293450,296952,297185,295351,295736,296312,294330,292717,294638,294713,297176,295896,295137,296573,292236,294966,296642,296073,295903,293057,294628,292639,293803,294470,295353,297196,291752,296118,296964,296185,295338,295956,296064,295039,297201,297136,295206,295986,292172,294803,294480,294706,296975,296604,294493,293181,292526,293354,292374,292344,293744,294165,295082,296203,291918,295211,294289,294877,293120,295387))",
                "table_condition_attached": null,
                "access_type": "range"
              }
            ]
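
    The change of `index_dives_for_eq_ranges` from true (8 values) to false (the long list) is consistent with the documented rule for `eq_range_index_dive_limit`: index dives are used while the number of equality ranges stays below the limit, and a limit of 0 means "always dive". A sketch of that rule in Python follows. The function name is hypothetical, the real decision is made inside the optimizer, and 111 is borrowed from the trace's "rows" estimate as a stand-in for the length of the long list.

```python
def uses_index_dives(equality_ranges: int, eq_range_index_dive_limit: int) -> bool:
    """Return True if row estimates would come from index dives.

    Assumption: this encodes only the documented rule. A limit of 0 means
    "always dive"; otherwise dives are used for fewer ranges than the limit,
    and index statistics are used at or above it.
    """
    if eq_range_index_dive_limit == 0:
        return True
    return equality_ranges < eq_range_index_dive_limit

print(uses_index_dives(8, 10))    # the 8-value IN() list
print(uses_index_dives(111, 10))  # the long IN() list (length assumed)
```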
    

    Side note: I needed this due to the bulkiness of the trace:

    SET @@global.optimizer_trace_max_mem_size = 32222;
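
    For anyone wanting to reproduce traces like these, the statement sequence is roughly as follows, wrapped in a Python helper so it can be fed to any MySQL client library. The helper name and the idea of returning a list of strings are illustrative assumptions; the `optimizer_trace` variable and the `information_schema.OPTIMIZER_TRACE` table are standard in MySQL 5.6 and later.

```python
from typing import List

def optimizer_trace_statements(query: str, max_mem_size: int = 32222) -> List[str]:
    """Build the statements needed to trace one query in the current session."""
    return [
        # Raise the per-trace memory cap so long IN() lists are not truncated.
        f"SET optimizer_trace_max_mem_size = {int(max_mem_size)}",
        "SET optimizer_trace = 'enabled=on'",
        query,
        # The trace of the most recent traced statement is exposed here as JSON.
        "SELECT TRACE FROM information_schema.OPTIMIZER_TRACE",
        "SET optimizer_trace = 'enabled=off'",
    ]

for statement in optimizer_trace_statements(
    "SELECT * FROM canada WHERE id IN (296172, 295093, 293626)"
):
    print(statement + ";")
```

    All five statements must run on the same connection, since the trace is kept per session.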
    
  • 2020-11-22 04:39

    I needed to know this for sure, so I benchmarked both methods. I consistently found IN to be much faster than using OR.

    Do not believe people who give their "opinion"; science is all about testing and evidence.

    I ran a loop of 1000x the equivalent queries (for consistency, I used sql_no_cache):

    IN: 2.34969592094s

    OR: 5.83781504631s

    Update:
    (I don't have the source code for the original test, as it was 6 years ago, though it returned a result in the same range as this test.)

    In response to a request for sample code to test this, here is the simplest possible use case. I am using Eloquent for syntax simplicity; the raw SQL equivalent executes the same.

    // Time 10,000 executions of the OR variant (20 ids).
    $t = microtime(true);
    for ($i = 0; $i < 10000; $i++):
        $q = DB::table('users')->where('id', 1)
            ->orWhere('id', 2)
            ->orWhere('id', 3)
            ->orWhere('id', 4)
            ->orWhere('id', 5)
            ->orWhere('id', 6)
            ->orWhere('id', 7)
            ->orWhere('id', 8)
            ->orWhere('id', 9)
            ->orWhere('id', 10)
            ->orWhere('id', 11)
            ->orWhere('id', 12)
            ->orWhere('id', 13)
            ->orWhere('id', 14)
            ->orWhere('id', 15)
            ->orWhere('id', 16)
            ->orWhere('id', 17)
            ->orWhere('id', 18)
            ->orWhere('id', 19)
            ->orWhere('id', 20)->get();
    endfor;
    $t2 = microtime(true);
    // Prints start time, end time, and elapsed seconds.
    echo $t."\n".$t2."\n".($t2-$t)."\n";
    

    1482080514.3635
    1482080517.3713
    3.0078368186951

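    // Time 10,000 executions of the IN variant (same 20 ids).
    $t = microtime(true);
    for ($i = 0; $i < 10000; $i++):
        $q = DB::table('users')->whereIn('id', [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])->get();
    endfor;
    $t2 = microtime(true);
    // Prints start time, end time, and elapsed seconds.
    echo $t."\n".$t2."\n".($t2-$t)."\n";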
    

    1482080534.0185
    1482080536.178
    2.1595389842987
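
    The same comparison can be sketched with the ORM taken out of the picture, here in Python. Everything in it is an assumption for illustration: the helper names, the `users` and `id` names reused from the snippets above, and the `run` callable, which stands in for whatever executes a statement on a real connection (for example a cursor's `execute` followed by `fetchall`).

```python
import time
from typing import Callable, List, Sequence, Tuple

def build_or_query(table: str, column: str, values: Sequence[int]) -> Tuple[str, List[int]]:
    """Return a parameterised `col = %s OR col = %s ...` query and its arguments."""
    predicate = " OR ".join(f"{column} = %s" for _ in values)
    return f"SELECT * FROM {table} WHERE {predicate}", list(values)

def build_in_query(table: str, column: str, values: Sequence[int]) -> Tuple[str, List[int]]:
    """Return a parameterised `col IN (%s, %s, ...)` query and its arguments."""
    placeholders = ", ".join("%s" for _ in values)
    return f"SELECT * FROM {table} WHERE {column} IN ({placeholders})", list(values)

def time_query(run: Callable[[str, List[int]], object], sql: str,
               args: List[int], repeats: int = 1000) -> float:
    """Total wall-clock seconds for `repeats` executions of one statement."""
    start = time.perf_counter()
    for _ in range(repeats):
        run(sql, args)
    return time.perf_counter() - start

ids = list(range(1, 21))
or_sql, or_args = build_or_query("users", "id", ids)
in_sql, in_args = build_in_query("users", "id", ids)

# Placeholder executor so the sketch runs without a database; replace it
# with a function that executes the statement on a real MySQL connection.
def fake_run(sql: str, args: List[int]) -> None:
    return None

print("OR:", time_query(fake_run, or_sql, or_args))
print("IN:", time_query(fake_run, in_sql, in_args))
```

    With the placeholder executor the timings are meaningless; they only become comparable once `fake_run` is swapped for a real one. Repeating and interleaving the two variants helps separate a real difference from cache warm-up and network noise.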

  • 2020-11-22 04:41

    I'll bet they are the same. You can test it as follows:

    Loop over the "in (1,2,3,4)" version 500 times and see how long it takes. Then loop over the "=1 or =2 or =3..." version 500 times and see how long that takes.

    You could also try a join. If someField is indexed and your table is big, it could be faster:

    SELECT ... 
        FROM ... 
            INNER JOIN (SELECT 1 as newField UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) dt ON someFIELD =newField
    

    I tried the join method above on my SQL Server and it performs nearly the same as IN (1,2,3,4); both result in a clustered index seek. I'm not sure how MySQL will handle them.

  • 2020-11-22 04:41

    I know that, as long as you have an index on Field, the BETWEEN will use it to quickly find one end, then traverse to the other. This is most efficient.

    Every EXPLAIN I've seen shows "IN ( ... )" and " ... OR ..." to be interchangeable and equally (in)efficient, which you would expect, since the optimizer has no way to know whether or not the values comprise an interval. It's also equivalent to a UNION ALL SELECT on the individual values.
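
    If the application itself knows that its values form an unbroken integer run, it can emit BETWEEN directly instead of relying on the optimizer to notice. A hypothetical Python helper shows the idea; the name `range_or_in_predicate` is made up for this sketch, and it assumes integer values only.

```python
from typing import Sequence

def range_or_in_predicate(column: str, values: Sequence[int]) -> str:
    """Return a BETWEEN predicate for contiguous integers, else an IN() list."""
    if not values:
        raise ValueError("values must not be empty")
    # int() guards against non-integer input being spliced into the SQL text.
    unique = sorted({int(v) for v in values})
    low, high = unique[0], unique[-1]
    # A run with no gaps has exactly (high - low + 1) distinct members.
    if len(unique) > 1 and high - low + 1 == len(unique):
        return f"{column} BETWEEN {low} AND {high}"
    listed = ", ".join(str(v) for v in unique)
    return f"{column} IN ({listed})"

print(range_or_in_predicate("someFIELD", [1, 2, 3, 4]))
print(range_or_in_predicate("someFIELD", [1, 2, 4, 8]))
```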
