Why are certain types of prepared queries using PDO in PHP with MySQL slow?

庸人自扰 2020-12-04 20:10

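When using SELECT * FROM table WHERE Id IN ( .. ) queries with more than 10000 keys using PDO with prepare()/execute(), the performance degrades ~10X more than doing the same query using mysqli with prepared statements or PDO without using prepared statements.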

3 Answers
  • 2020-12-04 20:45

    Make sure you're telling PDO that the value is an integer, not a string. Values passed as an array to execute() are all treated as PDO::PARAM_STR, and if PDO sends the value as a string, MySQL has to typecast it for the comparison. Depending on how it goes about this, that could cause a major slowdown by keeping MySQL from using an index.

    I'm not completely sure about the behaviour here, but I have had this problem with Postgres a few years back...
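
    A minimal sketch of binding every key as an integer, assuming positional "?" placeholders. The helper names toIntegerParams and bindIntegerIds are hypothetical, not part of PDO; only bindValue() and PDO::PARAM_INT come from the library.

    ```php
    <?php
    // Hypothetical helper: map a list of ids to [placeholder position => int].
    // Positional "?" placeholders are numbered from 1.
    function toIntegerParams(array $ids): array
    {
        $params = [];
        $position = 1;
        foreach ($ids as $id) {
            $params[$position++] = (int) $id;
        }
        return $params;
    }

    // Hypothetical helper: bind each id with an explicit integer type.
    function bindIntegerIds(PDOStatement $stmt, array $ids): void
    {
        foreach (toIntegerParams($ids) as $position => $value) {
            $stmt->bindValue($position, $value, PDO::PARAM_INT);
        }
    }

    // Usage sketch, assuming $dbh is an open PDO connection and $query
    // contains one "?" per id:
    // $stmt = $dbh->prepare($query);
    // bindIntegerIds($stmt, $imageIds);
    // $stmt->execute();
    ```

    Whether this changes the timing depends on the driver settings and the column type, so it is worth measuring before and after.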

  • 2020-12-04 20:47

    Don't have any experience with PDO so can't help with that but this method is pretty performant, although it's a bit ugly in places ;)

    PHP

    <?php

    // Build a pool of candidate ids (0 .. 99999) and pick $max of them at random.
    $max  = 10000;
    $nums = range(0, $max * 10 - 1);

    $conn = new mysqli("127.0.0.1", "vldb_dbo", "pass", "vldb_db", 3306);

    // array_rand() returns keys; each key equals its value here, so the keys are the ids.
    $sql = sprintf("call list_products_by_id('%s',0)", implode(",", array_rand($nums, $max)));

    $startTime = microtime(true);

    $result = $conn->query($sql);

    echo sprintf("Fetched %d rows in %s secs<br/>",
        $conn->affected_rows, number_format(microtime(true) - $startTime, 6, ".", ""));

    $result->close();
    $conn->close();

    ?>
    

    Results

    select count(*) from product;
    count(*)
    ========
    1000000
    
    Fetched 1000 rows in 0.014767 secs
    Fetched 1000 rows in 0.014629 secs
    
    Fetched 2000 rows in 0.027938 secs
    Fetched 2000 rows in 0.027929 secs
    
    Fetched 5000 rows in 0.068841 secs
    Fetched 5000 rows in 0.067844 secs
    
    Fetched 7000 rows in 0.095199 secs
    Fetched 7000 rows in 0.095184 secs
    
    Fetched 10000 rows in 0.138205 secs
    Fetched 10000 rows in 0.134356 secs
    

    MySQL

    drop procedure if exists list_products_by_id;
    
    delimiter #
    
    create procedure list_products_by_id
    (
    in p_prod_id_csv text,
    in p_show_explain tinyint unsigned
    )
    proc_main:begin
    
    declare v_id varchar(10);
    declare v_done tinyint unsigned default 0;
    declare v_idx int unsigned default 1;
    
        create temporary table tmp(prod_id int unsigned not null)engine=memory; 
    
        -- split the string into tokens and put into a temp table...
    
        if p_prod_id_csv is not null then
            while not v_done do
                set v_id = trim(substring(p_prod_id_csv, v_idx,
                    if(locate(',', p_prod_id_csv, v_idx) > 0,
                        locate(',', p_prod_id_csv, v_idx) - v_idx, length(p_prod_id_csv))));

                if length(v_id) > 0 then
                    set v_idx = v_idx + length(v_id) + 1;
                    insert ignore into tmp values(v_id);
                else
                    set v_done = 1;
                end if;
            end while;
        end if;
    
        if p_show_explain then
    
            select count(*) as count_of_tmp from tmp;
    
            explain
            select p.* from product p
            inner join tmp on tmp.prod_id = p.prod_id order by p.prod_id;
    
        end if;
    
        select p.* from product p
            inner join tmp on tmp.prod_id = p.prod_id order by p.prod_id;
    
        drop temporary table if exists tmp;
    
    end proc_main #
    
    delimiter ;
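
    For readers who do use PDO, the procedure above could be called with the whole id list passed as a single bound string. This is only a sketch: idsToCsv and fetchProductsById are hypothetical names, and $pdo is assumed to be an open connection to the database that holds list_products_by_id.

    ```php
    <?php
    // Hypothetical helper: build the comma-separated string the procedure expects.
    // Casting to int keeps anything that is not a number out of the list.
    function idsToCsv(array $ids): string
    {
        return implode(',', array_map('intval', $ids));
    }

    // Hypothetical wrapper around the stored procedure from this answer.
    function fetchProductsById(PDO $pdo, array $ids): array
    {
        $stmt = $pdo->prepare('call list_products_by_id(?, 0)');
        $stmt->execute([idsToCsv($ids)]);   // one string parameter instead of one per id
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $stmt->closeCursor();               // release whatever the CALL left pending
        return $rows;
    }
    ```

    The list is written without spaces because the procedure advances its index by the length of the trimmed token.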
    
  • 2020-12-04 21:00

    There are some major mistakes in the sample code. To be more precise:

    // $imageIds is an array with 10K keys
    $keyCount = count($imageIds);
    $keys = implode(', ', array_fill(0, $keyCount, '?'));
    $query = "SELECT * FROM images WHERE ImageID IN ({$keys})";
    

    So far, the code above produces something like this:

    SELECT * FROM images WHERE ImageID IN (?, ?, ?, ?, ?, ?,...?, ?, ?, ?)
    

    There is no loop for binding. There should be a small loop that binds each of the parameters being passed to MySQL. Instead, you go straight from prepare() to execute(), when correct binding is primarily what you want.

    $stmt = $dbh->prepare($query);
    $stmt->execute($imageIds);
    // until now, it's been fast.  fetch() is the slow part
    while ($row = $stmt->fetch()) {
        $rows[] = $row;
    }
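
    To check the claim in the comment above that fetch() is the slow part, each phase can be timed on its own. A minimal sketch, where timePhase is a hypothetical helper and the variables in the usage lines are the ones from the snippet above:

    ```php
    <?php
    // Hypothetical helper: run $phase and return [its result, elapsed seconds].
    function timePhase(callable $phase): array
    {
        $start = microtime(true);
        $result = $phase();
        return [$result, microtime(true) - $start];
    }

    // Usage sketch, assuming $dbh, $query and $imageIds as above:
    // [$stmt, $tPrepare] = timePhase(fn() => $dbh->prepare($query));
    // [, $tExecute]      = timePhase(fn() => $stmt->execute($imageIds));
    // [$rows, $tFetch]   = timePhase(fn() => $stmt->fetchAll());
    // printf("prepare %.4fs, execute %.4fs, fetch %.4fs\n", $tPrepare, $tExecute, $tFetch);
    ```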
    

    Now I have a simple logic question about this part of the question:

    When using SELECT * FROM table WHERE Id IN ( .. ) queries with more than 10000 keys using PDO with prepare()/execute(), the performance degrades ~10X more than doing the same query using mysqli with prepared statements or PDO without using prepared statements.

    Would it not be better if the same query was re-written so that you would not need to pass 10000 keys as parameters?

    PDO and MySQLi do not have major differences in timings. Badly written queries do. Very complex stored procedures can sometimes turn out slow if they are not well optimized.

    Check whether another query could fetch the desired result. For example:

    Create a small table named test:

    create table `test` (
      `id` int(10) not null,
      `desc` varchar(255)
      ); 
    insert into `test` (`id`,`desc`) values (1,'a'),(10,'a1'),(11,'a2'),(12,'a3'),(13,'a4'),(14,'a5'),(15,'a6'),(2,'ab'),(20,'ab1'),(21,'ab2'),(22,'ab3'),(23,'ab4'),(24,'ab5'),(25,'ab6');
    

    Run these simple queries:

    select * from `test` where `id` rlike '^1$';
    select * from `test` where `id` rlike '^1+';
    select * from `test` where `id`=1;
    select * from `test` where `id` rlike '^1.$';
    select * from `test` where `id` rlike '.2$';
    select * from `test` where `id` rlike '^2$';
    select * from `test` where `id` rlike '.(2|3)'; -- Slower
    select * from `test` where `id` IN (12,13,22,23); -- Faster
    select * from `test` where `id` IN ('12,13,22,23'); -- Wrong result
    select * from `test` where `id` IN ('12','13','22','23'); -- Slower
    

    Of the last 4 queries, all except the one marked "Wrong result" return the same rows in this example. I think that most of the time, if you check them on SQLFiddle, you will get query times that correspond to the labels they have been given.
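
    If the keys are known to be integers, the unquoted IN (12,13,22,23) form can be built from PHP without placeholders, as long as every value is forced to an integer first. A minimal sketch; buildIntegerInList is a hypothetical helper, and the table and column names are the ones from the images query earlier in this answer.

    ```php
    <?php
    // Hypothetical helper: turn a list of ids into unquoted integer literals.
    // intval() discards anything that is not a leading number, so no quoting is needed.
    function buildIntegerInList(array $ids): string
    {
        if ($ids === []) {
            throw new InvalidArgumentException('IN () needs at least one id');
        }
        return implode(', ', array_map('intval', $ids));
    }

    // Usage sketch:
    // $query = 'SELECT * FROM images WHERE ImageID IN (' . buildIntegerInList($imageIds) . ')';
    // $rows  = $dbh->query($query)->fetchAll();
    ```

    This only makes sense for integer keys; string keys would still need placeholders or proper quoting.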
