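When using SELECT * FROM table WHERE Id IN ( .. )
queries with more than 10000 keys using PDO with prepare()/execute(), the performance degrades ~10X more than doing the same query using mysqli with prepared statements or PDO without using prepared statements.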
Make sure you're telling PDO that the value is an integer, not a string; if PDO sends it as a string, MySQL will have to typecast the values for comparison. Depending on how it goes about this, that could cause major slowdowns by making MySQL avoid using an index.
I'm not completely sure about the behaviour here, but I have had this problem with Postgres a few years back...
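As a rough sketch of what "telling PDO the value is an integer" could look like: passing an array to execute() treats every value as PDO::PARAM_STR, so the alternative is to bind each placeholder with bindValue() and PDO::PARAM_INT. The helper name fetchByIds and the Name column are hypothetical, and the demo runs against an in-memory SQLite database (an assumption, only so the snippet is self-contained); the images table and ImageID column follow the question.

```php
<?php
// Hypothetical helper: binds every ID as an integer instead of relying on
// execute($ids), which would send all values as strings.
function fetchByIds(PDO $dbh, array $ids): array
{
    if ($ids === []) {
        return [];
    }
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $stmt = $dbh->prepare("SELECT * FROM images WHERE ImageID IN ($placeholders)");
    // Positional placeholders are 1-indexed.
    foreach (array_values($ids) as $i => $id) {
        $stmt->bindValue($i + 1, (int) $id, PDO::PARAM_INT);
    }
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Demo setup (assumption: SQLite in memory stands in for the MySQL server).
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE images (ImageID INTEGER PRIMARY KEY, Name TEXT)');
$dbh->exec("INSERT INTO images VALUES (1, 'a'), (2, 'b'), (3, 'c')");

$rows = fetchByIds($dbh, [1, 3]);
echo count($rows), " rows\n";
```

Whether this changes the timings on a given MySQL setup is something to measure rather than assume; the sketch only shows the binding mechanics.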
I don't have any experience with PDO, so I can't help with that, but this method is pretty performant, although it's a bit ugly in places ;)
<?php
// Build a pool of candidate IDs: 0 .. ($max * 10 - 1).
$max = 10000;
$nums = range(0, $max * 10 - 1);

$conn = new mysqli("127.0.0.1", "vldb_dbo", "pass", "vldb_db", 3306);

// array_rand() returns keys, which equal the values here; pass $max random IDs as one CSV string.
$sql = sprintf("call list_products_by_id('%s',0)", implode(",", array_rand($nums, $max)));

$startTime = microtime(true);
$result = $conn->query($sql);

echo sprintf("Fetched %d rows in %s secs<br/>",
    $conn->affected_rows, number_format(microtime(true) - $startTime, 6, ".", ""));

$result->close();
$conn->close();
?>
select count(*) from product;
count(*)
========
1000000
Fetched 1000 rows in 0.014767 secs
Fetched 1000 rows in 0.014629 secs
Fetched 2000 rows in 0.027938 secs
Fetched 2000 rows in 0.027929 secs
Fetched 5000 rows in 0.068841 secs
Fetched 5000 rows in 0.067844 secs
Fetched 7000 rows in 0.095199 secs
Fetched 7000 rows in 0.095184 secs
Fetched 10000 rows in 0.138205 secs
Fetched 10000 rows in 0.134356 secs
drop procedure if exists list_products_by_id;

delimiter #

create procedure list_products_by_id
(
  in p_prod_id_csv text,
  in p_show_explain tinyint unsigned
)
proc_main:begin

  declare v_id varchar(10);
  declare v_done tinyint unsigned default 0;
  declare v_idx int unsigned default 1;

  create temporary table tmp(prod_id int unsigned not null)engine=memory;

  -- split the string into tokens and put into a temp table...
  if p_prod_id_csv is not null then
    while not v_done do
      set v_id = trim(substring(p_prod_id_csv, v_idx,
        if(locate(',', p_prod_id_csv, v_idx) > 0,
          locate(',', p_prod_id_csv, v_idx) - v_idx, length(p_prod_id_csv))));
      if length(v_id) > 0 then
        set v_idx = v_idx + length(v_id) + 1;
        insert ignore into tmp values(v_id);
      else
        set v_done = 1;
      end if;
    end while;
  end if;

  if p_show_explain then
    select count(*) as count_of_tmp from tmp;
    explain
    select p.* from product p
    inner join tmp on tmp.prod_id = p.prod_id order by p.prod_id;
  end if;

  select p.* from product p
  inner join tmp on tmp.prod_id = p.prod_id order by p.prod_id;

  drop temporary table if exists tmp;

end proc_main #

delimiter ;
There are some major mistakes in the sample code. To be more precise:
// $imageIds is an array with 10K keys
$keyCount = count($imageIds);
$keys = implode(', ', array_fill(0, $keyCount, '?'));
$query = "SELECT * FROM images WHERE ImageID IN ({$keys})";
So far, the above code will produce something like this:
SELECT * FROM images WHERE ImageID IN (?, ?, ?, ?, ?, ?,...?, ?, ?, ?)
There is no loop for binding. There should be a small loop in which you bind each of the parameters being passed to MySQL. You go straight from prepare to execute, when correct binding is primarily what you want.
$stmt = $dbh->prepare($query);
$stmt->execute($imageIds);
// until now, it's been fast. fetch() is the slow part
while ($row = $stmt->fetch()) {
$rows[] = $row;
}
Now I have a simple logic question about this part of the question:
When using
SELECT * FROM table WHERE Id IN ( .. )
queries with more than 10000 keys using PDO with prepare()/execute(), the performance degrades ~10X more than doing the same query using mysqli with prepared statements or PDO without using prepared statements.
Would it not be better if the same query was re-written so that you would not need to pass 10000 keys as parameters?
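One way to avoid passing thousands of bound parameters, when the keys are known to be integers, is to cast them and inline the list, which is roughly the "PDO without using prepared statements" variant the question mentions. This is only a sketch: the helper name intInList is hypothetical, and it is not suitable for non-integer keys, since intval() turns any non-numeric string into 0.

```php
<?php
// Hypothetical helper: builds "12,13,22,23" from an array of IDs, casting each
// value to int so that only integer literals can reach the SQL text.
function intInList(array $ids): string
{
    $ints = array_unique(array_map('intval', $ids));
    if ($ints === []) {
        // "IN ()" is a syntax error in MySQL, so refuse an empty list.
        throw new InvalidArgumentException('ID list must not be empty');
    }
    return implode(',', $ints);
}

$query = 'SELECT * FROM images WHERE ImageID IN (' . intInList(['12', 13, '22', 23]) . ')';
echo $query, "\n"; // prints: SELECT * FROM images WHERE ImageID IN (12,13,22,23)
```

The resulting string can then be run with $dbh->query($query); for very large lists, splitting the IDs with array_chunk() and running one query per chunk is another option worth timing.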
PDO and MySQLi do not have major differences in timings. Badly written queries do. Very complex stored procedures might sometimes turn out slow if they are not well optimized.
Check whether another query could fetch the desired result. For example, create a small table named test:
create table `test` (
`id` int(10) not null,
`desc` varchar(255)
);
insert into `test` (`id`,`desc`) values (1,'a'),(10,'a1'),(11,'a2'),(12,'a3'),(13,'a4'),(14,'a5'),(15,'a6'),(2,'ab'),(20,'ab1'),(21,'ab2'),(22,'ab3'),(23,'ab4'),(24,'ab5'),(25,'ab6');
Run these simple queries:
select * from `test` where `id` rlike '^1$';
select * from `test` where `id` rlike '^1+';
select * from `test` where `id`=1;
select * from `test` where `id` rlike '^1.$';
select * from `test` where `id` rlike '.2$';
select * from `test` where `id` rlike '^2$';
select * from `test` where `id` rlike '.(2|3)'; -- Slower
select * from `test` where `id` IN (12,13,22,23); -- Faster
select * from `test` where `id` IN ('12,13,22,23'); -- Wrong result
select * from `test` where `id` IN ('12','13','22','23'); -- Slower
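Of the last four queries, three return the same rows in this example; the one labeled "Wrong result" does not, because MySQL compares the single string '12,13,22,23' as the number 12. I think that most of the time, if you check it on SQLFiddle, you would get query times that correspond to the labels they have been given.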