Question
Description
I have 2 tables with the following structure (irrelevant columns removed):
mysql> explain parts;
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| code        | varchar(32)  | NO   | PRI | NULL    |       |
| slug        | varchar(255) | YES  |     | NULL    |       |
| title       | varchar(64)  | YES  |     | NULL    |       |
+-------------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
and
mysql> explain details;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| sku               | varchar(32)  | NO   | PRI | NULL    |       |
| description       | varchar(700) | YES  |     | NULL    |       |
| part_code         | varchar(32)  | NO   | PRI |         |       |
+-------------------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
Table parts contains 184147 rows, and details contains 7278870 rows.
The part_code column in details references the code column in the parts table.
Since these columns are varchar, I want to add the column id int(11) to parts, and part_id int(11) to details. I tried this:
mysql> alter table parts drop primary key;
Query OK, 184147 rows affected (0.66 sec)
Records: 184147 Duplicates: 0 Warnings: 0
mysql> alter table parts add column
id int(11) not null auto_increment primary key first;
Query OK, 184147 rows affected (0.55 sec)
Records: 184147 Duplicates: 0 Warnings: 0
mysql> select id, code from parts limit 5;
+----+-------------------------+
| id | code                    |
+----+-------------------------+
|  1 | Yhk0KqSMeLcfH1KEfykihQ2 |
|  2 | IMl4iweZdmrBGvSUCtMCJA2 |
|  3 | rAKZUDj1WOnbkX_8S8mNbw2 |
|  4 | rV09rJ3X33-MPiNRcPTAwA2 |
|  5 | LPyIa_M_TOZ8655u1Ls5mA2 |
+----+-------------------------+
5 rows in set (0.00 sec)
So now I have the id column with correct data in the parts table. After adding the part_id column to the details table:
mysql> alter table details add column part_id int(11) not null after part_code;
Query OK, 7278870 rows affected (1 min 17.74 sec)
Records: 7278870 Duplicates: 0 Warnings: 0
Now the big problem is how to update part_id accordingly. The following query:
mysql> update details d
join parts p on d.part_code = p.code
set d.part_id = p.id;
ran for about 30 hours before I killed it.
Note that both tables are MyISAM:
mysql> select engine from information_schema.tables where table_schema = 'db_name' and (table_name = 'parts' or table_name = 'details');
+--------+
| ENGINE |
+--------+
| MyISAM |
| MyISAM |
+--------+
2 rows in set (0.01 sec)
I just now realized that one of the problems was that by dropping the primary key on the parts table I also dropped the index on the code column. On the other hand, I have the following indexes on the details table (some irrelevant columns are omitted):
mysql> show indexes from details;
+---------+------------+----------+--------------+-------------+-----------+-------------+------------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Index_type |
+---------+------------+----------+--------------+-------------+-----------+-------------+------------+
| details |          0 | PRIMARY  |            1 | sku         | A         |        NULL | BTREE      |
| details |          0 | PRIMARY  |            3 | part_code   | A         |     7278870 | BTREE      |
+---------+------------+----------+--------------+-------------+-----------+-------------+------------+
2 rows in set (0.00 sec)
My questions are:
- Is the update query OK, or can it be optimized somehow?
- If I add the index on the code column in the parts table, will the query run in a reasonable time, or will it run for days again?
- How can I write a (sql/bash/php) script so I can see the progress of the query execution?
Thank you very much!
Answer 1:
As I mentioned in the question, I forgot about the dropped indexes on the parts table, so I added them:
alter table parts add key code (code);
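To get a feel for why the missing index mattered so much, here is a rough back-of-the-envelope sketch in Python. It is not a measurement: it assumes a plain nested-loop join, and it treats one index lookup as costing about log2(n) comparisons. Both are simplifications, and the function names are made up for illustration. Only the two row counts come from the question.

```python
import math

PARTS_ROWS = 184_147      # row count of parts, from the question
DETAILS_ROWS = 7_278_870  # row count of details, from the question


def comparisons_without_index(outer_rows, inner_rows):
    """Nested loop with no usable index: every outer row scans all inner rows."""
    return outer_rows * inner_rows


def comparisons_with_index(outer_rows, inner_rows):
    """Assumption: one B-tree lookup costs about ceil(log2(n)) comparisons."""
    return outer_rows * math.ceil(math.log2(inner_rows))


unindexed = comparisons_without_index(DETAILS_ROWS, PARTS_ROWS)
indexed = comparisons_with_index(DETAILS_ROWS, PARTS_ROWS)

print(f"without index on parts.code: {unindexed:.2e} comparisons")
print(f"with index on parts.code:    {indexed:.2e} comparisons")
print(f"ratio:                       {unindexed / indexed:,.0f}x")
```

Under these assumptions the unindexed join is on the order of 10^12 row comparisons against roughly 10^8 with the index, which is at least consistent with "30 hours" turning into "minutes".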
Inspired by Puggan Se's answer, I tried to use a LIMIT on UPDATE in a PHP script, but LIMIT can't be used with an UPDATE with JOIN in MySQL. To limit the query I added a new column to the details table:
# drop the primary key,
alter table details drop primary key;
# so I can create an auto_increment column
alter table details add id int not null auto_increment primary key;
# alter the id column and remove the auto_increment
alter table details change id id int not null;
# drop again the primary key
alter table details drop primary key;
# add new indexes
alter table details add primary key ( id, sku, num, part_code );
Now I can use the "limit":
update details d
join parts p on d.part_code = p.code
set d.part_id = p.id
where d.id between 1 and 5000;
So here's the full PHP script:
$started = time();
$i = 0;
$total = 7278870;
echo "Started at " . date('H:i:s', $started) . PHP_EOL;

// format a number of seconds as HH:MM:SS
function timef($s){
    $s = (int) $s;
    // floor, not round: rounding would turn 1.5 hours into "02"
    $h = floor($s / 3600);
    $h = str_pad($h, 2, '0', STR_PAD_LEFT);
    $s = $s % 3600;
    $m = floor($s / 60);
    $m = str_pad($m, 2, '0', STR_PAD_LEFT);
    $s = $s % 60;
    $s = str_pad($s, 2, '0', STR_PAD_LEFT);
    return "$h:$m:$s";
}

while (1){
    $i++;
    // id range of the current chunk
    $j = $i * 5000;
    $k = $j + 4999;
    $result = mysql_query("
        update details d
        join parts p on d.part_code = p.code
        set d.part_id = p.id
        where d.id between $j and $k
    ");
    if(!$result) die(mysql_error());
    if(mysql_affected_rows() == 0) die(PHP_EOL . 'Done!');
    // percentage done, elapsed time and estimated time remaining
    $p = round(($i * 5000) / $total, 4) * 100;
    $s = time() - $started;
    $ela = timef($s);
    $eta = timef( (( $s / $p ) * 100) - $s );
    // progress bar, one '=' per 10%
    $eq = floor($p/10);
    $show_gt = ($p == 100);
    $spaces = $show_gt ? 9 - $eq : 10 - $eq;
    echo "\r {$p}% | [" . str_repeat('=', $eq) . ( $show_gt ? '' : '>' ) . str_repeat(' ', $spaces) . "] | Elapsed: {$ela} | ETA: {$eta}";
}
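The elapsed-time and ETA arithmetic in the script can also be looked at in isolation. The following is a minimal Python sketch, not the author's code: format_hms and progress_line are hypothetical names, the 10-character bar width mirrors the script, and the ETA assumes the remaining rows take as long per row as the rows already processed.

```python
def format_hms(seconds):
    """Format a duration in seconds as HH:MM:SS using integer division."""
    seconds = int(seconds)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def progress_line(done_rows, total_rows, elapsed_seconds, width=10):
    """Build one progress line: percentage, bar, elapsed time and ETA."""
    fraction = min(done_rows / total_rows, 1.0)
    # Linear extrapolation: total time = elapsed / fraction done.
    if fraction > 0:
        eta = elapsed_seconds / fraction - elapsed_seconds
    else:
        eta = 0
    filled = int(fraction * width)
    if filled < width:
        bar = "=" * filled + ">" + " " * (width - filled - 1)
    else:
        bar = "=" * width
    return (
        f"{fraction * 100:.2f}% | [{bar}] | "
        f"Elapsed: {format_hms(elapsed_seconds)} | ETA: {format_hms(eta)}"
    )


# Hypothetical snapshot: half of the 7278870 rows done after two minutes.
line = progress_line(3_639_435, 7_278_870, elapsed_seconds=120)
print(line)
```

Using integer division for the hours and minutes keeps each field from being rounded up, which matters once the elapsed time passes the half-hour or half-minute mark.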
In the end, the whole thing took less than 5 minutes :) Thank you all!
P.S.: There's still a bug: I later found 4999 rows left with part_id = 0, but I have already fixed those manually.
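A plausible explanation for the leftover rows, inferred from the script rather than stated in the post: the loop increments $i before computing the range, so the first chunk is 5000 to 9999 and ids 1 to 4999 are never visited. The Python sketch below reproduces that arithmetic next to a contiguous alternative. The function names are hypothetical, and it assumes ids run from 1 to the row count without gaps.

```python
TOTAL_ROWS = 7_278_870  # row count of details, from the question


def script_chunks(total_rows, chunk_size=5000):
    """Yield (first_id, last_id) the way the PHP loop computes them."""
    i = 0
    while True:
        i += 1                      # incremented before the first use
        first = i * chunk_size
        last = first + chunk_size - 1
        if first > total_rows:      # assumption: stop once past the last id
            return
        yield first, last


def contiguous_chunks(total_rows, chunk_size=5000):
    """Yield (first_id, last_id) ranges covering ids 1..total_rows once."""
    for first in range(1, total_rows + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_rows)


first_script_chunk = next(script_chunks(TOTAL_ROWS))
skipped_ids = first_script_chunk[0] - 1
covered = sum(last - first + 1 for first, last in contiguous_chunks(TOTAL_ROWS))

print("first chunk in the script:", first_script_chunk)
print("ids skipped at the start: ", skipped_ids)
print("ids covered when starting at 1:", covered)
```

Starting the range at ($i - 1) * 5000 + 1 in the PHP loop would be one way to get the contiguous behaviour.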
Answer 2:
You may want to add a WHERE and a LIMIT, so you can update in chunks:
update details d join parts p on d.part_code = p.code set d.part_id = p.id WHERE d.part_id =0 LIMIT 5000;
It will be a lot faster with an index, and if you run one query as suggested in '1' above, you can see how long 5000 rows take to handle.
Loop the above query:
while (TRUE) {
    $result = mysql_query($query);
    if (!$result) die('Failed: ' . mysql_error());
    if (mysql_affected_rows() == 0) die('Done');
    echo '.';
}
EDIT 1: rewrote the query due to the LIMIT error on joins.
You can use a subquery to avoid the multiple tables update:
UPDATE details
SET part_id = (SELECT id FROM parts WHERE parts.code = details.part_code)
WHERE part_id = 0
LIMIT 5000;
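To try the chunked subquery approach end to end without a MySQL server, here is a minimal runnable sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL, an assumption made purely to keep the example self-contained. Stock SQLite builds do not accept LIMIT on UPDATE, so the sketch picks each chunk with rowid IN (... LIMIT ?) instead. The sample rows and the backfill_in_chunks helper are hypothetical.

```python
import sqlite3


def backfill_in_chunks(conn, chunk_size):
    """Fill details.part_id from parts.id, chunk_size rows per statement."""
    batches = 0
    while True:
        cur = conn.execute(
            """
            UPDATE details
               SET part_id = (SELECT id FROM parts
                               WHERE parts.code = details.part_code)
             WHERE rowid IN (SELECT rowid FROM details
                              WHERE part_id = 0 LIMIT ?)
            """,
            (chunk_size,),
        )
        conn.commit()
        if cur.rowcount == 0:   # nothing left with part_id = 0
            return batches
        batches += 1


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (id INTEGER PRIMARY KEY, code TEXT UNIQUE);
    CREATE TABLE details (sku TEXT, part_code TEXT,
                          part_id INTEGER NOT NULL DEFAULT 0);
""")
# Toy data: 3 parts and 10 detail rows, every part_code has a match.
conn.executemany("INSERT INTO parts (code) VALUES (?)", [("A",), ("B",), ("C",)])
conn.executemany(
    "INSERT INTO details (sku, part_code) VALUES (?, ?)",
    [(f"sku{n}", "ABC"[n % 3]) for n in range(10)],
)

batches = backfill_in_chunks(conn, chunk_size=4)
remaining = conn.execute(
    "SELECT COUNT(*) FROM details WHERE part_id = 0").fetchone()[0]
mismatches = conn.execute(
    "SELECT COUNT(*) FROM details d JOIN parts p ON p.code = d.part_code "
    "WHERE d.part_id <> p.id").fetchone()[0]
print(batches, remaining, mismatches)
```

One caveat with filtering on part_id = 0: the loop only makes clean progress if every part_code has a matching row in parts. Rows without a match would need separate handling.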
Answer 3:
You can try to remove the indexes from the table you're trying to update. MySQL has to update the indexes on each row update. It won't be blazing fast for 7M records.
Source: https://stackoverflow.com/questions/11430362/update-column-from-another-table-in-large-mysql-db-7-million-rows