Question
How can I speed up the PostgreSQL UPDATE ... FROM query below? It currently takes days to finish running.
UPDATE import_parts ip
SET part_part_id = pp.id
FROM parts.part_parts pp
WHERE pp.upc = ip.upc
AND (ip.status is null or ip.status != '6');
And why does it take days to run in the first place?
Most of the time I kill the query manually because it runs for more than 24 hours. The last time it finished successfully, it took almost 38 hours.
The import_parts table has 971,971 rows.
The parts.part_parts table has 2,196,357 rows.
The parts.part_parts table has an index on upc, and id is the primary key of the table.
I already tried running VACUUM ANALYZE on the import_parts and parts.part_parts tables before running the update query above, but the query still took too long, so I killed it manually after 30 minutes. I'm hoping to get the query to run in under 30 minutes.
Here's the result of EXPLAIN when I run the query after running VACUUM ANALYZE on the import_parts and parts.part_parts tables:
UPDATE 1:
I also tried turning off enable_nestloop:
SET enable_nestloop TO off;
The query still took too long, so I killed it manually. Here's the result of EXPLAIN when enable_nestloop is turned off:
UPDATE 2:
Here's the result of EXPLAIN when using the query suggested by Abelisto in his answer to this post:
When I actually run the query, though, I get this error:
ERROR: more than one row returned by a subquery used as an expression
I'm still figuring out how to fix the error.
Answer 1:
First of all, try rewriting your query like this:
UPDATE import_parts ip
SET part_part_id = (
SELECT pp.id
FROM parts.part_parts pp
WHERE pp.upc = ip.upc)
WHERE status is null or status != '6';
Obviously, it raises an error like this:
ERROR: more than one row returned by a subquery used as an expression
Fix it by adding extra conditions (the subquery should return exactly one row, or zero rows, for each row in the target table).
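A minimal sketch of one such extra condition, assuming it is acceptable to pick the smallest id when a upc is duplicated (that tie-break is an assumption, not something the answer specifies). It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, with hypothetical sample data and the parts. schema prefix dropped. SQLite does not raise the "more than one row" error, so here the MIN is what guarantees a single, predictable value per lookup.

```python
import sqlite3

# Hypothetical sample data: upc 'A' appears twice in part_parts (ids 1 and 2).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part_parts (id INTEGER PRIMARY KEY, upc TEXT);
    CREATE INDEX part_parts_upc_idx ON part_parts (upc);
    CREATE TABLE import_parts (upc TEXT, status TEXT, part_part_id INTEGER);
    INSERT INTO part_parts (id, upc) VALUES (1, 'A'), (2, 'A'), (3, 'B');
    INSERT INTO import_parts (upc, status) VALUES
        ('A', NULL), ('B', '1'), ('B', '6'), ('C', NULL);
""")

# Correlated subquery from Answer 1, with MIN(id) added so that it yields
# exactly one value (or NULL when there is no match) per target row.
conn.execute("""
    UPDATE import_parts
    SET part_part_id = (
        SELECT MIN(pp.id)
        FROM part_parts pp
        WHERE pp.upc = import_parts.upc)
    WHERE status IS NULL OR status != '6'
""")

rows = conn.execute(
    "SELECT upc, status, part_part_id FROM import_parts ORDER BY rowid"
).fetchall()
print(rows)
```

Unlike the original UPDATE ... FROM, this correlated-subquery form also writes NULL into part_part_id for rows whose upc has no match at all (the 'C' row above), which may or may not be what you want.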
Answer 2:
From what you say, it seems that upc is not unique in part_parts. Try running this:
select upc, count(*)
from parts.part_parts pp
group by upc
having count(*) > 1;
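To illustrate the fan-out the answer suspects, here is a small sketch (Python's built-in sqlite3, hypothetical data, schema prefix dropped) that runs the duplicate check and then compares the number of rows the original join condition produces with the number of distinct target rows it can update. The PostgreSQL UPDATE documentation notes that when a target row joins to several rows, only one of them is used and which one is not readily predictable, so the extra join rows add work without changing how many rows end up updated.

```python
import sqlite3

# Hypothetical sample: upc 'A' is duplicated three times in part_parts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part_parts (id INTEGER PRIMARY KEY, upc TEXT);
    CREATE TABLE import_parts (upc TEXT, status TEXT, part_part_id INTEGER);
    INSERT INTO part_parts (id, upc) VALUES (1, 'A'), (2, 'A'), (3, 'A'), (4, 'B');
    INSERT INTO import_parts (upc, status) VALUES ('A', NULL), ('A', '2'), ('B', NULL);
""")

# The duplicate check from this answer.
dupes = conn.execute("""
    SELECT upc, COUNT(*) FROM part_parts GROUP BY upc HAVING COUNT(*) > 1
""").fetchall()

# Rows produced by the join condition of the original UPDATE ... FROM.
join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM import_parts ip JOIN part_parts pp ON pp.upc = ip.upc
    WHERE ip.status IS NULL OR ip.status != '6'
""").fetchone()[0]

# Distinct target rows that can actually be updated.
target_rows = conn.execute("""
    SELECT COUNT(*) FROM import_parts ip
    WHERE (ip.status IS NULL OR ip.status != '6')
      AND EXISTS (SELECT 1 FROM part_parts pp WHERE pp.upc = ip.upc)
""").fetchone()[0]

print(dupes, join_rows, target_rows)
```

Here three target rows turn into seven join rows; with a heavily duplicated upc in a table of two million rows, the same multiplication can become very large.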
These duplicates are probably causing the performance problems. You could get around this by arbitrarily choosing one id per upc, for example:
UPDATE import_parts ip
SET part_part_id = pp.id
FROM (SELECT pp.upc, MIN(pp.id) as id
FROM parts.part_parts pp
GROUP BY pp.upc
) pp
WHERE pp.upc = ip.upc AND (ip.status is null or ip.status <> '6');
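The same deduplication can be tried out locally with a sketch like the one below. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (UPDATE ... FROM requires SQLite 3.33.0 or newer), hypothetical data, and no parts. schema prefix. It checks that each matched row receives the smallest id for its upc and that rows without a match keep their existing value.

```python
import sqlite3

# UPDATE ... FROM needs SQLite 3.33.0 or newer.
assert sqlite3.sqlite_version_info >= (3, 33, 0), sqlite3.sqlite_version

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part_parts (id INTEGER PRIMARY KEY, upc TEXT);
    CREATE TABLE import_parts (upc TEXT, status TEXT, part_part_id INTEGER);
    INSERT INTO part_parts (id, upc) VALUES (5, 'A'), (2, 'A'), (9, 'B');
    INSERT INTO import_parts (upc, status, part_part_id) VALUES
        ('A', NULL, NULL), ('B', '3', NULL), ('B', '6', NULL), ('C', NULL, 42);
""")

# This answer's query: collapse part_parts to one row per upc first,
# so every import_parts row joins to at most one row.
conn.execute("""
    UPDATE import_parts
    SET part_part_id = pp.id
    FROM (SELECT upc, MIN(id) AS id
          FROM part_parts
          GROUP BY upc) AS pp
    WHERE pp.upc = import_parts.upc
      AND (import_parts.status IS NULL OR import_parts.status <> '6')
""")

rows = conn.execute(
    "SELECT upc, status, part_part_id FROM import_parts ORDER BY rowid"
).fetchall()
print(rows)
```

Because the grouped subquery has one row per upc, the join can no longer multiply target rows; the cost moves to the GROUP BY over part_parts, which is done once.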
Answer 3:
Create an index on import_parts with the columns (upc, status).
I would also recommend splitting the update into two statements.
I don't know your status values, but I suppose you have: null, 1, 2, 3, 4, 5, 6, 7.
UPDATE import_parts ip
SET part_part_id = pp.id
FROM parts.part_parts pp
WHERE pp.upc = ip.upc
AND ip.status is null
;
UPDATE import_parts ip
SET part_part_id = pp.id
FROM parts.part_parts pp
WHERE pp.upc = ip.upc
AND ip.status IN(1, 2, 3, 4, 5, 7)
;
Of course, you need to replace 1, 2, 3, 4, 5, 7 with your own values (all the ones other than 6).
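A quick sanity check that the two statements together select the same rows as the original single condition, sketched with Python's built-in sqlite3 and hypothetical integer statuses matching the values the answer supposes. The separate IS NULL statement is needed because status != 6 evaluates to NULL, not true, when status is NULL; the split is only equivalent if the IN list really covers every status other than 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE import_parts (id INTEGER PRIMARY KEY, status INTEGER)")
# Hypothetical statuses: NULL plus 1..7, as supposed in the answer.
conn.executemany(
    "INSERT INTO import_parts (status) VALUES (?)",
    [(None,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (None,), (6,)],
)

def ids(where):
    # Return the ids selected by a WHERE clause as a set.
    # The clauses passed in below are fixed literals, not user input.
    return {r[0] for r in conn.execute(f"SELECT id FROM import_parts WHERE {where}")}

original = ids("status IS NULL OR status != 6")
split = ids("status IS NULL") | ids("status IN (1, 2, 3, 4, 5, 7)")

# A plain inequality silently skips rows whose status is NULL.
without_null_branch = ids("status != 6")

print(original == split, len(original), len(without_null_branch))
```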
I also like @Gordon Linoff's answer, but it depends on how many rows you have per upc.
Source: https://stackoverflow.com/questions/62493528/how-can-i-speed-up-this-postgresql-update-from-sql-query-it-currently-takes-day