I have list of products (3 million items) without IDs - only titles. But I don\'t know which titles already exist in DB. New products (about 2.9 million items) must be added
COPY everything to a temporary staging table and insert only new titles into your target table.
CREATE TEMP TABLE tmp(title text);
COPY tmp FROM 'path/to/file.csv';
ANALYZE tmp;
INSERT INTO tbl
SELECT DISTINCT tmp.title
FROM tmp
LEFT JOIN tbl USING (title)
WHERE tbl.title IS NULL;
IDs should be generated automatically with a serial column tbl_id
in tbl
.
The LEFT JOIN
/ IS NULL
construct disqualifies already existing titles. NOT EXISTS
would be another possibility.
DISTINCT
prevents duplicates in the incoming data in the temporary table tmp
.
ANALYZE is useful to make sure the query planner picks a sensible plan, and temporary tables are not analyzed by autovacuum.
Since you have 3 million items, it might pay to raise the setting for temp_buffer
(for this session only):
SET temp_buffers = 1000MB;
Or however much you can afford and is enough to hold the temp table in RAM, which is much faster. Note: must be done first in the session - before any temp objects are created.
To see all IDs for the imported data:
SELECT tbl.tbl_id, tbl.title
FROM tbl
JOIN tmp USING (title)
In the same session! A temporary table is dropped automatically at the end of the session.