I have to migrate a large amount of existing data in a Postgres DB after a schema change.
In the old schema a country attribute would be stored in the users table.
The table users must have some primary key that you did not disclose. For the purpose of this answer I will name it users_id.
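To make the statements below easier to try out, here is a minimal sketch of the schema this answer assumes. Only the table names, country, address_id and addresses.id appear in the statements of this answer; the column types, the serial keys, the UNIQUE constraint and the sample rows are my assumptions, not something taken from the question.

```sql
-- Hypothetical schema for experimenting; types and constraints are assumptions.
-- TEMP tables are used so the sketch disappears at the end of the session.
CREATE TEMP TABLE addresses (
   id      serial PRIMARY KEY,
   country text
);

CREATE TEMP TABLE users (
   users_id   serial PRIMARY KEY,   -- assumed name for the undisclosed primary key
   country    text,                 -- old-schema attribute to be migrated
   address_id integer UNIQUE REFERENCES addresses (id)  -- UNIQUE enforces the 1:1 link
);

-- Sample rows with unique countries, none of them migrated yet
INSERT INTO users (country) VALUES ('DE'), ('FR'), ('US');
```

The UNIQUE constraint still allows any number of NULL values in address_id, so it does not get in the way before the migration runs.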
You can solve this rather elegantly with data-modifying CTEs, introduced with PostgreSQL 9.1.
If we can assume that country is unique, the whole operation is rather trivial:
WITH i AS (
INSERT INTO addresses (country)
SELECT country
FROM users
WHERE address_id IS NULL
RETURNING id, country
)
UPDATE users u
SET address_id = i.id
FROM i
WHERE i.country = u.country;
You mention version 8.3 in your question. If you have not gotten around to upgrading in the meantime, you might want to consider it. End of life is coming soon for 8.3.
Be that as it may, this is simple enough with version 8.3. You just need two statements:
INSERT INTO addresses (country)
SELECT country
FROM users
WHERE address_id IS NULL;
UPDATE users u
SET address_id = a.id
FROM addresses a
WHERE u.address_id IS NULL
AND a.country = u.country;
If country is not unique, it becomes more challenging. You could just create one address and link to it multiple times. But you did mention a 1:1 relationship, which rules out such a convenient solution.
For version 9.1:
WITH s AS (
SELECT users_id, country
, row_number() OVER (PARTITION BY country) AS rn
FROM users
WHERE address_id IS NULL
)
, i AS (
INSERT INTO addresses (country)
SELECT country
FROM s
RETURNING id, country
)
, r AS (
SELECT *
, row_number() OVER (PARTITION BY country) AS rn
FROM i
)
UPDATE users u
SET address_id = r.id
FROM r
JOIN s USING (country, rn) -- select exactly one id for every user
WHERE u.users_id = s.users_id
AND u.address_id IS NULL;
As there is no way to unambiguously assign exactly one id returned from the INSERT to every user in a set with identical country, I use the window function row_number() to make them unique.
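The pairing on (country, rn) can be seen in isolation with literal rows. This is only an illustrative sketch: the VALUES lists stand in for the users without an address and for the rows returned by the INSERT, the ids are made up, the table name demo_pairs is hypothetical, and the ORDER BY inside the window is my addition to make the result deterministic.

```sql
-- Illustration only: made-up ids, no real tables involved.
CREATE TEMP TABLE demo_pairs AS
WITH s (users_id, country) AS (      -- stand-in for users still lacking an address
   VALUES (11, 'DE'::text), (12, 'DE'), (13, 'FR')
   )
, i (id, country) AS (               -- stand-in for rows returned by the INSERT
   VALUES (101, 'DE'::text), (102, 'DE'), (103, 'FR')
   )
, sn AS (                            -- number users within each country
   SELECT *, row_number() OVER (PARTITION BY country ORDER BY users_id) AS rn
   FROM   s
   )
, r AS (                             -- number new addresses within each country
   SELECT *, row_number() OVER (PARTITION BY country ORDER BY id) AS rn
   FROM   i
   )
SELECT sn.users_id, r.id AS address_id
FROM   sn
JOIN   r USING (country, rn);        -- exactly one address per user

SELECT * FROM demo_pairs ORDER BY users_id;
```

Each user ends up with a different address id, even though two of them share the same country.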
This is not as straightforward with version 8.3. One possible way:
INSERT INTO addresses (country)
SELECT DISTINCT country -- pick just one per set of dupes
FROM users
WHERE address_id IS NULL;
UPDATE users u
SET address_id = a.id
FROM addresses a
WHERE a.country = u.country
AND u.address_id IS NULL
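AND NOT EXISTS ( -- skip addresses already linked in an earlier round
SELECT 1 FROM users x
WHERE x.address_id = a.id
)
AND NOT EXISTS (
SELECT 1 FROM users b
WHERE b.country = u.country
AND b.address_id IS NULL
AND b.users_id < u.users_id
); -- effectively picking the smallest users_id per set of dupes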
Repeat this until the last NULL value is gone from users.address_id.
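If you would rather not repeat the two statements by hand, the repetition could be wrapped in a PL/pgSQL loop. The following is a rough sketch under several assumptions: the function name migrate_user_addresses is made up, the tables and sample rows are hypothetical, PL/pgSQL is installed in the database, and the UPDATE compares users_id within users and skips addresses that are already linked, which is my reading of the intended logic. Rows with a NULL country are left alone, because they could never match on country and would keep the loop running forever.

```sql
-- Hypothetical tables and data, just to make the sketch runnable
CREATE TEMP TABLE addresses (id serial PRIMARY KEY, country text);
CREATE TEMP TABLE users (
   users_id   serial PRIMARY KEY,
   country    text,
   address_id integer
);
INSERT INTO users (country) VALUES ('DE'), ('DE'), ('FR'), ('DE');

-- Hypothetical helper: returns the number of rounds it needed
CREATE OR REPLACE FUNCTION migrate_user_addresses()
  RETURNS integer AS
$func$
DECLARE
   rounds integer := 0;
BEGIN
   LOOP
      -- stop as soon as no migratable user is left
      EXIT WHEN NOT EXISTS (
         SELECT 1 FROM users
         WHERE  address_id IS NULL
         AND    country IS NOT NULL);

      -- one new address per country that still has unlinked users
      INSERT INTO addresses (country)
      SELECT DISTINCT country
      FROM   users
      WHERE  address_id IS NULL
      AND    country IS NOT NULL;

      -- link each new address to the smallest unlinked users_id of its country
      UPDATE users u
      SET    address_id = a.id
      FROM   addresses a
      WHERE  a.country = u.country
      AND    u.address_id IS NULL
      AND    NOT EXISTS (SELECT 1 FROM users x WHERE x.address_id = a.id)
      AND    NOT EXISTS (
         SELECT 1 FROM users b
         WHERE  b.country = u.country
         AND    b.address_id IS NULL
         AND    b.users_id < u.users_id
         );

      rounds := rounds + 1;
   END LOOP;

   RETURN rounds;
END
$func$ LANGUAGE plpgsql;

SELECT migrate_user_addresses();

-- Sanity check: returns no rows if every address is linked at most once
SELECT address_id, count(*)
FROM   users
GROUP  BY address_id
HAVING count(*) > 1;
```

The number of rounds equals the size of the largest set of users sharing one country, so this gets slow when a single country dominates the table.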