I have a problem when I try to update many rows at the same time.
Here are the table and query I use (simplified for readability):
Have your script create a temporary table with the same data types as foo. Use an impossible condition so it starts out empty:
select x, y, pkid
into temp t
from foo
where pkid = -1
Then have your script insert the data into it:
insert into t (x, y, pkid) values
(null, 20, 1),
(null, 50, 2)
Now update from it:
update foo
set x=t.x, y=t.y
from t
where foo.pkid=t.pkid
Finally drop it:
drop table t
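Put together, the whole round trip might look like this (a sketch, assuming foo has integer columns pkid, x and y):

```sql
-- Stage the new values in a temp table that clones foo's column types.
SELECT x, y, pkid
INTO TEMP t
FROM foo
WHERE pkid = -1;          -- impossible condition: copies structure, no rows

INSERT INTO t (x, y, pkid) VALUES
  (NULL, 20, 1),
  (NULL, 50, 2);

UPDATE foo
SET    x = t.x, y = t.y
FROM   t
WHERE  foo.pkid = t.pkid;

DROP TABLE t;             -- temp tables are also dropped at session end
```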
With a standalone VALUES expression PostgreSQL has no idea what the data types should be. With simple numeric literals the system is happy to assume matching types. But with other input (like NULL) you would need to cast explicitly, as you have already found out.
You can query pg_catalog (fast, but PostgreSQL-specific) or the information_schema (slow, but standard SQL) to find out the types and prepare your statement accordingly.
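A pg_catalog lookup could look like this (a sketch; format_type() renders the complete type name, and attnum > 0 filters out system columns):

```sql
SELECT a.attname                            AS column_name
     , format_type(a.atttypid, a.atttypmod) AS data_type
FROM   pg_attribute a
WHERE  a.attrelid = 'foo'::regclass
AND    a.attnum > 0
AND    NOT a.attisdropped
ORDER  BY a.attnum;
```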
Or you can use one of these simple "tricks" (I saved the best for last):
0. LIMIT 0, append rows with UNION ALL VALUES
UPDATE foo f
SET x = t.x
, y = t.y
FROM (
(SELECT pkid, x, y FROM foo LIMIT 0) -- parentheses needed with LIMIT
UNION ALL
VALUES
(1, 20, NULL) -- no type casts here
, (2, 50, NULL)
) t -- column names and types are already defined
WHERE f.pkid = t.pkid;
The first sub-select of the subquery:
(SELECT pkid, x, y FROM foo LIMIT 0)
gets names and types for the columns, but LIMIT 0 prevents it from adding an actual row. Subsequent rows are coerced to the now well-defined row type and checked immediately whether they match it. This should be a subtle improvement over your original form.
If you provide values for all columns of the table, this short syntax can be used for the first row:
(TABLE foo LIMIT 0)
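Applied to the query above (a sketch, assuming pkid, x and y are all of foo's columns, in that order):

```sql
UPDATE foo f
SET    x = t.x
     , y = t.y
FROM  (
   (TABLE foo LIMIT 0)    -- short for: SELECT * FROM foo LIMIT 0
   UNION ALL
   VALUES
      (1, 20, NULL)       -- rows must match foo's column order
    , (2, 50, NULL)
   ) t
WHERE  f.pkid = t.pkid;
```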
Major limitation: Postgres casts the input literals of the free-standing VALUES expression to a "best-effort" type immediately. When it later tries to cast to the types of the first SELECT, it may already be too late for some types if there is no registered assignment cast between the assumed type and the target type. Examples: text -> timestamp or text -> json.
Pro:
Con:
1. LIMIT 0, append rows with UNION ALL SELECT
UPDATE foo f
SET x = t.x
, y = t.y
FROM (
(SELECT pkid, x, y FROM foo LIMIT 0) -- parentheses needed with LIMIT
UNION ALL SELECT 1, 20, NULL
UNION ALL SELECT 2, 50, NULL
) t -- column names and types are already defined
WHERE f.pkid = t.pkid;
Pro:
Con:
UNION ALL SELECT is slower than a VALUES expression for long lists of rows, as you found in your test.

2. VALUES expression with per-column type

...
FROM (
VALUES
((SELECT pkid FROM foo LIMIT 0)
, (SELECT x FROM foo LIMIT 0)
, (SELECT y FROM foo LIMIT 0)) -- get type for each col individually
, (1, 20, NULL)
, (2, 50, NULL)
) t (pkid, x, y) -- column names not defined yet, only types.
...
Contrary to 0., this avoids premature type resolution.
The first row in the VALUES expression is a row of NULL values, which defines the type for all subsequent rows. This leading noise row is filtered by WHERE f.pkid = t.pkid later, so it never sees the light of day. For other purposes you can eliminate the added first row with OFFSET 1 in a subquery.
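For such other purposes, the OFFSET 1 elimination might look like this (a sketch of the bare row set, outside of an UPDATE):

```sql
SELECT *
FROM  (
   VALUES
      ((SELECT pkid FROM foo LIMIT 0)
     , (SELECT x FROM foo LIMIT 0)
     , (SELECT y FROM foo LIMIT 0))  -- leading row of NULL values, sets types
    , (1, 20, NULL)
    , (2, 50, NULL)
   ) t (pkid, x, y)
OFFSET 1;   -- skips the leading noise row
```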
Pro:
Con:
3. VALUES expression with row type

UPDATE foo f
SET x = (t.r).x -- parentheses needed to make syntax unambiguous
, y = (t.r).y
FROM (
VALUES
('(1,20,)'::foo) -- columns need to be in default order of table
,('(2,50,)') -- nothing after the last comma for NULL
) t (r) -- column name for row type
WHERE f.pkid = (t.r).pkid;
You obviously know the table name. If you also know the number of columns and their order you can work with this.
For every table in PostgreSQL a row type is registered automatically. If you match the number of columns in your expression, you can cast to the row type of the table ('(1,50,)'::foo), thereby assigning column types implicitly. Put nothing after a comma to enter a NULL value. Add a comma for every irrelevant trailing column.
In the next step you can access individual columns with the demonstrated syntax. More about Field Selection in the manual.
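A minimal illustration of the field selection syntax (a sketch, again assuming foo's columns are pkid, x and y in that order):

```sql
SELECT (t.r).pkid, (t.r).x, (t.r).y   -- y comes out as NULL here
FROM  (VALUES ('(1,20,)'::foo)) t (r);
```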
Or you could add a row of NULL values and use uniform syntax for actual data:
...
VALUES
((NULL::foo)) -- row of NULL values
, ('(1,20,)') -- uniform ROW value syntax for all
, ('(2,50,)')
...
Pro:
Con:
4. VALUES expression with decomposed row type

Like 3., but with decomposed rows in standard syntax:
UPDATE foo f
SET x = t.x
, y = t.y
FROM (
VALUES
(('(1,20,)'::foo).*) -- decomposed row of values
, (2, 50, NULL)
) t(pkid, x, y) -- arbitrary column names (I made them match)
WHERE f.pkid = t.pkid; -- eliminates 1st row with NULL values
Or, with a leading row of NULL values again:
...
VALUES
((NULL::foo).*) -- row of NULL values
, (1, 20, NULL) -- uniform syntax for all
, (2, 50, NULL)
...
Pros and cons like 3., but with more commonly known syntax.
And you need to spell out column names (if you need them).
5. VALUES expression with types fetched from row type

Like Unril commented, we can combine the virtues of 2. and 4. to provide only a subset of columns:
UPDATE foo f
SET (x, y)
    = (t.x, t.y) -- short notation, see below
FROM (
VALUES
((NULL::foo).pkid, (NULL::foo).x, (NULL::foo).y) -- subset of columns
, (1, 20, NULL)
, (2, 50, NULL)
) t(pkid, x, y) -- arbitrary column names (I made them match)
WHERE f.pkid = t.pkid;
Pros and cons like 4., but we can work with any subset of columns and don't have to know the full list.
It also demonstrates the short syntax for the UPDATE itself, which is convenient for cases with many columns.
4. and 5. are my favorites.
db<>fiddle here - demonstrating all of the above
If you have a script generating the query, you could extract and cache the data type of each column and create the type casts accordingly. E.g.:
SELECT column_name,data_type,udt_name
FROM information_schema.columns
WHERE table_name = 'foo';
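Building on that, the script could even have Postgres assemble the cast expressions itself, e.g. with format() and string_agg() (a sketch; the NULLs are just placeholders to establish column types):

```sql
SELECT string_agg(format('NULL::%s', udt_name), ', ' ORDER BY ordinal_position)
FROM   information_schema.columns
WHERE  table_name = 'foo';
-- e.g. yields something like: NULL::int4, NULL::int4, NULL::int4
```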
From this udt_name you'll get the necessary cast, as you explained in the last paragraph. Additionally you could do this:
UPDATE foo
SET x = t.x
FROM (VALUES(null::int4,756),(null::int4,6300))
AS t(x,pkid)
WHERE foo.pkid = t.pkid;