Disclaimer: theoretical question.
Several questions here was asked about how to differentiate inserted and updated rows in the PostgreSQL upsert
stateme
I think that this is an interesting question that deserves an in-depth answer; please bear with me if it is a little bit lengthy.
In short: Your guess is right, and you can use the following RETURNING
clause to determine if the row was inserted and not updated:
RETURNING (xmax = 0) AS inserted
Now the detailed explanation:
When a row is updated, PostgreSQL does not modify the data, but creates a new version of the row; the old version will be deleted by autovacuum when it is no longer needed. A version of a row is called a tuple, so in PostgreSQL there can be more than one tuples per row.
xmax
serves two different purposes:
As stated in the documentation, it can be the transaction ID of the transaction that deleted (or updated) the tuple (“tuple” is another word for “row”). Only transactions with a transaction ID between xmin
and xmax
can see the tuple. An old tuple can be deleted safely if there is no transaction with a transaction ID less than xmax
.
xmax
is also used to store row locks. In PostgreSQL, row locks are not stored in the lock table, but in the tuple to avoid overflow of the lock table.
If only one transaction has a lock on the row, xmax
will contain the transaction ID of the locking transaction. If more than one transaction has a lock on the row, xmax
contains the number of a so-called multixact, which is a data structure that in turn contains the transaction IDs of the locking transactions.
The documentation of xmax
is not complete, because the exact meaning of this field is considered an implementation detail and cannot be understood without knowing t_infomask
of the tuple, which is not immediately visible via SQL.
You can install the contrib module pageinspect
to view this and other fields of a tuple.
I ran your example, and this is what I see when I use the heap_page_items
function to examine details (the transaction ID numbers are of course different in my case):
SELECT *, ctid, xmin, xmax FROM t;
┌───┬────┬───────┬────────┬────────┐
│ i │ x │ ctid │ xmin │ xmax │
├───┼────┼───────┼────────┼────────┤
│ 1 │ 11 │ (0,2) │ 102508 │ 102508 │
│ 2 │ 22 │ (0,3) │ 102508 │ 0 │
└───┴────┴───────┴────────┴────────┘
(2 rows)
SELECT lp, lp_off, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) AS t_infomask, to_hex(t_infomask2) AS t_infomask2
FROM heap_page_items(get_raw_page('laurenz.t', 0));
┌────┬────────┬────────┬────────┬────────┬────────────┬─────────────┐
│ lp │ lp_off │ t_xmin │ t_xmax │ t_ctid │ t_infomask │ t_infomask2 │
├────┼────────┼────────┼────────┼────────┼────────────┼─────────────┤
│ 1 │ 8160 │ 102507 │ 102508 │ (0,2) │ 500 │ 4002 │
│ 2 │ 8128 │ 102508 │ 102508 │ (0,2) │ 2190 │ 8002 │
│ 3 │ 8096 │ 102508 │ 0 │ (0,3) │ 900 │ 2 │
└────┴────────┴────────┴────────┴────────┴────────────┴─────────────┘
(3 rows)
The meanings of t_infomask
and t_infomask2
can be found in src/include/access/htup_details.h
. lp_off
is the offset of the tuple data in the page, and t_ctid
is the current tuple ID which consists of the page number and a tuple number within the page. Since the table was newly created, all data are in page 0.
Let me discuss the three rows returned by heap_page_items
.
At line pointer (lp
) 1 we find the old, updated tuple. It originally had ctid = (0,1)
, but that got modified to contain the tuple ID of the current version during update. The Tuple was created by transaction 102507 and invalidated by transaction 102508 (the transaction that issued the INSERT ... ON CONFLICT
). This tuple is not visible any more an will get removed during VACUUM
.
t_infomask
shows that both xmin
and xmax
belong to committed transactions and consequently show when the tuples was created and deleted. t_infomask2
shows that the tuple was updated with a HOT (heap only tuple) update, which means that the updated tuple is in the same page as the original tuple and no indexed column was modified (see src/backend/access/heap/README.HOT
).
At line pointer 2 we see the new, updated tuple that was created by transaction the INSERT ... ON CONFLICT
(transaction 102508).
t_infomask
shows that this tuple is the result of an update, xmin
is valid, and xmax
contains a KEY SHARE
row lock (which is no longer relevant since the transaction has completed). This row lock was taken during INSERT ... ON CONFLICT
processing. t_infomask2
shows that this is a HOT tuple.
At line pointer 3 we see the newly inserted row.
t_infomask
shows that xmin
is valid and xmax
is invalid. xmax
is set to 0 because this value is always used for newly inserted tuples.
So the nonzero xmax
of the updated row is an implementation artifact caused by a row lock. It is conceivable that INSERT ... ON CONFLICT
is reimplemented one day so that this behaviour changes, but I think that is unlikely.