Phantom Read anomaly in Oracle and PostgreSQL does not rollback transaction

Submitted by 懵懂的女人 on 2019-12-02 19:43:18

I love this question because it demonstrates that the Phantom Read definition in the SQL Standard only pictures the effect without stating the root cause of this data anomaly:

P3 ("Phantom"): SQL-transaction T1 reads the set of rows N that satisfy some <search condition>. SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.

In the 1995 paper, A Critique of ANSI SQL Isolation Levels, Jim Gray and co. described Phantom Read as:

P3: r1[P]...w2[y in P]...(c1 or a1) (Phantom)

One important note is that ANSI SQL P3 only prohibits inserts (and updates, according to some interpretations) to a predicate whereas the definition of P3 above prohibits any write satisfying the predicate once the predicate has been read — the write could be an insert, update, or delete.
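To make the history notation concrete, here is a minimal sketch of a checker for the broad P3 pattern quoted above. The tuple encoding of a history and the function name `has_broad_p3` are my own assumptions for illustration, not something defined in the paper, and the sketch assumes complete histories in which every transaction eventually commits or aborts.

```python
from typing import List, Tuple

# One step of a history: (operation, transaction id, predicate).
# "r" = predicate read, "w" = write of an item matching the predicate,
# "c" = commit, "a" = abort (the predicate field is unused for c/a).
Op = Tuple[str, int, str]


def has_broad_p3(history: List[Op]) -> bool:
    """Return True if the history contains r1[P]...w2[y in P]...(c1 or a1)."""
    for i, (op_i, tx_i, pred) in enumerate(history):
        if op_i != "r":
            continue
        for op_j, tx_j, target in history[i + 1:]:
            if tx_j == tx_i and op_j in ("c", "a"):
                break  # the reader ended before any conflicting write
            if op_j == "w" and tx_j != tx_i and target == pred:
                return True  # another transaction wrote into P while the reader was active
    return False


# The salary scenario discussed in this answer, encoded as a history.
scenario = [
    ("r", 1, "company_id = 1"),  # Tx1: SELECT SUM(salary) ...
    ("w", 2, "company_id = 1"),  # Tx2: INSERT a row matching the predicate
    ("w", 1, "company_id = 1"),  # Tx1: UPDATE
    ("c", 2, ""),
    ("c", 1, ""),
]
# The same operations executed serially, Tx1 first.
serial = [
    ("r", 1, "company_id = 1"),
    ("w", 1, "company_id = 1"),
    ("c", 1, ""),
    ("w", 2, "company_id = 1"),
    ("c", 2, ""),
]

print(has_broad_p3(scenario))  # True
print(has_broad_p3(serial))    # False
```

Note that the check never looks at what a repeated read would return: under this definition the pattern is present as soon as another transaction writes into the predicate while the reader is still active.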

Therefore, a Phantom Read does not mean that you can simply return a snapshot as of the start of the currently running transaction and pretend that providing the same result for a query is going to protect you against the actual Phantom Read anomaly.

In the original SQL Server 2PL (Two-Phase Locking) implementation, returning the same result for a query implied Predicate Locks.

The MVCC (Multi-Version Concurrency Control) Snapshot Isolation (wrongly named Serializable in Oracle) does not actually prevent other transactions from inserting/deleting rows that match the same filtering criteria as a query that has already executed and returned a result set in our current running transaction.

For this reason, we can imagine the following scenario in which we want to apply a raise to all employees:

  1. Tx1: SELECT SUM(salary) FROM employee where company_id = 1;
  2. Tx2: INSERT INTO employee (id, name, company_id, salary) VALUES (100, 'John Doe', 1, 100000);
  3. Tx1: UPDATE employee SET salary = salary * 1.1;
  4. Tx2: COMMIT;
  5. Tx1: COMMIT;

In this scenario, the CEO runs the first transaction (Tx1), so:

  1. She first checks the sum of all salaries in her company.
  2. Meanwhile, the HR department runs the second transaction (Tx2) as they have just managed to hire John Doe and gave him a $100k salary.
  3. The CEO decides that a 10% raise is feasible taking into account the total sum of salaries, being unaware that the salary sum has increased by 100k.
  4. Meanwhile, the HR transaction Tx2 is committed.
  5. The CEO transaction Tx1 is committed.

Boom! The CEO has taken a decision on an old snapshot, giving a raise that might not be sustained by the current updated salary budget.
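The five steps can be replayed with a toy snapshot model. This is only a sketch under stated assumptions: the two existing employees and their salaries are made-up data, a "transaction" is just a private copy of the committed rows plus a buffer of writes, and conflict detection is left out because the two write sets never overlap. It is not how Oracle or PostgreSQL implement MVCC.

```python
# Committed rows: employee id -> (company_id, salary). Hypothetical data.
committed = {1: (1, 50_000), 2: (1, 70_000)}


def begin(rows):
    """Start a toy transaction: a private snapshot plus buffered writes."""
    return {"snapshot": dict(rows), "writes": {}}


def commit(rows, tx):
    """Apply the buffered writes to the committed rows."""
    rows.update(tx["writes"])


tx1 = begin(committed)  # CEO
tx2 = begin(committed)  # HR

# 1. Tx1: SELECT SUM(salary) FROM employee WHERE company_id = 1
sum_seen_by_tx1 = sum(s for c, s in tx1["snapshot"].values() if c == 1)

# 2. Tx2: INSERT John Doe with a 100k salary
tx2["writes"][100] = (1, 100_000)

# 3. Tx1: UPDATE employee SET salary = salary * 1.1
#    Only rows in Tx1's snapshot are touched; integer maths avoids float noise.
for emp_id, (c, s) in tx1["snapshot"].items():
    tx1["writes"][emp_id] = (c, s * 11 // 10)

commit(committed, tx2)  # 4. Tx2: COMMIT
commit(committed, tx1)  # 5. Tx1: COMMIT

budget_planned = sum_seen_by_tx1 * 11 // 10
total_after = sum(s for c, s in committed.values() if c == 1)

print(sum_seen_by_tx1)  # 120000
print(budget_planned)   # 132000
print(total_after)      # 232000
```

In this model the CEO planned for a payroll of 132000 based on her snapshot, while the committed payroll ends up at 232000 because John Doe's row was added behind her back.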

You can view a detailed explanation of this use case (with lots of diagrams) in the following post.

Is this a Phantom Read or a Write Skew?

According to Jim Gray and co, this is a Phantom Read since the Write Skew is defined as:

A5B Write Skew Suppose T1 reads x and y, which are consistent with C(), and then a T2 reads x and y, writes x, and commits. Then T1 writes y. If there were a constraint between x and y, it might be violated. In terms of histories:

A5B: r1[x]...r2[y]...w1[y]...w2[x]...(c1 and c2 occur)
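For comparison, here is a small sketch of Write Skew following the prose of the definition (T1 reads x and y, T2 reads x and y, writes x and commits, then T1 writes y). The account balances, the withdrawal amount, the constraint x + y >= 0 and the helper `withdraw_if_covered` are all assumptions made up for the illustration.

```python
def withdraw_if_covered(snapshot, account, amount):
    """Return the write for `account` if x + y in the snapshot covers `amount`."""
    if snapshot["x"] + snapshot["y"] >= amount:
        return {account: snapshot[account] - amount}
    return {}


committed = {"x": 50, "y": 50}  # constraint C: x + y >= 0

snap_t1 = dict(committed)  # r1[x], r1[y]
snap_t2 = dict(committed)  # r2[x], r2[y]

# T2 checks the constraint against its snapshot, writes x and commits.
committed.update(withdraw_if_covered(snap_t2, "x", 90))
# T1 checks the constraint against its (now stale) snapshot and writes y.
committed.update(withdraw_if_covered(snap_t1, "y", 90))

constraint_holds = committed["x"] + committed["y"] >= 0

print(committed)         # {'x': -40, 'y': -40}
print(constraint_holds)  # False
```

Each transaction on its own preserves the constraint, and the two write sets are disjoint, so a snapshot-only scheme has nothing to object to. The difference from the salary scenario is that here both transactions read before they write.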

In Oracle, the Transaction Manager might or might not detect the anomaly above because, unlike MySQL, it does not use predicate locks or index range locks (next-key locks).

PostgreSQL manages to catch this anomaly only if Bob issues a read against the employee table; otherwise, the phenomenon is not prevented.

UPDATE

Initially, I was assuming that Serializability would imply a time ordering as well. However, as very well explained by Peter Bailis, wall-clock ordering or Linearizability is only assumed for Strict Serializability.

Therefore, my assumptions were made for a Strict Serializable system. But that's not what Serializable is supposed to offer. The Serializable isolation model makes no guarantees about time, and operations are allowed to be reordered as long as they are equivalent to some serial execution.

Therefore, according to the Serializable definition, such a Phantom Read can occur if the second transaction does not issue any read. But, in a Strict Serializable model, the one offered by 2PL, the Phantom Read would be prevented even if the second transaction does not issue a read against the entries we are trying to guard against phantom reads.

What you observe is not a phantom read. That would be the case if a new row showed up when the query is issued the second time (phantoms appear unexpectedly).

You are protected from phantom reads in both Oracle and PostgreSQL with SERIALIZABLE isolation.

The difference between Oracle and PostgreSQL is that the SERIALIZABLE isolation level in Oracle only offers snapshot isolation (which is good enough to keep phantoms from appearing), while in PostgreSQL it guarantees true serializability (i.e., there always exists a serialization of the SQL statements that leads to the same results). If you want to get the same behavior in Oracle and PostgreSQL, use REPEATABLE READ isolation in PostgreSQL.

The Postgres documentation defines a phantom read as:

A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.

Because your select returns the same value both before and after the other transaction committed, it does not meet the criteria for a phantom read.

I just wanted to point out that Vlad Mihalcea's answer is plain wrong.

Is this a Phantom Read or a Write Skew?

Neither of those -- there is no anomaly here, transactions are serializable as Tx1 -> Tx2.

SQL standard states: "A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions."

PostgreSQL manages to catch this anomaly only if Bob issues a read against the employee table, otherwise the phenomenon is not prevented.

PostgreSQL's behavior here is 100% correct; it just "flips" the apparent transaction order.
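The claim that the execution is serializable as Tx1 -> Tx2 can be checked with a toy model: run the interleaved schedule against snapshots, run both serial orders, and compare the reads and the final rows. This is a rough sketch; the starting rows and the function names `tx1_ceo`, `tx2_hr`, `run_serial` and `run_interleaved_snapshot` are assumptions for the illustration, not database internals.

```python
def tx1_ceo(rows):
    """Tx1: read the salary total for company 1, then raise every visible salary by 10%."""
    total = sum(s for c, s in rows.values() if c == 1)
    writes = {emp_id: (c, s * 11 // 10) for emp_id, (c, s) in rows.items()}
    return total, writes


def tx2_hr(rows):
    """Tx2: insert John Doe; it reads nothing."""
    return None, {100: (1, 100_000)}


def run_serial(initial, order):
    """Run the transactions one after another, each seeing the previous one's writes."""
    rows, reads = dict(initial), {}
    for name, tx in order:
        reads[name], writes = tx(rows)
        rows.update(writes)
    return rows, reads


def run_interleaved_snapshot(initial):
    """Both transactions work from the initial snapshot; Tx2 commits first, then Tx1."""
    snapshot = dict(initial)
    read1, writes1 = tx1_ceo(snapshot)
    read2, writes2 = tx2_hr(snapshot)
    rows = dict(initial)
    rows.update(writes2)
    rows.update(writes1)
    return rows, {"Tx1": read1, "Tx2": read2}


initial = {1: (1, 50_000), 2: (1, 70_000)}  # hypothetical employees

observed = run_interleaved_snapshot(initial)
tx1_then_tx2 = run_serial(initial, [("Tx1", tx1_ceo), ("Tx2", tx2_hr)])
tx2_then_tx1 = run_serial(initial, [("Tx2", tx2_hr), ("Tx1", tx1_ceo)])

print(observed == tx1_then_tx2)  # True
print(observed == tx2_then_tx1)  # False
```

The interleaved run matches the serial order Tx1 -> Tx2 in both what Tx1 read and the final table contents, even though Tx2 committed first in wall-clock time. It does not match Tx2 -> Tx1, where John Doe would also have received the raise.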
