Why does ARIES perform a redo before undo in database management recovery?

Why does the ARIES algorithm apply a redo before an undo if it already knows which transactions to undo after the analysis phase?

I know (or at least think) it has something to do with the LSNs and with maintaining consistency, in the sense that undoing a transaction against the data that was flushed to disk may not be the same as undoing it against the state at the time of the crash (because of dirty pages). But I can't find any sort of 'formal' answer to this question (at least one that I can understand).

Because there may be unflushed pages in the buffer even if a transaction has committed: ARIES uses a no-force policy in the buffer manager. The analysis pass rebuilds the transaction table and the dirty page table, and redoing then brings the database pages to the state they were in at the time of the crash. As a result, the updates of committed transactions are reflected in stable storage.

Short answer:

We need to repeat all of the history up to the crash in the redo pass, so that the database is in a consistent state before the undo pass is performed.

Long answer:

The recovery algorithm ARIES, in order to ensure the atomicity and the durability properties of the DBMS, performs 3 passes:

Analysis pass: to see what needs to be done (plays log forward)
Redo pass: to make sure the disk reflects any updates that are in the log but not on disk, including those that belong to transactions that will eventually be rolled back. This ensures we are in a consistent state, which allows logical undo.
Undo pass: to remove the actions of any losing transactions
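
The three passes above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the real algorithm: the LogRecord type, the (value, page_lsn) page representation and the function names are all made up for illustration, there is no checkpoint or dirty page table, and no compensation log records are written (undo only bumps the page LSN as a stand-in).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:                   # hypothetical record layout
    lsn: int
    txn: str
    kind: str                      # "update" or "commit"
    page: Optional[str] = None
    before: Optional[int] = None   # value to restore on undo
    after: Optional[int] = None    # value to reapply on redo

def analysis(log):
    """Forward scan: transactions with updates but no commit are losers."""
    losers = set()
    for rec in log:
        if rec.kind == "update":
            losers.add(rec.txn)
        elif rec.kind == "commit":
            losers.discard(rec.txn)
    return losers

def redo(log, pages):
    """Repeat history: reapply any update the page has not seen yet,
    for winners and losers alike. pages maps page id -> (value, page_lsn)."""
    for rec in log:
        if rec.kind == "update" and pages[rec.page][1] < rec.lsn:
            pages[rec.page] = (rec.after, rec.lsn)

def undo(log, pages, losers):
    """Backward scan: restore the before-image of every loser update."""
    next_lsn = log[-1].lsn + 1     # stands in for the LSN of a CLR
    for rec in reversed(log):
        if rec.kind == "update" and rec.txn in losers:
            pages[rec.page] = (rec.before, next_lsn)
            next_lsn += 1

log = [
    LogRecord(1, "T1", "update", "P", before=0, after=5),
    LogRecord(2, "T2", "update", "Q", before=0, after=7),
    LogRecord(3, "T2", "commit"),
]
# Disk state at the crash: uncommitted T1's page was flushed (steal),
# committed T2's page was not (no-force).
pages = {"P": (5, 1), "Q": (0, 0)}

losers = analysis(log)
redo(log, pages)
after_redo = dict(pages)
undo(log, pages, losers)
```

In this run, redo skips page P (its page LSN already covers record 1) and reapplies T2's committed update to Q; undo then removes T1's update from P. Comparing page_lsn with the record's LSN is what makes redo safe to repeat.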

The UNDO data log is logical, while the REDO data log is physical:

We must do physical REDO, since we can't guarantee that the database is in a consistent state (so, e.g., logging "INSERT VALUE X INTO TABLE Y" might not be a good idea, because X might be reflected in an index but not in the table, or vice versa, if a crash happens during the insert).
We can do logical UNDO, because after REDO we know that things are consistent. In fact, we must do logical UNDO, because we only UNDO some actions, and physically undoing a logged action such as "split page x of index y" might no longer be the right thing to do in terms of index management or invariant maintenance. We don't have to worry about this during redo, because we repeat history and replay everything, which means any physical modifications made to the database last time will still be correct.
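
To make the physical-versus-logical distinction concrete, here is a small hedged sketch. The index is modeled as a plain dict from page id to a list of keys, and both undo functions are hypothetical helpers, not part of any real DBMS. A loser transaction inserted key 10 into page "A"; a later page split moved that key to page "B" before the crash.

```python
def physical_undo_insert(index, page_id, key):
    """Physical undo: touch only the page named in the log record."""
    if key in index[page_id]:
        index[page_id].remove(key)
        return True
    return False          # the key is no longer on that page

def logical_undo_insert(index, key):
    """Logical undo: 'delete key from the index', wherever it lives now."""
    for keys in index.values():
        if key in keys:
            keys.remove(key)
            return True
    return False

# State after redo has repeated history: the split already happened,
# so key 10 (inserted by the loser into page "A") now sits on page "B".
index = {"A": [5], "B": [10, 12]}

physical_ok = physical_undo_insert(index, "A", 10)   # looks at the stale page
logical_ok = logical_undo_insert(index, 10)          # finds the key on "B"
```

The physical version fails because the page it was logged against no longer holds the key, while the logical version works precisely because redo has first restored a structurally consistent index to search.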

Source

No idea what ARIES is, but assuming it does the same as other databases:

Starting from some base backup, the redo logs are applied, which basically means that all the data-changing statements that happened after the backup but before the crash get applied. Without that you would lose everything that happened since the last backup.

When that is finished, all incomplete transactions get rolled back, because there is nobody who could pick up those transactions and complete them.

You want to get back to the state at the time of failure in order to be accurate about which transactions need to be undone. One example that comes to mind is successive failures, that is, failures that occur while recovering from a crash. During recovery you write your actions to the log. If you fail during the recovery process, you will REDO all the operations in the log (even the UNDO operations written during the last attempt!).

It provides a simple algorithm, because you don't have to handle special cases and special cases of special cases. There is a guarantee that after any number of crashes during recovery, we will end up in the same state as if there had been no crash during recovery.
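
The crash-during-recovery argument can be checked with a toy simulation. This is only a sketch under assumptions: log records are plain dicts, the "undoes" field is a simplification standing in for the CLR's pointer to the next record to undo, the log is assumed to survive every crash, and crash_after_undos is a made-up knob for injecting a failure.

```python
import copy

def recover(log, pages, crash_after_undos=None):
    """Run analysis, redo, undo. Returns False if the simulated crash fired."""
    # Analysis: losers are transactions with no commit or end record.
    finished = {r["txn"] for r in log if r["kind"] in ("commit", "end")}
    losers = {r["txn"] for r in log} - finished

    # Redo: repeat history, including CLRs written by an earlier attempt.
    for r in log:
        if r["kind"] in ("update", "clr") and pages[r["page"]][1] < r["lsn"]:
            pages[r["page"]] = (r["after"], r["lsn"])

    # Undo: compensate loser updates that no existing CLR covers yet.
    compensated = {r["undoes"] for r in log if r["kind"] == "clr"}
    todo = [r for r in log if r["kind"] == "update"
            and r["txn"] in losers and r["lsn"] not in compensated]
    for done, r in enumerate(sorted(todo, key=lambda r: r["lsn"], reverse=True)):
        if done == crash_after_undos:
            return False                      # simulated crash mid-undo
        lsn = log[-1]["lsn"] + 1
        log.append({"lsn": lsn, "txn": r["txn"], "kind": "clr",
                    "page": r["page"], "after": r["before"], "undoes": r["lsn"]})
        pages[r["page"]] = (r["before"], lsn)
    for txn in sorted(losers):
        log.append({"lsn": log[-1]["lsn"] + 1, "txn": txn, "kind": "end"})
    return True

base_log = [
    {"lsn": 1, "txn": "T1", "kind": "update", "page": "A", "before": 0, "after": 1},
    {"lsn": 2, "txn": "T2", "kind": "update", "page": "B", "before": 0, "after": 2},
    {"lsn": 3, "txn": "T1", "kind": "update", "page": "A", "before": 1, "after": 3},
    {"lsn": 4, "txn": "T2", "kind": "commit"},
]
stale_disk = {"A": (0, 0), "B": (0, 0)}       # nothing was flushed before the crash

# Recovery that runs to completion.
log1, one_shot = copy.deepcopy(base_log), dict(stale_disk)
recover(log1, one_shot)

# Recovery that crashes after one undo step and loses its buffered pages,
# then restarts from the same stale disk with the longer log.
log2, lost_buffers = copy.deepcopy(base_log), dict(stale_disk)
first_attempt_finished = recover(log2, lost_buffers, crash_after_undos=1)
resumed = dict(stale_disk)
recover(log2, resumed)
```

On the second attempt the CLR from the first attempt is simply redone like any other record, and undo only has to compensate the one update that was still outstanding, so both runs end with the same pages.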

If you don't support record-level locking, then you can use selective redo, which only redoes winner transactions. Otherwise, it is better to repeat history (redo everything) before undo.

Consider what is actually done during redo and undo. Redo repeats history according to the existing log records. Undo, in contrast, creates new CLR (compensation log record) entries. When the system crashes, the log contains records of uncommitted transactions. If you do not undo them, no CLRs are written, which leaves the database inconsistent.

One of the goals of ARIES is simplicity. While performing redo before undo might not be strictly necessary, it makes the correctness of the algorithm more apparent than a more complex scheme that performs undo before redo.

Besides making sure the database is consistent and the disk is exactly as it was before the crash (as Franck Dernoncourt answered), another benefit of performing redo before undo is the following:

A failure may happen during recovery. Redo advances the progress of the whole "incremental recovery": if a failure happens during redo or undo, the next recovery can pick up where the previous recovery (redo) left off and continue, provided redo is performed before undo.

An extreme case: if undo is performed before redo and failures keep happening during undo, all of the undo work is in vain.

Source: https://stackoverflow.com/questions/10289170/why-does-aries-perform-a-redo-before-undo-in-database-management-recovery

Tags

database

recovery

aries