Crash Recovery and the ARIES Algorithm
Ed Knorr, Slides from Fall 2020, CPSC 404
Based on most of Chapter 18 in: Ramakrishnan & Gehrke (textbook)
1
Learning Goals
Explain the purpose of write-ahead logging.
Explain the purpose of these types of log records in the ARIES
algorithm: update, commit, abort, end, and compensation.
Explain the purpose of the Transaction Table and the Dirty Page Table in the ARIES approach to crash recovery.
Given a short sequence of transactions (some of which are completed and some of which are in-flight) and their log records, apply (by hand) the ARIES algorithm for crash recovery. Identify the transactions that need to be aborted.
Argue for how often recovery checkpoints should be taken by a DBMS.
Explain the importance of backup and recovery strategies for a DB. (See also the latter part of Chapter 17’s slides on Transaction Management.)
2
ARIES Algorithm
ARIES = Algorithms for Recovery and Isolation Exploiting Semantics
Main designer = C. Mohan of IBM
C. Mohan retired in 2020 (honoured at SIGMOD 2020)
after a long and distinguished database career at IBM.
ARIES won a 10-year DB test-of-time award at VLDB 1998 in New York.
ARIES is used by numerous DBMSs including DB2 and SQL Server.
Photo: ithistory.org
3
Desired Behavior after system restarts:
crash!
Crash Recovery, ARIES
Motivation (recall ACID properties) – Atomicity
– Durability
T1 – T1, T2, & T3 should be T2 durable. T3 T4 T5
– T4 & T5 should be aborted (effects not seen).
4
Write-Ahead Logging (WAL)
Each log record has a unique Log Sequence Number.
LSNs are always increasing.
Each data page contains a pageLSN.
Log records flushed to disk
– It’s the LSN of the most recent log record
for a change done to that page.
The DBMS keeps track of the flushedLSN.
– This is the maximum LSN flushed, so far.
WAL protocol:
– Force the update log record to disk before
the corresponding data page gets to disk. pageLSN – Also, be sure to write all the log records
“Log tail” in RAM
for a transaction before it finishes the commit.
5
Log Record Fields:
Possible log record types:
update
End (signifies end of commit or abort)
records only
CLR (Compensation Log Record)
TxID
type prevLSN pageID offset
length before-image after-image
Update
Log Records
Commit
Abort
– for UNDO actions
6
Other Log-Related State
Transaction Table (TT) – TxID
– LastLSN
Dirty Page Table (DPT)
– One entry for each dirty page in the buffer pool
– Contains recLSN, i.e., the LSN of the log record which first caused the page to be dirty
7
Checkpoint
What is it?
– A periodic snapshot of the state of a DBMS, used to
reduce recovery time
begin_checkpoint record
end_checkpoint record contains current TT & DPT
– Other transactions continue to run; so, these tables are accurate only as of the time of the begin_checkpoint record.
– No attempt is made to force dirty pages to disk. Why not?
Store the LSN of the checkpoint record in a safe place (master record).
8
At Transaction Commit Time
Write the commit record to the log.
All log records up to the transaction’s lastLSN are flushed.
– This guarantees that flushedLSN lastLSN.
– Note that log flushes are sequential, synchronous writes to disk.
– There can be many log records per log page.
Write the end record to the log.
With ARIES, the written data pages (in the buffer pool) do not have to be written to disk at this time.
– Log records will allow us to recover, if necessary.
9
The Case of a Simple Transaction Abort (not a Crash Recovery)
“Play back” the log in reverse, UNDO’ing updates made by this transaction.
– Before starting UNDO, write an “abort” log record.
– Get the lastLSN of this transaction from the TT.
– Follow the chain of log records backwards via the prevLSN field, and undo the updates.
(continued on next slide)
10
Simple Transaction Abort (cont.)
Before restoring the old value of a page, write a CLR:
– Continue logging while you UNDO.
– CLR has one extra field: undoNextLSN
It points to the next LSN to undo (i.e., the prevLSN of the record
we’re currently undoing).
– CLRs are never undone (but they might be redone when repeating history).
At the end of UNDO, write an “end” log record.
Example:
210 | T77: abort
220 | CLR: undo T77, LSN 200
230 | CLR: undo T77, LSN 150
240 | T77: end (and remove it from the TT)
11
Smallest recLSN in Dirty Page Table after Analysis
3 phases:
Last checkpoint
– 3. UNDO the effects of failed transactions.
CRASH
Summary of ARIES Algorithm: Crash Recovery—The Big Picture
Oldest log record of transaction active at crash
Start from the last checkpoint (found via the master record).
ARU
12
– 1. Analysis: Which transactions were committed, and which had failed?
– 2. REDO all actions. i.e., repeat history
1. Analysis Phase
Reconstruct the state at the checkpoint by starting with the end checkpoint record.
– From that, we’ll rebuild the TT & DPT up to the time of the crash.
Scan the log forward from the checkpoint.
– For each “End” record: Remove that transaction from the TT.
– For each Update record: If page P is not in the DPT, then:
Add P to the DPT, and set its recLSN=LSN (i.e., the first LSN that
caused an update to this page). – For other records:
Add the transaction to the TT, set lastLSN=LSN, and change the transaction status upon commit.
13
2. Redo Phase
Reapply all updates (even those of aborted transactions), and redo CLRs.
Scan forward from the LSN containing smallest recLSN in the DPT. For each CLR or update log recLSN, REDO the action unless:
– The affected page is not in the Dirty Page Table, or
– The affected page is in the DPT, but has recLSN > LSN, or
– pageLSN (in DB) LSN
To REDO an action:
– Reapply the logged action.
– Set pageLSN to LSN. There’s no additional logging!
14
3. Undo Phase
ToUndo = { l | l is a lastLSN of a “loser” transaction}
While (ToUndo is not empty), do:
– Choose the largest LSN in the ToUndo list.
– If this LSN is a CLR and undoNextLSN == NULL:
Write an End record for this transaction
– Else if this LSN is a CLR, and undoNextLSN != NULL:
Add undoNextLSN to ToUndo
– Else LSN is an update; so, write a CLR, undo the update,
and
If prevLSN != NULL, add prevLSN to ToUndo; otherwise, write an End record.
15
Example of Recovery
LSN LOG
00 begin_checkpoint
05 end_checkpoint
10 update: T1 writes P5 20 update T2 writes P3 30 T1 abort
40 CLR: Undo T1 LSN 10 45 T1 End
50 update: T3 writes P1 60 update: T2 writes P5
prevLSNs
CRASH, RESTART
16
Example of Recovery (cont.)
LSN 00,05
LOG
40,45 50 60
80, 85
10 20 30
begin_checkpoint, end_checkpoint update: T1 writes P5
70
90, 95 100,105
update T2 writes P3
T1 abort
CLR: Undo T1 LSN 10, T1 End update: T3 writes P1
update: T2 writes P5
CRASH, RESTART
CLR: Undo T2 LSN 60
CLR: Undo T3 LSN 50; T3 End CLR: Undo T2 LSN 20; T2 End begin_checkpoint, end_checkpoint
undoNextLSN
17
DB2 and ARIES
DB2 uses the ARIES Algorithm.
Checkpoints do not require dirty pages to be
flushed.
– A background process flushes pages, periodically.
The most information ever recovered can go back well before the last few checkpoints.
– The system administrator or database administrator can change the frequency of checkpoints.
18
Oracle
Oracle flushes the log tail to disk as soon as the SCN (System Change Number or transaction) commits, but doesn’t write the changed data pages from the buffer pool until several sets of changed pages can be written together (or until a checkpoint occurs).
Checkpoints increase I/O because Oracle’s database writer flushes all dirty pages to disk.
The most information ever recovered is the amount of data between two checkpoints.
– There’s a trade-off in performance if the checkpoints occur too often.
– The system/database administrator can change the frequency of checkpoints.
19
Summary of ARIES/Crash Recovery
Log records and WAL are very important.
ARIES: Crash recovery works in 3 phases:
– Analysis: – Redo:
– Undo:
Forward from checkpoint Forward from oldest recLSN Backward from end to first LSN of oldest transaction alive at crash
Redo simply “repeats history”.
Upon Undo, write the CLRs.
Checkpoint: This is a quick way to limit the amount of the log to scan upon recovery.
20