- Logging

Logging

April 20, 2021

Garcia-Molina/Ullman/Widom: Ch. 17.1-17.5

Atomicity: A transaction is either applied fully (COMMITed) or not at all (ABORTed).
Consistency: Constraints are enforced on the final state of the transaction (and the transaction is ABORTed if they fail).
Isolation: Two transactions running in parallel don't interfere with each other
Durability: Once the database returns from a COMMIT successfully, the data is safe from marmots

2 Problem Scenarios

A long write...

What if the DB fails during a write.

IOs aren't atomic

Atomicity and Durability might be violated!

Buffer memory is limited...

What if we need to page out some pages modified by a live transaction?

If the transaction aborts, the page state needs to be reverted.

Atomicity might be violated

Simplifying Assumptions

Today: objects are pages: This lets us reason about operations at the IO level
IOs are individually atomic and durable.: Successful writes can be distinguished from fails.
HDD/SSDs are a "safe" storage layer.: No fires, t-rex attacks, or global pandemics*

When the system reboots/the power comes back on, we assume that successful writes are still there and failed writes are not.

* May not be an entirely safe assumption

Failure Analysis

All IOs fail (crash before): No problem!
All IOs succeed (crash after): No problem!
At least one IO succeeds, and at least one fails: Problem! (partial write)

Write-Ahead Logging (WAL)

"Log" the transaction's actions
Wait for the log to be safely on-disk
Write the transaction's effects to disk

Log

W(A: 10)
W(C: 8)
W(E: 9)
COMMIT

Page	Value
A	8	10
B	12
C	5	8
D	18
E	16	9

Failure Analysis

All IOs fail or all succeed: No problem!
At least one Log IO succeeds, but failure before COMMIT.: No problem! DB intact. No COMMIT = don't replay the log.
COMMIT succeeds, no DB writes succeed: No problem. The log lets us replay transaction's effects.
COMMIT succeeds, at least one DB write succeeds: Possible problem. Log replay needs to be idempotent.
COMMIT succeeds, all DB writes succeed: Slow! Still might need to replay the log.

More Failure Analysis

Second failure during replay: Not a problem if log operations are idempotent!; ... but could still cause problems if recovery is slow.

Note: The transaction is durable as soon as we can guarantee the transaction's COMMIT is in the log.

We can consider the commit successful even if the data pages aren't on disk yet.

Log

W(A: 10)
W(C: 8)
W(E: 9)
COMMIT

Page	Buffer		Disk
A	10	10	10
B			12
C	5	8	5
D			18
E	9	16	9

Multiple Transactions

Using only one log is more efficient for writes!

... but recovery is slower.

Recovery

Scan through the log to find all transactions with a COMMIT entry.
Replay log entries from COMMITed transactions only.

Recovery gets slower with every transaction.

Idea: Periodically mark down the index of the earliest log entry still needed

How do we decide that a log entry is no longer needed.

Buffer Manager

Page	Status	First Log Entry	Data
24	DIRTY	47	01011010...
30	CLEAN	n/a	11001101...
52	DIRTY	107	10100010...
57	DIRTY	87	01001101...
66	CLEAN	n/a	01001011...

Keep the first log entry to modify any dirty page

Log entries lower than a page's 'first log entry' aren't useful.

Periodically write the lowest 'first log entry' into the log (CHECKPOINT).

WAL Recovery

Scan backwards through the log to find the checkpoint.
Scan forward starting with the checkpointed 'first log entry' to find all transactions with a COMMIT entry.
Replay log entries starting with the checkpointed 'first log entry', ignoring non-COMMITed transactions.

Multi-Write Atomicity: Solved by WAL
Pro-actively Flushing Writes

Problem: There may not be enough memory to store pages modified by uncommitted transactions.

Keep a separate 'uncommitted' copy of the db.: Need to copy data back to main db after commit.
Use the WAL to recover uncommitted pages.: Sloooooooow.
Flush to the DB itself.: What if the transaction aborts?

Undo-Logging

Log

Timestamp	Transaction	Object	Value	Prev
10	T1	Page 5	1010...	00101...
11	T2	Page 3	1000...	0111...
12	T1	Page 1	0011...	0001...
13	T3	Page 5	1100...	1010

Idea: Record the page's previous value.

If a transaction aborts, go backwards through the log and undo all of the changes

TS	Xact	Object	Value	Prev
10	T1	Page 5	1010...	0101...
11	T2	Page 3	1000...	0111...
12	T1	Page 1	0011...	0001...
13	T3	Page 5	1100...	1010...

Page	Disk
1	0011
2	1100
3	1000 0111
4	1001
5	1100 1010 0101

Observation: Going backwards through the log can be slow if you have a lot of concurrent transactions.

Log

Timestamp	Transaction	Object	Value	Prev	Last TS
10	T1	Page 5	1010...	0101...	7
11	T2	Page 3	1000...	0111...	3
12	T1	Page 1	0011...	0001...	10
13	T3	Page 5	1100...	1010...	8

Idea: Each log entry records the timestamp of the next most recent log entry for the same transaction.

Log entries form a linked list for each transaction

Putting it together

Transaction Table

Transaction ID	State	Last Log Entry
T24	VALIDATING	99
T38	COMMITTING	85
T42	ABORTING	87
T56	ACTIVE	100

Buffer Manager

Page	Status	First Update	Data
24	DIRTY	47	01011010...
30	CLEAN	n/a	11001101...
52	DIRTY	107	10100010...
57	DIRTY	87	01001101...
66	CLEAN	n/a	01001011...

Types of Log Entries

Update Page: As above
ABORT (Start aborting)
COMMIT (Transaction committed)
END (Done aborting/committing)
CHECKPOINT (Snapshot of volatile memory state)

Recovering after a crash

Rebuild in-memory state (Analyze)
Rebuild buffer manager (Redo)
Abort uncommitted transactions (Undo)

ARIES Recovery

Timestamp	Entry
0	CHECKPOINT
1	T1: UPDATE P5
2	T2: UPDATE P3
3	T1: ABORT
4	T2: END
5	T3: WRITE P1
6	T2: WRITE P5
CRASH

Analyze

	Timestamp	Entry
→	0	CHECKPOINT
→	1	T1: UPDATE P5
→	2	T2: UPDATE P3
→	3	T1: ABORT
→	4	T1: END
→	5	T3: WRITE P1
→	6	T2: WRITE P5
	CRASH

Xact	Status	LastTS
T1	ACTIVE ABORTING	0 1
T2	ACTIVE	0 2 6
T3	ACTIVE	5

Redo

As WAL recovery, before

Undo

Start ABORTing all 'ACTIVE' transactions
Replay transaction log in reverse (correct behavior ensured by serializability)

Problem: Writing out a Checkpoint is SLOOOW (and blocking).

Idea: Snapshot in-memory state, then write in the background.

Record "START_CHECKPOINT" in log when the snapshot is taken.

Record "END_CHECKPOINT" in log when the snapshot is fully written.

Find the last "END_CHECKPOINT" and start recovery from the corresponding "START_CHECKPOINT" entry

Optimization: What if you crash during recovery?

Idea: Log "UNDO" operations as new writes

Compensation Log Records (CLRs)

Timestamp	Entry
0	CHECKPOINT
1	T1: UPDATE P5
2	T2: UPDATE P3
3	T1: ABORT
4	T1: CLR Restore P5
5	T2: END
6	T3: WRITE P1
7	T2: WRITE P5

Analyze

	Timestamp	Entry
→	0	CHECKPOINT
→	1	T1: UPDATE P5
→	2	T2: UPDATE P3
→	3	T1: ABORT
→	4	T1: CLR P5
→	5	T1: END
→	6	T3: WRITE P1
→	7	T2: WRITE P5
	CRASH

Xact	Status	LastTS
T1	ACTIVE ABORTING	0 1 0
T2	ACTIVE	0 2 6
T3	ACTIVE	5

Next Class Updates in Distributed Databases