April 6, 2021
If X and Y are equivalent and Y is better,
then replace all Xs with Ys
Today's focus: Updates
INSERT INTO Trees (location, species)
VALUES ('123 A Street', 'pin oak');
DELETE FROM Trees WHERE date > now();
UPDATE Trees SET species = 'pin oak'
WHERE species = 'pinoak';
# | ... | species |
---|---|---|
1 | ... | pin oak |
2 | ... | pinoak |
3 | ... | pinoak |
4 | ... | pin oak |
... |
UPDATE Trees SET species = 'pin oak'
WHERE species = 'pinoak';
Null Pointer Exception!
T-Rex Attack!
User-abort!
# | ... | species |
---|---|---|
1 | ... | pin oak |
2 | ... | pin oak |
3 | ... | pinoak |
4 | ... | pin oak |
... |
Bob.Balance -= 20
Alice.Balance += 20
Bob's balance shouldn't drop below $0
If Alice and Bob both update the data at the same time,
nothing should break.
Once a write is confirmed, it should survive a power outage, crash, or marmot attack.
Each use case may require different properties
What if I want correctness over more than one operation?
START TRANSACTION
UPDATE Ledger SET Balance = Balance - 20
WHERE name = 'Bob'
UPDATE Ledger SET Balance = Balance + 20
WHERE name = 'Alice'
COMMIT
A "batch" of operations that should execute together
Object
... a database "thing"
Each database is different,
so let's treat objects abstractly for now.
Observation: If two clients modify different objects, they can't possibly interfere with each other.
... but I'm getting ahead of myself
A sequence of reads from and writes to objects.
... followed by a COMMIT or ABORT
What does it mean for a transaction to be Isolated?
Alice and Bob submit transactions at the same time!
def T1:
A = A + 100
B = B - 100
def T2:
A = A * 1.06
B = B * 1.06
Time | T1 | T2 |
---|---|---|
↓ | A = A + 100 |
|
↓ | B = B - 100 |
|
↓ | A = A * 1.06 |
|
↓ | B = B * 1.06 |
A gets an extra $6
Time | T1 | T2 |
---|---|---|
↓ | A = A * 1.06 |
|
↓ | B = B * 1.06 |
|
↓ | A = A + 100 |
|
↓ | B = B - 100 |
B gets an extra $6
Time | T1 | T2 |
---|---|---|
↓ | A = A + 100 |
|
↓ | A = A * 1.06 |
|
↓ | B = B * 1.06 |
|
↓ | B = B - 100 |
A and B both get an extra $6
Idea: Don't allow updates in parallel!
Observation Blocking transactions is sloooow
Less Bad Idea: Create/Enforce the illusion that we're not allowing updates in parallel.
A sequence of Reads and Writes
Time | T1 | T2 | |
---|---|---|---|
↓ | R(A) |
||
↓ | W(A) A = A * 1.06 |
||
↓ | R(B) |
||
↓ | W(B) B = B + 1/06 |
||
↓ | R(A) |
||
↓ | W(A) A = A + 100 |
||
↓ | R(B) |
||
↓ | W(B) B = B - 100 |
Question: How do we ensure that transactions are only ever executed with serializable schedules?
Problem: We can't know if a schedule will be serializable until the transaction is done
Can we convince ourselves that these concurrency strategies are guaranteed to produce serializable schedules?
Observation 1: Reads can never interfere with reads
You can reverse the order of two reads without changing the serializability of a schedule.
Observation 2: Operations on different objects can't interfere with each other.
You can reverse the order of two operations on different objects without changing the serializability of a schedule.
Two schedules are conflict equivalent if there is a sequence of pairwise "flips" (of reads, or operations on different objects) that gets you from one schedule to the other.
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | R(B) |
↑ |
| | ↓ | W(A) |
↓ | W(A) |
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | W(A) |
|
| | R(B) |
|
↓ | W(A) |
Conflict equivalent to a serial schedule!
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | R(B) |
|
| | W(A) |
|
↓ | W(A) |
Can't rewrite!
A schedule is conflict serializable if it is conflict equivalent to a serial schedule.
How do we determine if a schedule is conflict-serializable?
Observation: We can't reorder Write/Write (or Read/Write) operations on the same object.
Two transactions accessing the same object create a partial order on the serial schedule
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | R(B) |
|
| | W(A) |
|
↓ | W(A) |
T2's write to B "happens before" T1's read
T2's write to A "happens before" T1's write
No cycles in the partial order!
Time | T1 | T2 |
---|---|---|
| | W(B) |
|
| | R(B) |
|
| | W(A) |
|
↓ | W(A) |
T2's write to B "happens before" T1's read
T1's write to A "happens before" T2's write
Cycle! No equivalent serial schedule!
Time | T1 | T2 | T3 |
---|---|---|---|
| | W(A) |
||
| | W(A) |
||
| | W(B) |
||
| | W(B) |
||
↓ | W(B) |
T1's write to A "happens before" T2's write
T1, T2's writes to B "happen before" T3's write
T2's write to B "happens before" T1's write
Cycle! No equivalent serial schedule!
Time | T1 | T2 | T3 |
---|---|---|---|
| | W(A) |
||
| | W(A) |
||
| | R(B) |
||
| | R(B) |
||
↓ | W(B) |
T1's write to A "happens before" T2's write
T1, T2's read on B "happens before" T3's write
T2's read on B "happens before" T1's read (but it doesn't count)
No Cycle! Has an equivalent serial schedule!
Time | T1 | T2 | T3 |
---|---|---|---|
| | W(A) |
||
| | W(A) |
||
| | R(B) |
||
| | W(B) |
||
↓ | R(B) |
T1's write to A "happens before" T2's write
T2's read on B "happens before" T3's write
T3's write to B "happens before" T1's read
Cycle! No equivalent serial schedule!
Create one node per transaction
For every W/W, W/R, or R/W pair of operations on the same object draw an edge from the first operation to the second.
An acyclic "Happens Before" or Dependency Graph is conflict serializable.
All three approaches simplify the problem by unilaterally picking an execution order and mandating that the schedule conforms to it.
We might get false positives: schedules that would ultimately end up being conflict equivalent to some schedule, but aren't equivalent to the one we picked (arbitrarily).
Pessimistic Concurrency Control