May 6, 2021
$\Delta Q$ | (ideally) Small & fast query |
$+$ | (ideally) Fast "merge" operation |
$\sigma(\mathcal R) \rightarrow \sigma(\mathcal R \uplus \Delta \mathcal R)$
$ \equiv $ $\sigma(\mathcal R)$ $ \uplus $ $\sigma(\Delta \mathcal R)$
$Q(\mathcal D) = \sigma(\mathcal R)$
$\Delta Q(\mathcal D, \Delta \mathcal D) = \sigma(\Delta \mathcal R)$
Set/Bag difference also commutes through selection
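A minimal sketch of the selection rule, assuming bags are modeled as Python `Counter` objects (tuple to multiplicity). The relation contents, `select`, and `pred` are hypothetical names chosen for illustration.

```python
from collections import Counter

def select(bag, predicate):
    """Bag selection: keep every tuple (with its multiplicity) that satisfies the predicate."""
    return Counter({t: n for t, n in bag.items() if predicate(t)})

def pred(t):
    # Hypothetical selection condition on the second attribute.
    return t[1] >= 20

# Hypothetical base relation R and batch of inserted tuples delta_R.
R = Counter({(1, 10): 1, (2, 20): 2, (3, 30): 1})
delta_R = Counter({(2, 20): 1, (4, 40): 1})

# Full recomputation: sigma(R ⊎ ΔR). Counter addition acts as bag union.
recomputed = select(R + delta_R, pred)

# Incremental maintenance: sigma(R) ⊎ sigma(ΔR).
materialized = select(R, pred)   # Q(D), computed once
delta_Q = select(delta_R, pred)  # ΔQ(D, ΔD), touches only the delta
incremental = materialized + delta_Q

print(recomputed == incremental)
```

The delta query never reads `R`, which is what makes it small and fast here.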
$\pi(\mathcal R) \rightarrow \pi(\mathcal R \uplus \Delta \mathcal R)$
$ \equiv $ $\pi(\mathcal R)$ $ \uplus $ $\pi(\Delta \mathcal R)$
$Q(\mathcal D) = \pi(\mathcal R)$
$\Delta Q(\mathcal D, \Delta \mathcal D) = \pi(\Delta \mathcal R)$
Does this work under set semantics?
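One way to probe the question, sketched with hypothetical data: under set semantics the rule still holds for insertions, but a deletion does not commute through projection, because a set cannot tell how many input tuples support each output tuple. Keeping multiplicities (bag semantics) repairs this. `project` and `project_bag` are illustrative helpers.

```python
from collections import Counter

def project(rel, cols):
    """Set projection onto the given column positions."""
    return {tuple(t[i] for i in cols) for t in rel}

def project_bag(bag, cols):
    """Bag projection: multiplicities of collapsed tuples add up."""
    out = Counter()
    for t, n in bag.items():
        out[tuple(t[i] for i in cols)] += n
    return out

R = {(1, "x"), (1, "y"), (2, "z")}

# Insertions under set semantics: pi(R ∪ ΔR) equals pi(R) ∪ pi(ΔR).
inserted = {(3, "w"), (1, "q")}
insert_ok = project(R | inserted, [0]) == project(R, [0]) | project(inserted, [0])

# Deletions under set semantics: the naive delta removes too much.
deleted = {(1, "x")}
set_truth = project(R - deleted, [0])                  # (1,) survives via (1, "y")
set_naive = project(R, [0]) - project(deleted, [0])    # (1,) is wrongly removed

# Deletions under bag semantics: the counts keep (1,) alive.
bag_truth = project_bag(Counter(R) - Counter(deleted), [0])
bag_naive = project_bag(Counter(R), [0]) - project_bag(Counter(deleted), [0])

print(insert_ok, set_truth == set_naive, bag_truth == bag_naive)
```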
$\mathcal R_1 \uplus \mathcal R_2 \rightarrow \mathcal R_1 \uplus \Delta \mathcal R_1 \uplus \mathcal R_2 \uplus \Delta \mathcal R_2$
$ \equiv $ $\mathcal R_1 \uplus \mathcal R_2$ $ \uplus $ $\Delta \mathcal R_1 \uplus \Delta \mathcal R_2$
$Q(\mathcal D) = \mathcal R_1 \uplus \mathcal R_2$
$\Delta Q(\mathcal D, \Delta \mathcal D) = \Delta \mathcal R_1 \uplus \Delta \mathcal R_2$
A "batch" of operations that should execute together
Alice and Bob submit transactions at the same time!
Two schedules are conflict equivalent if there is a sequence of pairwise "flips" (of reads, or operations on different objects) that gets you from one schedule to the other.
| Time | T1 | T2 |
|---|---|---|
| | | W(B) |
| | R(B) | ↑ |
| | ↓ | W(A) |
| ↓ | W(A) | |
| Time | T1 | T2 |
|---|---|---|
| | | W(B) |
| | | W(A) |
| | R(B) | |
| ↓ | W(A) | |
Conflict equivalent to a serial schedule!
| Time | T1 | T2 |
|---|---|---|
| | | W(B) |
| | R(B) | |
| | W(A) | |
| ↓ | | W(A) |
Can't rewrite!
A schedule is conflict serializable if it is conflict equivalent to a serial schedule.
How do we determine if a schedule is conflict-serializable?
| Time | T1 | T2 |
|---|---|---|
| | | W(B) |
| | R(B) | |
| | W(A) | |
| ↓ | | W(A) |
T2's write to B "happens before" T1's read
T1's write to A "happens before" T2's write
Cycle! No equivalent serial schedule!
A schedule whose "Happens Before" (dependency) graph is acyclic is conflict serializable.
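The acyclicity test can be sketched as follows, assuming a schedule is given as a time-ordered list of hypothetical `(transaction, operation, object)` triples. Two operations conflict when they come from different transactions, touch the same object, and at least one is a write. The first schedule encodes the two happens-before statements above; the second is an assumed variant in which T2 writes A before T1 does.

```python
from itertools import combinations

def dependency_graph(schedule):
    """Return the set of happens-before edges (earlier_txn, later_txn)."""
    edges = set()
    # combinations() preserves list order, so the first element of each pair is earlier in time.
    for (t1, op1, o1), (t2, op2, o2) in combinations(schedule, 2):
        if t1 != t2 and o1 == o2 and "W" in (op1, op2):
            edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as an edge set."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: we returned to a node on the current path
        visiting.add(node)
        if any(visit(nxt) for nxt in graph[node]):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(dependency_graph(schedule))

cyclic = [("T2", "W", "B"), ("T1", "R", "B"), ("T1", "W", "A"), ("T2", "W", "A")]
acyclic = [("T2", "W", "B"), ("T1", "R", "B"), ("T2", "W", "A"), ("T1", "W", "A")]

print(is_conflict_serializable(cyclic), is_conflict_serializable(acyclic))
```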
Create one lock for each object.
Each transaction operates in two "phases".
In practice, the release phase happens all at once at the end
$2PL \subset CS$
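A single-threaded sketch of the two phases, assuming exclusive locks only and a shared dictionary as the lock table. `TwoPhaseTransaction` and its methods are hypothetical names, not a real lock manager API. A failed `acquire` simply returns `False`; a real system would block or abort instead.

```python
class TwoPhaseTransaction:
    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table  # shared dict: object -> name of the owning transaction
        self.held = set()
        self.releasing = False        # becomes True once the release phase starts

    def acquire(self, obj):
        """Acquire phase: take the lock on obj, or report that someone else holds it."""
        if self.releasing:
            raise RuntimeError("2PL violation: acquire after release phase began")
        owner = self.lock_table.get(obj)
        if owner not in (None, self.name):
            return False
        self.lock_table[obj] = self.name
        self.held.add(obj)
        return True

    def release_all(self):
        """Release phase: drop every lock at once, as at commit or abort."""
        self.releasing = True
        for obj in self.held:
            del self.lock_table[obj]
        self.held.clear()

locks = {}
t1 = TwoPhaseTransaction("T1", locks)
t2 = TwoPhaseTransaction("T2", locks)

got_a = t1.acquire("A")    # T1 locks A
blocked = t2.acquire("A")  # T2 cannot lock A while T1 holds it
t1.release_all()           # T1 finishes and releases everything
retry = t2.acquire("A")    # now T2 can proceed

print(got_a, blocked, retry)
```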
Pick a serial order (e.g., the order in which transactions reach the validation phase)
Make sure the transaction's operations follow this order
... but Snapshot Isolation only checks for equivalence to ONE serial schedule. There might be a different, conflict-equivalent serial schedule.
$SI \subset CS$
Each object $A$ gets a read timestamp ($RTS(A)$) and a write timestamp ($WTS(A)$)
Each transaction $\mathcal T$ gets a timestamp ($TS(\mathcal T)$).
(note that these can be logical timestamps like sequence numbers)
(also note that real DBs don't use read timestamps... which creates problems)
Two schedules are view-equivalent when you can transform one into the other by reordering any pair of operations that...
A schedule is view serializable if it is view-equivalent to some serial schedule
Timestamp concurrency control is guaranteed to produce view-serializable schedules.
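The read and write rules are not spelled out above, so the following is a sketch of the usual textbook rules, assuming the Thomas write rule (an obsolete write is skipped rather than aborted). `TimestampStore` and `Abort` are hypothetical names, and timestamps are assumed to be positive integers.

```python
class Abort(Exception):
    """Raised when an operation arrives too late for its transaction's timestamp."""

class TimestampStore:
    def __init__(self):
        self.value = {}  # object -> current value
        self.rts = {}    # object -> largest timestamp that has read it
        self.wts = {}    # object -> timestamp of the write that produced its value

    def read(self, ts, obj):
        if ts < self.wts.get(obj, 0):
            raise Abort(f"read of {obj} at TS {ts} would see a value from its future")
        self.rts[obj] = max(self.rts.get(obj, 0), ts)
        return self.value.get(obj)

    def write(self, ts, obj, val):
        if ts < self.rts.get(obj, 0):
            raise Abort(f"write of {obj} at TS {ts} would invalidate a later read")
        if ts < self.wts.get(obj, 0):
            return False  # Thomas write rule: a later write already hides this one
        self.value[obj] = val
        self.wts[obj] = ts
        return True

store = TimestampStore()
applied = store.write(2, "A", "from T2")  # first write, applied
skipped = store.write(1, "A", "from T1")  # older write, hidden by T2's write
seen = store.read(2, "A")                 # sets RTS(A) to 2

print(applied, skipped, seen)
```

The skipped write is the kind of "hidden" write-write conflict that conflict serializability would reject but view serializability tolerates.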
On the happens-before graph, throw away edges created by "hidden" write-write conflicts.
If the resulting graph is acyclic, the schedule is view serializable
$CS \subset VS$
$2PL, SI \subset CS \subset VS \subset S$
What if the DB fails during a write?
IOs aren't atomic
Atomicity and Durability might be violated!
What if we need to page out some pages modified by a live transaction?
If the transaction aborts, the page state needs to be reverted.
Atomicity might be violated
Idea: Periodically mark down the index of the earliest log entry still needed
COMMIT entry.
COMMITed transactions.
Timestamp | Transaction | Object | Value | Prev |
---|---|---|---|---|
10 | T1 | Page 5 | 1010... | 00101... |
11 | T2 | Page 3 | 1000... | 0111... |
12 | T1 | Page 1 | 0011... | 0001... |
13 | T3 | Page 5 | 1100... | 1010... |
Idea: Record the page's previous value.
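A minimal sketch of how recorded previous values support undo, assuming log entries are dictionaries and pages are held in a dictionary. The bit strings are hypothetical stand-ins, and the sketch assumes no other transaction has overwritten the same pages since (for example, because locks are held until the end).

```python
def undo(log, pages, txn):
    """Revert every write made by txn, newest entry first, using the recorded previous values."""
    for entry in reversed(log):
        if entry["txn"] == txn:
            pages[entry["page"]] = entry["prev"]

# Hypothetical log in the same shape as the table above.
log = [
    {"ts": 10, "txn": "T1", "page": 5, "value": "1010", "prev": "0010"},
    {"ts": 11, "txn": "T2", "page": 3, "value": "1000", "prev": "0111"},
    {"ts": 12, "txn": "T1", "page": 1, "value": "0011", "prev": "0001"},
]

# Page contents after all three writes were applied.
pages = {1: "0011", 3: "1000", 5: "1010"}

undo(log, pages, "T1")  # T1 aborts: pages 1 and 5 are restored, page 3 is untouched
print(pages)
```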
[head] :- [body]
$$Q(A) :-~~ R(A, B), S(B, C)$$
like SELECT A FROM R NATURAL JOIN S
Stop thinking about relations as collections of records, and instead think of them as collections of facts
R | A | B |
---|---|---|
1 | 1 | 2 |
2 | 1 | 3 |
3 | 2 | 3 |
4 | 2 | 4 |
The fact $R(1, 2)$ is true.
The fact $R(2, 1)$ is false (or unknown).
A table contains all facts that are provably true.
$$Q(A) :-~~ R(A, B), S(B, C)$$
For any $A$, the fact $Q(A)$ is true if...
$\forall A : \big( \exists B, C : R(A, B) \wedge S(B, C) \big) \rightarrow Q(A)$
$$Q(A) :-~~ R(A, B), S(B, C)$$
$$Q(A) :-~~ R(A, B), R(B, C)$$
Treat multiple rules as a disjunction.
($Q(A)$ is true if any rule is satisfied)
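The two rules can be evaluated naively over sets of facts, as in this sketch. `R` holds the facts from the table above; the contents of `S` are not given, so the single fact used here is a hypothetical placeholder.

```python
# Facts of R, taken from the table above.
R = {(1, 2), (1, 3), (2, 3), (2, 4)}
# Hypothetical facts of S (not specified in the source).
S = {(3, 5)}

# Q(A) :- R(A, B), S(B, C): some B links a fact of R to a fact of S.
rule1 = {a for (a, b) in R for (b2, c) in S if b == b2}

# Q(A) :- R(A, B), R(B, C): some B links a fact of R to another fact of R.
rule2 = {a for (a, b) in R for (b2, c) in R if b == b2}

# Multiple rules act as a disjunction, so Q is the union of what each rule derives.
Q = rule1 | rule2

print(rule1, rule2, Q)
```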
| $[[ A > B ]]$ | A | B |
|---|---|---|
| | 1 | 0 |
| | 2 | 0 |
| | 3 | 0 |
| | ... | ... |
| | 2 | 1 |
| | ... | ... |
Relations are Sets of Facts. We can have a relation consisting of all pairs $A, B$ where $A$ is bigger.
How does the input data relate to a query output?
Idea: Arithmetic models how a tuple was derived pretty well.
$[[ R(a, b) ]] \rightarrow \texttt{constant}$
$[[R(a, b) \cup S(a, b)]] \rightarrow [[R(a, b)]] \oplus [[S(a, b)]] $
$[[R(a, b) \times S(c, d)]] \rightarrow [[R(a, b)]] \otimes [[S(c, d)]] $
$[[\pi_a R(a, b)]] \rightarrow \sum_b [[R(a, b)]] $
$\oplus$ | $\otimes$ | Effect |
---|---|---|
$+$ | $\times$ | Bag multiplicity |
$\vee$ | $\wedge$ | Set existence |
$\cup$ | $\times$ | Why provenance |
min | max | Access control |
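A sketch of the first two rows of the table, assuming each relation is a dictionary from tuple to annotation and the query is $\pi_a(R \times S)$. `Semiring`, `BAG`, `SET`, and `project_a_of_product` are hypothetical names, and the annotations are made-up inputs. The same query code yields multiplicities or existence depending only on which $(\oplus, \otimes)$ pair is plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # ⊕: combines alternative derivations (union, projection)
    times: Callable[[Any, Any], Any]  # ⊗: combines joint derivations (cross product)
    zero: Any                         # identity for ⊕

BAG = Semiring(plus=lambda x, y: x + y, times=lambda x, y: x * y, zero=0)
SET = Semiring(plus=lambda x, y: x or y, times=lambda x, y: x and y, zero=False)

def project_a_of_product(R, S, sr):
    """Annotations of pi_a(R(a, b) x S(c, d)) under the semiring sr."""
    out = {}
    for (a, b), r_ann in R.items():
        for (c, d), s_ann in S.items():
            pair_ann = sr.times(r_ann, s_ann)                    # [[R x S]] = [[R]] ⊗ [[S]]
            out[a] = sr.plus(out.get(a, sr.zero), pair_ann)      # [[pi_a]] sums over b, c, d
    return out

# Bag multiplicity: annotations are tuple counts.
bag_result = project_a_of_product(
    {(1, 2): 2, (1, 3): 1, (2, 3): 1}, {(7, 8): 3}, BAG
)

# Set existence: annotations are booleans.
set_result = project_a_of_product(
    {(1, 2): True, (1, 3): True, (2, 3): True}, {(7, 8): True}, SET
)

print(bag_result, set_result)
```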