February 11, 2021
If X and Y are equivalent and Y is better,
then replace all Xs with Ys
Today's focus: Provable Equivalence for RA Expressions
We say that $Q_1 \equiv Q_2$ if and only if
we can guarantee that the bag of tuples produced by $Q_1(R, S, T, \ldots)$
is the same as the bag of tuples produced by $Q_2(R, S, T, \ldots)$
for any combination of valid inputs $R, S, T, \ldots$.
... that satisfy any necessary properties.
Selection | |
---|---|
$\sigma_{c_1 \wedge c_2}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(R))$ | (Decomposability) |
Projection | |
$\pi_{A}(R) \equiv \pi_{A}(\pi_{A \cup B}(R))$ | (Idempotence) |
Cross Product | |
$R \times (S \times T) \equiv (R \times S) \times T$ | (Associativity) |
$R \times S \equiv S \times R$ | (Commutativity) |
Union | |
$R \cup (S \cup T) \equiv (R \cup S) \cup T$ | (Associativity) |
$R \cup S \equiv S \cup R$ | (Commutativity) |
Show that $$R \times (S \times T) \equiv T \times (S \times R)$$
Show that $$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$$
Show that $$R \bowtie_{c} S \equiv S \bowtie_{c} R$$
Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv \sigma_{R.A > 3}(R \bowtie_{B} S)$$
Selection + Projection | |
---|---|
$\pi_{A}(\sigma_{c}(R)) \equiv \sigma_{c}(\pi_{A}(R))$ | (Commutativity) |
... but only if $A$ and $c$ are compatible
$A$ must include all columns referenced by $c$ ($cols(c)$)
Show that $$\pi_A(\sigma_c(R)) \equiv \pi_A(\sigma_c(\pi_{(A \cup cols(c))}(R)))$$
Selection + Cross Product | |
---|---|
$\sigma_c(R \times S) \equiv (\sigma_{c}(R)) \times S$ | (Commutativity) |
... but only if $c$ references only columns of $R$
$cols(c) \subseteq cols(R)$
Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv (\sigma_{R.A > 3}(R)) \bowtie_{B} S$$
Projection + Cross Product | |
---|---|
$\pi_A(R \times S) \equiv (\pi_{A_R}(R)) \times (\pi_{A_S}(S))$ | (Commutativity) |
... where $A_R$ and $A_S$ are the columns of $A$ from $R$ and $S$ respectively.
$A_R = A \cap cols(R)$ $A_S = A \cap cols(S)$
Show that $$\pi_{A}(R \bowtie_c S) \equiv (\pi_{A_R}(R)) \bowtie_c (\pi_{A_S}(S))$$
Intersection | |
---|---|
$R \cap (S \cap T) \equiv (R \cap S) \cap T$ | (Associativity) |
$R \cap S \equiv S \cap R$ | (Commutativity) |
Selection + | |
$\sigma_c(R \cup S) \equiv (\sigma_c(R)) \cup (\sigma_c(R))$ | (Commutativity) |
$\sigma_c(R \cap S) \equiv (\sigma_c(R)) \cap (\sigma_c(R))$ | (Commutativity) |
Projection + Union | |
$\pi_A(R \cup S) \equiv (\pi_A(R)) \cup (\pi_A(R))$ | (Commutativity) |
Cross Product + Union | |
$R \times (S \cup T) \equiv (R \times S) \cup (R \times T)$ | (Distributivity) |
SELECT R.A, T.E
FROM R, S, T
WHERE R.B = S.B
AND S.C < 5
AND S.D = T.D
➔
Input: Dumb translation of SQL to RA
⬇︎
Apply rewrites
⬇︎
Output: Better, but equivalent query
Which rewrite rules should we apply?
Some rewrites are situational... we need more information to decide when to apply them.
(note: $c$ is always compatible in this direction)
plan.transform {
case Filter(condition, Project(columns, child)) =>
Project(columns, Filter(condition, child))
}
match/case lets you find patterns.
transform lets you apply rewrite rules.
(Slight oversimplification since Spark uses extended relational algebra)
What happens if I apply this rewrite to:
Filter(condition, Project(columns1, Project(columns2, child)))
↕
$$\sigma_c(\pi_{A_1}(\pi_{A_2}(R)))$$
⇓ $$\pi_{A_1}(\sigma_c(\pi_{A_2}(R)))$$
var last = null
while( ! plan.equals(last) ){
last = plan
plan = plan.transform { ... }
}
Repeat until we reach a "fixed-point"
plan.transformDown {
case Filter(condition, Project(columns, child)) =>
Project(columns, Filter(condition, child))
}
transformUp: Require bottom-up tree traversal.
transformDown: Require top-down tree traversal.
plan.transformDown {
case Filter(condition, Union(children, /* other goop */)) =>
Union(
children.map { child =>
Filter(condition, child)
},
/* other goop */
)
}