CSE 4/562 - Database Systems

Relational Algebra Equivalences

CSE 4/562 – Database Systems

February 7, 2018

Recap: Relational Algebra

OperationSymMeaning
Selection$\sigma$Select a subset of the input rows
Projection$\pi$Delete unwanted columns
Cross-product$\times$Combine two relations
Set-difference$-$Tuples in Rel 1, but not Rel 2
Union$\cup$Tuples either in Rel 1 or in Rel 2
Intersection$\cap$Tuples in both Rel 1 and Rel 2
Join$\bowtie$Pairs of tuples matching a specified condition
Division$/$"Inverse" of cross-product

Division ($/$)

Not typically supported as a primitive operator,
but useful for expressing queries like:

Find species that appear in all boroughs

$$\pi_{BORONAME,\ SPC\_COMMON}(\textbf{Trees}) \;\;/\;\;\pi_{SPC\_COMMON}(\textbf{Trees})$$ (using set relational algebra)

$$R / S \equiv \{\; \left<\vec t\right> \;|\; \forall \left<\vec s\right> \in S, \left< \vec t \vec s \right> \in R \;\}$$

Division ($/$)

BORO SPC_COMMON
Brooklyn honeylocust
Brooklyn American linden
Brooklyn London planetree
Manhattan honeylocust
Manhattan American linden
Manhattan pin oak
Queens honeylocust
Queens American linden
Bronx honeylocust
/
SPC_COMMON
honeylocust
=
BORO
Brooklyn
Manhattan
Queens
Bronx
/
SPC_COMMON
honeylocust
American linden
=
BORO
Brooklyn
Manhattan
Queens
/
SPC_COMMON
honeylocust
American linden
pin oak
=
BORO
Manhattan

The running theme

If X and Y are equivalent and Y is better,
then replace all Xs with Ys

Today's focus: Provable Equivalence for RA Expressions

Equivalence

$$Q_1 = \pi_{A}\left( \sigma_{c}( R ) \right)$$ $$Q_2 = \sigma_{c}\left( \pi_{A}( R ) \right)$$
$$Q_1 \stackrel{?}{\equiv} Q_2$$

Ground Rules

Only Relational Values Matter
Obviously $Q_1 \neq Q_2$. What we care about is whether $Q_1(R) = Q_2(R)$...
Data Independent
... for all valid input data $R$.
However, it's fair to talk about equivalence when we know the data has some properties. (more on this later)
Data-Model Dependent
It's important to be clear whether we're talking about sets, bags, or lists.

In summary...

We say that $Q_1 \equiv Q_2$ if and only if
we can guarantee that the bag of tuples produced by $Q_1(R, S, T, \ldots)$
is the same as the bag of tuples produced by $Q_2(R, S, T, \ldots)$
for any combination of valid inputs $R, S, T, \ldots$.

... that satisfy any necessary properties.

Starting Rules

Selection
$\sigma_{c_1 \wedge c_2}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(R))$ (Decomposability)
Projection
$\pi_{A}(R) \equiv \pi_{A}(\pi_{A \cup B}(R))$ (Idempotence)
Cross Product
$R \times (S \times T) \equiv (R \times S) \times T$ (Associativity)
$R \times S \equiv S \times R$ (Commutativity)
Union
$R \cup (S \cup T) \equiv (R \cup S) \cup T$ (Associativity)
$R \cup S \equiv S \cup R$ (Commutativity)

Try it!

Show that $$R \times (S \times T) \equiv T \times (S \times R)$$

Show that $$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$$

Show that $$R \bowtie_{c} S \equiv S \bowtie_{c} R$$

Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv \sigma_{R.A > 3}(R \bowtie_{B} S)$$

Rules for Multiple Operators

Selection + Projection
$\pi_{A}(\sigma_{c}(R)) \equiv \sigma_{c}(\pi_{A}(R))$ (Commutativity)

... but only if $A$ and $c$ are compatible

$A$ must include all columns referenced by $c$ ($cols(c)$)

Try it!

Show that $$\pi_A(\sigma_c(R)) \equiv \pi_A(\sigma_c(\pi_{(A \cup cols(c))}(R)))$$

Selection + Cross Product
$\sigma_c(R \times S) \equiv (\sigma_{c}(R)) \times S$ (Commutativity)

... but only if $c$ references only columns of $R$

$cols(c) \subseteq cols(R)$

Try it!

Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv (\sigma_{R.A > 3}(R)) \bowtie_{B} S$$

When is this rewrite a good idea?

Projection + Cross Product
$\pi_A(R \times S) \equiv (\pi_{A_R}(R)) \times (\pi_{A_S}(S))$ (Commutativity)

... where $A_R$ and $A_S$ are the columns of $A$ from $R$ and $S$ respectively.

$A_R = A \cap cols(R)$     $A_S = A \cap cols(S)$

Try it!

Show that $$\pi_{A}(R \bowtie_c S) \equiv (\pi_{A_R}(R)) \bowtie_c (\pi_{A_S}(S))$$

When does this condition hold?

Intersection
$R \cap (S \cap T) \equiv (R \cap S) \cap T$ (Associativity)
$R \cap S \equiv S \cap R$ (Commutativity)
Selection +     
$\sigma_c(R \cup S) \equiv (\sigma_c(R)) \cup (\sigma_c(R))$ (Commutativity)
$\sigma_c(R \cap S) \equiv (\sigma_c(R)) \cap (\sigma_c(R))$ (Commutativity)
Projection +     
$\pi_A(R \cup S) \equiv (\pi_A(R)) \cup (\pi_A(R))$ (Commutativity)
$\pi_A(R \cap S) \equiv (\pi_A(R)) \cap (\pi_A(R))$ (Commutativity)
Cross Product + Union
$R \times (S \cup T) \equiv (R \times S) \cup (R \times T)$ (Distributivity)

Example


    SELECT R.A, T.E

      FROM R, S, T

     WHERE R.B = S.B
       AND S.C < 5
       AND S.D = T.D
          

General Query Optimizers

Input: Dumb translation of SQL to RA

⬇︎

Apply rewrites

⬇︎

Output: Better, but equivalent query

Which rewrite rules should we apply?

Selection Pushdown
Always commute Selections as close to the leaves as possible.
Join Construction
Joins are always better than cross-products.
(Optional) Projection Pushdown
Commuting Projections down to the leaves removes redundant columns, and may be beneficial for some systems.
Join Algorithm Selection
Joins can be implemented differently, depending on the join predicate.
Join/Union Ordering
The order in which joins are evaluated may affect query runtimes.
Access Paths
$(\sigma_c(R))$ and $(Q(\ldots) \bowtie_c R)$ are special cases that we can make fast!

Some rewrites are situational... we need more information to decide when to apply them.

General Query Optimizers

  1. Apply blind heuristics (e.g., push down selections)
  2. Enumerate all possible execution plans by varying (or for a reasonable subset)
    • Join/Union Evaluation Order (commutativity, associativity, distributivity)
    • Algorithms for Joins, Aggregates, Sort, Distinct, and others
    • Data Access Paths
  3. Estimate the cost of each execution plan
  4. Pick the execution plan with the lowest cost
Next Class: Extended Relational Algebra and Basic Join Algorithms