CSE 4/562 - Database Systems

Relational Algebra

CSE 4/562 – Database Systems

February 5, 2018

The running theme...

Replace [Thing A] with better, but equivalent [Thing B].

  1. How can we tell if [Thing B] is equivalent to [Thing A]?
  2. How can we tell if [Thing B] is better than [Thing A]?

Replace [Query A] with better, but equivalent [Query B].

  1. How can we tell if [Query B] is equivalent to [Query A]?
  2. How can we tell if [Query B] is better than [Query A]?

... but first a few definitions.

Relational Data

Relation (Table)
A collection of Tuples of Values
All tuples have the same set of attributes, or schema

Relational Data

Types of relations...

Set

SPC_COMMONBORONAME
'honeylocust''Brooklyn'
'American linden''Brooklyn'
'London planetree''Brooklyn'
  
  

(Unique Only)

Bag

SPC_COMMONBORONAME
'honeylocust''Brooklyn'
'London planetree''Brooklyn'
'American linden''Brooklyn'
'London planetree''Brooklyn'
'honeylocust''Brooklyn'

List

SPC_COMMONBORONAME
'American linden''Brooklyn'
'honeylocust''Brooklyn'
'honeylocust''Brooklyn'
'London planetree''Brooklyn'
'London planetree''Brooklyn'

(Order Matters)

Declarative Languages

DeclarativeImperative
Say what you want Say how you want to get it
"Get me the TPS Reports"
Look at every T report 
For each week:
  Sum up the sprocket count
  Find that week's S report
  ... 
SQL, RA, R, … C, Scala, Java, Python, …

Declarative languages make it easier to explore equivalent computations to find the best one.

Declarative Queries

SQL
Readability: Focus on what and where from
Relational Algebra
Data Flow: Focus on transformations
Relational Calculus
Minimality: Focus on attribute relationships

SQL


                      SELECT R.A FROM R, S 
                      WHERE R.B = S.B AND S.C = 10
          

Relational Algebra

$$\pi_{R.A}(\sigma_{S.C=10}(\bowtie_{R.B = S.B}(\textbf{R}, \textbf{S})))$$

Relational Calculus

$$\exists B : R(A, B), S(B, 10)$$

Declarative Queries

SQL
Readability: Focus on what and where from
Relational Algebra
Data Flow: Focus on transformations
Relational Calculus
Minimality: Focus on attribute relationships

Preliminaries

We start with a database instance with a fixed schema

Queries are applied to Relations $$Q(\textbf{Trees}, \textbf{SpeciesInfo})$$

Queries are also Relations! $$Q_2(\textbf{SpeciesInfo}, Q_1(\textbf{Trees}))$$ (Relational Algebra is Closed)

Relational Algebra

OperationSymMeaning
Selection$\sigma$Select a subset of the input rows
Projection$\pi$Delete unwanted columns
Cross-product$\times$Combine two relations
Set-difference$-$Tuples in Rel 1, but not Rel 2
Union$\cup$Tuples either in Rel 1 or in Rel 2

Also: Intersection, Join, Division, Renaming

(Not essential, but can be useful)

Input Query Language Output
Sets of Tuples $\rightarrow$ Set Relational Algebra $\rightarrow$ Set of Tuples
Bags of Tuples $\rightarrow$ Bag Relational Algebra $\rightarrow$ Bag of Tuples
Lists of Tuples $\rightarrow$ Extended Relational Algebra $\rightarrow$ List of Tuples

First we focus on sets and bags.

Selection ($\sigma_{c}$)

Delete rows that fail the condition $c$.

$$\sigma_{(BORONAME = \texttt{'Brooklyn'})} \textbf{Trees}$$
TREE_IDSPC_COMMONBORONAME...
204026'honeylocust''Brooklyn'...
204337'honeylocust''Brooklyn'...
189565'American linden''Brooklyn'...
192755'London planetree''Brooklyn'...
189465'London planetree''Brooklyn'...
... and 177287 more

Projection ($\pi_{A}$)

Delete attributes not in the projection list $A$.

$$\pi_{BORONAME}(Trees)$$
BORONAME
Queens
Brooklyn
Manhattan
Bronx
Staten Island

Only 5 results... not 683788?

Set and Bag Projection are different

Reminder: Queries are Relations

What are these queries schemas?

$$\pi_{TREEID, SPC\_COMMON, BORONAME} \textbf{Trees}$$
$$\sigma_{(BORONAME = \texttt{'Brooklyn'})} \textbf{Trees}$$
$$\sigma_{(BORONAME = \texttt{'Brooklyn'})}(\pi_{TREEID, SPC\_COMMON, BORONAME} \textbf{Trees})$$

Union ($\cup$)

Takes two relations that are union-compatible...

(Both relations have the same number of fields with the same types)

... and returns all tuples appearing in either relation

$$(\sigma_{(BORONAME=\texttt{'Brooklyn'})} \textbf{Trees}) \cup (\sigma_{(BORONAME=\texttt{'Manhattan'})} \textbf{Trees})$$

We use $\uplus$ if we explicitly mean bag union

Intersection ($\cap$)

Return all tuples appearing in both
of two union-compatible relations

$$(\sigma_{(BORONAME=\texttt{'Brooklyn'})} (\pi_{SPC\_COMMON} \textbf{Trees})) \\ ~~~~~~~~~\cap (\sigma_{(BORONAME=\texttt{'Manhattan'})} (\pi_{SPC\_COMMON} \textbf{Trees}))$$

What is this query asking?

Set Difference

Return all tuples appearing in the first, but not the second
of two union-compatible relations

$$(\sigma_{(BORONAME=\texttt{'Brooklyn'})} (\pi_{SPC\_COMMON} \textbf{Trees})) \\ ~~~~~~~~~- (\sigma_{(BORONAME=\texttt{'Manhattan'})} (\pi_{SPC\_COMMON} \textbf{Trees}))$$

What is this query asking?

Union, Intersection, Set Difference

What is the schema of the result of any of these operators?

Cross (Cartesian) Product ($\times$)

Create all pairs of tuples.

$$\pi_{SPC\_COMMON, BORONAME} (\textbf{Trees}) \times \pi_{SPC\_COMMON, AVG\_HEIGHT} (\textbf{TreeInfo})$$
SPC_COMMONAVG_HEIGHT
cedar elm60
lacebark elm45
... and more
SPC_COMMONBORONAMESPC_COMMONAVG_HEIGHT
honeylocustBrooklyncedar elm60
honeylocustBrooklyncedar elm60
American lindenBrooklyncedar elm60
London planetreeManhattancedar elm60
London planetreeManhattancedar elm60
...
honeylocustBrooklynlacebark elm45
honeylocustBrooklynlacebark elm45
American lindenBrooklynlacebark elm45
London planetreeManhattanlacebark elm45
London planetreeManhattanlacebark elm45
... and more

Cross (Cartesian) Product ($\times$)

$$\pi_{SPC\_COMMON,\ BORONAME} (\textbf{Trees}) \times \pi_{SPC\_COMMON,\ AVG\_HEIGHT} (\textbf{TreeInfo})$$

What is the schema of the resulting relation?

The relation has a naming conflict
(two attributes with the same name)

Renaming ($\rho$)

$$\rho_{TNAME,\ BORO,\ INAME,\ HEIGHT}\left( \pi_{SPC\_COMMON,\ BORONAME} (\textbf{Trees}) \times \pi_{SPC\_COMMON,\ AVG\_HEIGHT} (\textbf{TreeInfo})\right)$$

What is the schema of the resulting relation?

When writing cross-products on the board,
I will use implicit renaming

Join ($\bowtie_c$)

Pair tuples according to a condition c.

$$\pi_{SPC\_COMMON,\ BORONAME} (\textbf{Trees}) \bowtie_{T.SPC\_COMMON = TI.SPC\_COMMON} \pi_{SPC\_COMMON,\ AVG\_HEIGHT} (\textbf{TreeInfo})$$
Identical to... $$\sigma_{T.SPC\_COMMON = TI.SPC\_COMMON}\left(\pi_{SPC\_COMMON,\ BORONAME} (\textbf{Trees}) \times \pi_{SPC\_COMMON,\ AVG\_HEIGHT} (\textbf{TreeInfo})\right)$$
$$R \bowtie_c S \equiv \sigma_c(R \times S)$$

Join Shorthands

Equi-joins are joins with only equality tests in the condition.

Join on attribute(s)
$R \bowtie_{A} S \equiv R \bowtie_{R.A = S.A} S$
Same values on the listed attributes
Natural Join
$R \bowtie S \equiv R \bowtie_{attrs(R) \cap attrs(S)} S$
Same values on all shared attributes

Which operators can create duplicates?

(Which operators behave differently in Set- and Bag-RA?)

Operator Symbol Duplicates?
Selection $\sigma$ No
Projection $\pi$ Yes
Cross-product $\times$ No
Set-difference$-$ No
Union $\cup$ Yes
Join $\bowtie$No

Group Work

Find the BORONAMEs of all boroughs that do have trees with an average height of below 45 inches

SPC_COMMONAVG_HEIGHT
cedar elm60
lacebark elm45
... and more
SPC_COMMONBORONAME
'honeylocust''Brooklyn'
'honeylocust''Brooklyn'
'American linden''Brooklyn'
'London planetree''Manhattan'
'London planetree''Manhattan'
... and more
$$\pi_{BORONAME}(\sigma_{AVG\_HEIGHT < 45}(\textbf{Trees}\bowtie\textbf{TreeInfo}))$$
$$\pi_{BORONAME}(\textbf{Trees}\bowtie\sigma_{AVG\_HEIGHT < 45}(\textbf{TreeInfo}))$$

Division ($/$)

Not typically supported as a primitive operator,
but useful for expressing queries like:

Find species that appear in all boroughs

$$\pi_{BORONAME,\ SPC\_COMMON}(\textbf{Trees}) \;\;/\;\;\pi_{SPC\_COMMON}(\textbf{Trees})$$ (using set relational algebra)

$$R / S \equiv \{\; \left<\vec t\right> \;|\; \forall \left<\vec s\right> \in S, \left< \vec t \vec s \right> \in R \;\}$$

BORO SPC_COMMON
Brooklyn honeylocust
Brooklyn American linden
Brooklyn London planetree
Manhattan honeylocust
Manhattan American linden
Manhattan pin oak
Queens honeylocust
Queens American linden
Bronx honeylocust
/ { honeylocust } = Brooklyn, Manhattan, Queens, Bronx
/ { honeylocust, American linden } = Brooklyn, Manhattan, Queens
/ { honeylocust, American linden, pin oak }= Manhattan

Group Work

If time permits: Implement division using other operators.

Relational Algebra

A simple way to think about and work with
computations over collections.

… simple → easy to evaluate

… simple → easy to optimize

Next time, Optimizing RA