CSE 4/562 - Database Systems

Extended RA

CSE 4/562 – Database Systems

February 12, 2018

Extended Relational Algebra

Set/Bag Operations
Select ($\sigma$), Project ($\pi$), Join ($\bowtie$), Union ($\cup$)
Bag Operations
Distinct ($\delta$), Outer Joins (⟗)
List Operations
Sort ($\tau$), Limit
Arithmetic Operations
Extended Projection ($\pi$), Aggregation ($\sigma$), Grouping ($\gamma$)

Extended Projection

Like normal projection, but can create new columns

$\pi_{M \leftarrow A+B*C,\; N \leftarrow 2}(R)$

produces 1 row for every row of R, with 2 columns: M and N

Outer Join

... but first

NULL Values

  • Field values can be unknown or inapplicable.
    • A tree with an unknown species.
    • The street of a tree in a park.
    • The '.' on many of your ID cards
  • SQL provides a special NULL value for these cases

NULL makes things more complicated.

$$\textbf{Trees.SPC_COMMON} = \texttt{'Brooklyn'}$$

What happens if Trees.SPC_COMMON is NULL?

$$\texttt{NULL} = \textbf{'Brooklyn'} \equiv \textbf{Unknown}$$

WHERE clauses eliminate all non-True rows

$$Streets \bowtie_{StreetName} Trees$$

What happens if some streets have no trees?

Outer Join (⟗, ⟕, ⟖)

  1. Include all results from the normal (inner) join.
  2. Also include rows that don't get joined.

Outer Join

Inner Join
Normal, plain, simple join
Left Outer Join (⟕)
Include un-joined rows from the left hand side
Right Outer Join (⟖)
Include un-joined rows from the right hand side
[Full] Outer Join (⟗)
Include un-joined rows from either side

Sort / Limit

$$\tau_{A}(R)$$ The tuples of $R$ in ascending order according to 'A'

$$\textbf{L}_{n}(R)$$ The first $n$ tuples of R

(Typically combined with sort. If not, pick arbitrarily.)


Pick your favorite sort algorithm.

What happens if you don't have enough memory?

Key Idea: Merging 2 sorted lists requires $O(1)$ memory.

2-Way Sort

Pass 1
Create lots of (small) sorted lists.
Pass 2+
Merge sorted lists of size $N$ into sorted lists of size $2N$

Pass 1: Create Sorted Runs

Pass 2: Merge Sorted Runs

Repeat Pass 2 As Needed.

What's the bottleneck?

IO Cost: $O(N \cdot \lceil\log_2(N)\rceil)$
(with $N$ blocks)

Using More Memory

Pass 1
Sort Bigger Buffers
Re-Use Memory When Done
Pass 2
Merge $K$ Runs Simultaneously: $O(N \cdot \lceil\log_K(N)\rceil)$ IO

Replacement Sort

Replacement Sort

Replacement Sort

Replacement Sort

Replacement Sort

Replacement Sort

Replacement Sort

Replacement Sort

On average, we'll get runs of size $2 \cdot |WS|$