CSE 4/562 - Database Systems

- Set/Bag Operations
- Select ($\sigma$), Project ($\pi$), Join ($\bowtie$), Union ($\cup$)
- Bag Operations
- Distinct ($\delta$), Outer Joins (⟗)
- List Operations
- Sort ($\tau$), Limit
- Arithmetic Operations
- Extended Projection ($\pi$), Aggregation ($\sigma$), Grouping ($\gamma$)

Like normal projection, but can create new columns

$\pi_{M \leftarrow A+B*C,\; N \leftarrow 2}(R)$produces 1 row for every row of R, with 2 columns: M and N

... but first

`NULL`

Values- Field values can be unknown or inapplicable.
- A tree with an unknown species.
- The street of a tree in a park.
- The '.' on many of your ID cards

- SQL provides a special
`NULL`

value for these cases

NULL makes things more complicated.

What happens if **Trees.SPC_COMMON** is NULL?

$$\texttt{NULL} = \textbf{'Brooklyn'} \equiv \textbf{Unknown}$$

Unknown | AND | Unknown | $\equiv$ | Unknown |

Unknown | AND | True | $\equiv$ | Unknown |

Unknown | AND | False | $\equiv$ | False |

Unknown | OR | Unknown | $\equiv$ | Unknown |

Unknown | OR | True | $\equiv$ | True |

Unknown | OR | False | $\equiv$ | Unknown |

NOT | Unknown | $\equiv$ | Unknown |

`WHERE`

clauses eliminate all non-True rows

What happens if some streets have no trees?

- Include all results from the normal (inner) join.
- Also include rows that don't get joined.

- Inner Join
- Normal, plain, simple join
- Left Outer Join (⟕)
- Include un-joined rows from the left hand side
- Right Outer Join (⟖)
- Include un-joined rows from the right hand side
- [Full] Outer Join (⟗)
- Include un-joined rows from either side

$$\tau_{A}(R)$$ The tuples of $R$ in ascending order according to 'A'

$$\textbf{L}_{n}(R)$$ The first $n$ tuples of R

(Typically combined with sort. If not, pick arbitrarily.)

Pick your favorite sort algorithm.

What happens if you don't have enough memory?

**Key Idea:** Merging 2 sorted lists requires $O(1)$ memory.

- Pass 1
- Create lots of (small) sorted lists.
- Pass 2+
- Merge sorted lists of size $N$ into sorted lists of size $2N$

Repeat Pass 2 As Needed.

What's the bottleneck?

IO Cost: $O(N \cdot \lceil\log_2(N)\rceil)$

(with $N$ blocks)

- Pass 1
- Sort Bigger Buffers
- Re-Use Memory When Done
- Pass 2
- Merge $K$ Runs Simultaneously: $O(N \cdot \lceil\log_K(N)\rceil)$ IO

On average, we'll get runs of size $2 \cdot |WS|$