CSE 4/562 - Database Systems

# Extended RA

### Extended Relational Algebra

Set/Bag Operations
Select ($\sigma$), Project ($\pi$), Join ($\bowtie$), Union ($\cup$)
Bag Operations
Distinct ($\delta$), Outer Joins (⟗)
List Operations
Sort ($\tau$), Limit
Arithmetic Operations
Extended Projection ($\pi$), Aggregation ($\sigma$), Grouping ($\gamma$)

### Extended Projection

Like normal projection, but can create new columns

$\pi_{M \leftarrow A+B*C,\; N \leftarrow 2}(R)$

produces 1 row for every row of R, with 2 columns: M and N

... but first

### NULL Values

• Field values can be unknown or inapplicable.
• A tree with an unknown species.
• The street of a tree in a park.
• The '.' on many of your ID cards
• SQL provides a special NULL value for these cases

NULL makes things more complicated.

$$\textbf{Trees.SPC_COMMON} = \texttt{'Brooklyn'}$$

What happens if Trees.SPC_COMMON is NULL?

$$\texttt{NULL} = \textbf{'Brooklyn'} \equiv \textbf{Unknown}$$
 Unknown AND Unknown $\equiv$ Unknown Unknown AND True $\equiv$ Unknown Unknown AND False $\equiv$ False Unknown OR Unknown $\equiv$ Unknown Unknown OR True $\equiv$ True Unknown OR False $\equiv$ Unknown NOT Unknown $\equiv$ Unknown

WHERE clauses eliminate all non-True rows

$$Streets \bowtie_{StreetName} Trees$$

What happens if some streets have no trees?

### Outer Join (⟗, ⟕, ⟖)

1. Include all results from the normal (inner) join.
2. Also include rows that don't get joined.

### Outer Join

Inner Join
Normal, plain, simple join
Left Outer Join (⟕)
Include un-joined rows from the left hand side
Right Outer Join (⟖)
Include un-joined rows from the right hand side
[Full] Outer Join (⟗)
Include un-joined rows from either side

### Sort / Limit

$$\tau_{A}(R)$$ The tuples of $R$ in ascending order according to 'A'

$$\textbf{L}_{n}(R)$$ The first $n$ tuples of R

(Typically combined with sort. If not, pick arbitrarily.)

### Sort

What happens if you don't have enough memory?

Key Idea: Merging 2 sorted lists requires $O(1)$ memory.

### 2-Way Sort

Pass 1
Create lots of (small) sorted lists.
Pass 2+
Merge sorted lists of size $N$ into sorted lists of size $2N$

### Pass 2: Merge Sorted Runs

Repeat Pass 2 As Needed.

What's the bottleneck?

IO Cost: $O(N \cdot \lceil\log_2(N)\rceil)$
(with $N$ blocks)

### Using More Memory

Pass 1
Sort Bigger Buffers
Re-Use Memory When Done
Pass 2
Merge $K$ Runs Simultaneously: $O(N \cdot \lceil\log_K(N)\rceil)$ IO

### Replacement Sort

On average, we'll get runs of size $2 \cdot |WS|$