February 23, 2021
Like normal projection, but can create new columns
$\pi_{M \leftarrow A+B*C,\; N \leftarrow 2}(R)$produces 1 row for every row of R, with 2 columns: M and N
... but first
NULL ValuesNULL value for these casesNULL makes things more complicated.
What happens if Trees.BORO is NULL?
| Unknown | AND | Unknown | $\equiv$ | Unknown |
| Unknown | AND | True | $\equiv$ | Unknown |
| Unknown | AND | False | $\equiv$ | False |
| Unknown | OR | Unknown | $\equiv$ | Unknown |
| Unknown | OR | True | $\equiv$ | True |
| Unknown | OR | False | $\equiv$ | Unknown |
| NOT | Unknown | $\equiv$ | Unknown | |
WHERE clauses eliminate all non-True rows
SELECT * FROM R
WHERE A = 2 AND NOT (A = 2)
$$UNKNOWN \wedge \neg UNKNOWN$$
$$UNKNOWN \wedge UNKNOWN$$
$$UNKNOWN$$
What happens if some streets have no trees?
Streets
| District | StreetName |
|---|---|
| A | Apple St. |
| B | Border St. |
| C | Coventry St. |
| D | Durham St. |
Trees
| TreeId | StreetName |
|---|---|
| 1 | Apple St. |
| 2 | Border St. |
| 3 | Durham St. |
| 4 | Elicott St. |
Streets ⟗ Trees
| District | StreetName | TreeId | StreetName |
|---|---|---|---|
| A | Apple St. | 1 | Apple St. |
| B | Border St. | 2 | Border St. |
| C | Coventry St. | NULL | NULL |
| D | Durham St. | 3 | Durham St. |
| NULL | NULL | 4 | Elicott St. |
Only LEFT outer join
Only RIGHT outer join
$$\tau_{A}(R)$$ The tuples of $R$ in ascending order according to 'A'
$$\textbf{L}_{n}(R)$$ The first $n$ tuples of R
Pick your favorite sort algorithm.
What happens if you don't have enough memory?
Key Idea: Merging 2 sorted lists requires $O(1)$ memory.
Repeat Pass 2 As Needed.
What's the bottleneck?
IO Cost: $O(N \cdot \lceil\log_2(N)\rceil)$
(with $N$ blocks)
On average, we'll get runs of size $2 \cdot |WS|$
SELECT COUNT(*) FROM RSELECT SUM(A) FROM RSELECT A, SUM(B) FROM R GROUP BY ASELECT DISTINCT A FROM RSELECT A FROM R GROUP BY A| TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
|---|---|---|---|
| 180683 | 'red maple' | 'Queens' | 3 |
| 315986 | 'pin oak' | 'Queens' | 21 |
| 204026 | 'honeylocust' | 'Brooklyn' | 3 |
| 204337 | 'honeylocust' | 'Brooklyn' | 10 |
| 189565 | 'American linden' | 'Brooklyn' | 21 |
| ... and 683783 more | |||
SELECT COUNT(*) FROM TREES
| TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
|---|---|---|---|
| COUNT = 0 | |||
| 180683 | 'red maple' | 'Queens' | 3 |
| COUNT = 1 | |||
| 315986 | 'pin oak' | 'Queens' | 21 |
| COUNT = 2 | |||
| 204026 | 'honeylocust' | 'Brooklyn' | 3 |
| COUNT = 3 | |||
| 204337 | 'honeylocust' | 'Brooklyn' | 10 |
| COUNT = 4 | |||
| 189565 | 'American linden' | 'Brooklyn' | 21 |
| COUNT = 5 | |||
| ... and 683783 more | |||
| COUNT = 683788 | |||
SELECT SUM(TREE_DBH) FROM TREES
| TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
|---|---|---|---|
| SUM = 0 | |||
| 180683 | 'red maple' | 'Queens' | 3 |
| SUM = 3 | |||
| 315986 | 'pin oak' | 'Queens' | 21 |
| SUM = 24 | |||
| 204026 | 'honeylocust' | 'Brooklyn' | 3 |
| SUM = 27 | |||
| 204337 | 'honeylocust' | 'Brooklyn' | 10 |
| SUM = 37 | |||
| 189565 | 'American linden' | 'Brooklyn' | 21 |
| SUM = 58 | |||
| ... and 683783 more | |||
This is also sometimes called a "fold"
Grey et. al. "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
SELECT SPC_COMMON, COUNT(*) FROM TREES GROUP BY SPC_COMMON
Naive Idea: Keep a separate accumulator for each group
| TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
|---|---|---|---|
| {} | |||
| 180683 | 'red maple' | 'Queens' | 3 |
| { 'red maple' = 1 } | |||
| 204337 | 'honeylocust' | 'Brooklyn' | 10 |
| { 'red maple' = 1, 'honeylocust' = 1 } | |||
| 315986 | 'pin oak' | 'Queens' | 21 |
| { 'red maple' = 1, 'honeylocust' = 1, 'pin oak' = 1 } | |||
| 204026 | 'honeylocust' | 'Brooklyn' | 3 |
| { 'red maple' = 1, 'honeylocust' = 2, 'pin oak' = 1 } | |||
What could go wrong?
| TREE_ID | SPC_COMMON | BORONAME | TREE_DBH |
|---|---|---|---|
| {} | |||
| 204337 | 'honeylocust' | 'Brooklyn' | 10 |
| { 'honeylocust' = 1 } | |||
| 204026 | 'honeylocust' | 'Brooklyn' | 3 |
| { 'honeylocust' = 2 } | |||
| ... and more | |||
| 315986 | 'pin oak' | 'Queens' | 21 |
| { 'honeylocust' = 3206, 'pin oak' = 1 } | |||
| ... and more | |||
| 180683 | 'red maple' | 'Queens' | 3 |
| { 'pin oak' = 53814, 'red maple' = 1 } | |||
Physical Layouts