Cost Based Optimization

Apply blind heuristics (e.g., push down selections)
Enumerate all possible execution plans by varying (or for a reasonable subset)
- Join/Union Evaluation Order (commutativity, associativity, distributivity)
- Algorithms for Joins, Aggregates, Sort, Distinct, and others
- Data Access Paths
Estimate the cost of each execution plan
Pick the execution plan with the lowest cost

Idea 1: Run each plan

If we can't get the exact cost of a plan, what can we do?

Idea 2: Run each plan on a small sample of the data.

Idea 3: Analytically estimate the cost of a plan.

Randal Munroe (cc-by-nc)

Figure out the cost of each individual operator.

Only count the number of IOs added by each operator.

Operation	RA	IOs Added (#pages)	Memory (#tuples)
Table Scan	$R$	$\frac{\|R\|}{\mathcal P}$	$O(1)$
Projection	$\pi(R)$	$0$	$O(1)$
Selection	$\sigma(R)$	$0$	$O(1)$
Union	$R \uplus S$	$0$	$O(1)$
Sort (In-Mem)	$\tau(R)$	$0$	$O(\|R\|)$
Sort (On-Disk)	$\tau(R)$	$\frac{2 \cdot \lfloor log_{\mathcal B}(\|R\|) \rfloor}{\mathcal P}$	$O(\mathcal B)$
(B+Tree) Index Scan	$Index(R, c)$	$\log_{\mathcal I}(\|R\|) + \frac{\|\sigma_c(R)\|}{\mathcal P}$	$O(1)$
(Hash) Index Scan	$Index(R, c)$	$1$	$O(1)$

Operation	RA	IOs Added (#pages)	Memory (#tuples)
Nested Loop Join (Buffer $S$ in mem)	$R \times S$	$0$	$O(\|S\|)$
Nested Loop Join (Buffer $S$ on disk)	$R \times_{disk} S$	$(1+ \|R\|) \cdot \frac{\|S\|}{\mathcal P}$	$O(1)$
1-Pass Hash Join	$R \bowtie_{1PH, c} S$	$0$	$O(\|S\|)$
2-Pass Hash Join	$R \bowtie_{2PH, c} S$	$\frac{2\|R\| + 2\|S\|}{\mathcal P}$	$O(1)$
Sort-Merge Join	$R \bowtie_{SM, c} S$	[Sort]	[Sort]
(Tree) Index NLJ	$R \bowtie_{INL, c}$	$\|R\| \cdot (\log_{\mathcal I}(\|S\|) + \frac{\|\sigma_c(S)\|}{\mathcal P})$	$O(1)$
(Hash) Index NLJ	$R \bowtie_{INL, c}$	$\|R\| \cdot 1$	$O(1)$
(In-Mem) Aggregate	$\gamma_A(R)$	$0$	$adom(A)$
(Sort/Merge) Aggregate	$\gamma_A(R)$	[Sort]	[Sort]

Next Class: How to estimate $|R|$