CSE 4/562 - Database Systems

Cost Based Optimization

General Query Optimizers

1. Apply blind heuristics (e.g., push down selections)
2. Enumerate all possible execution plans by varying (or for a reasonable subset)
• Join/Union Evaluation Order (commutativity, associativity, distributivity)
• Algorithms for Joins, Aggregates, Sort, Distinct, and others
• Data Access Paths
3. Estimate the cost of each execution plan
4. Pick the execution plan with the lowest cost

Idea 1: Run each plan

If we can't get the exact cost of a plan, what can we do?

Idea 2: Run each plan on a small sample of the data.

Idea 3: Analytically estimate the cost of a plan.

Plan Cost

CPU Time
How much time is spent processing.
# of IOs
How many random reads + writes go to disk.
Memory Required
How much memory do you need.
Randal Munroe (cc-by-nc)

Remember the Real Goals

1. Accurately rank the plans.
2. Don't spend more time optimizing than you get back.
3. Don't pick a plan that uses more memory than you have.

Accounting

Figure out the cost of each individual operator.

Only count the number of IOs added by each operator.

Table Scan $R$ $\frac{|R|}{\mathcal P}$ $O(1)$
Projection $\pi(R)$ $0$ $O(1)$
Selection $\sigma(R)$ $0$ $O(1)$
Union $R \uplus S$ $0$ $O(1)$
Sort (In-Mem) $\tau(R)$ $0$ $O(|R|)$
Sort (On-Disk) $\tau(R)$ $\frac{2 \cdot \lfloor log_{\mathcal B}(|R|) \rfloor}{\mathcal P}$ $O(\mathcal B)$
(B+Tree) Index Scan $Index(R, c)$ $\log_{\mathcal I}(|R|) + \frac{|\sigma_c(R)|}{\mathcal P}$ $O(1)$
(Hash) Index Scan $Index(R, c)$ $1$ $O(1)$
1. Tuples per Page ($\mathcal P$) – Normally defined per-schema
2. Size of $R$ ($|R|$)
3. Pages of Buffer ($\mathcal B$)
4. Keys per Index Page ($\mathcal I$)
Nested Loop Join (Buffer $S$ in mem) $R \times S$ $0$ $O(|S|)$
Nested Loop Join (Buffer $S$ on disk) $R \times_{disk} S$ $(1+ |R|) \cdot \frac{|S|}{\mathcal P}$ $O(1)$
1-Pass Hash Join $R \bowtie_{1PH, c} S$ $0$ $O(|S|)$
2-Pass Hash Join $R \bowtie_{2PH, c} S$ $\frac{2|R| + 2|S|}{\mathcal P}$ $O(1)$
Sort-Merge Join $R \bowtie_{SM, c} S$ [Sort] [Sort]
(Tree) Index NLJ $R \bowtie_{INL, c}$ $|R| \cdot (\log_{\mathcal I}(|S|) + \frac{|\sigma_c(S)|}{\mathcal P})$ $O(1)$
(Hash) Index NLJ $R \bowtie_{INL, c}$ $|R| \cdot 1$ $O(1)$
(In-Mem) Aggregate $\gamma_A(R)$ $0$ $adom(A)$
(Sort/Merge) Aggregate $\gamma_A(R)$ [Sort] [Sort]
1. Tuples per Page ($\mathcal P$) – Normally defined per-schema
2. Size of $R$ ($|R|$)
3. Pages of Buffer ($\mathcal B$)
4. Keys per Index Page ($\mathcal I$)
5. Number of distinct values of $A$ ($adom(A)$)

Next Class: How to estimate $|R|$