CSE-250 Fall 2022 - Section B - Induction (contd), Average Runtime

### Induction (contd), Average Runtime

Sept 28, 2022

#### Merge Sort

Divide: Split the sequence in half
$D(n) = \Theta(n)$ (can do in $\Theta(1)$)
Conquer: Sort left and right halves
$a = 2$, $b = 2$, $c = 1$
Combine: Merge halves together
$C(n) = \Theta(n)$
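
The three steps above can be sketched in Scala as follows. This is a minimal illustration, not the course's reference implementation: the object name `MergeSortSketch` and the choice of `Vector[Int]` are assumptions made for the example.

```scala
object MergeSortSketch {
  // Combine: merge two already-sorted sequences, one element per loop step.
  def merge(left: Vector[Int], right: Vector[Int]): Vector[Int] = {
    val out = Vector.newBuilder[Int]
    var i = 0
    var j = 0
    while (i < left.length && j < right.length) {
      if (left(i) <= right(j)) { out += left(i); i += 1 }
      else { out += right(j); j += 1 }
    }
    // At most one of the two inputs still has elements left over.
    while (i < left.length) { out += left(i); i += 1 }
    while (j < right.length) { out += right(j); j += 1 }
    out.result()
  }

  def sort(data: Vector[Int]): Vector[Int] =
    if (data.length <= 1) data // base case: nothing to sort
    else {
      val (left, right) = data.splitAt(data.length / 2) // Divide
      merge(sort(left), sort(right))                    // Conquer, then Combine
    }
}
```

Calling `MergeSortSketch.sort(Vector(8, 7, 6, 5, 4, 3, 2, 1))` returns the values in ascending order.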

#### Merge Sort: Proof By Induction

Base Case: $T(1) \leq c \cdot 1$, where $T(1) = c_0$ is the constant cost of sorting a single element

$$c_0 \leq c$$

True for any $c > c_0$

#### Merge Sort: Proof By Induction

Assume: $T(\frac{n}{2}) \leq c \frac{n}{2} \log\left(\frac{n}{2}\right)$

Show: $T(n) \leq c n \log\left(n\right)$

$$2\cdot T(\frac{n}{2}) + c_1 + c_2 n \leq c n \log(n)$$

By the assumption and transitivity, showing the following inequality suffices:
$$2 c \frac{n}{2} \log\left(\frac{n}{2}\right) + c_1 + c_2 n \leq c n \log(n)$$

$$c n \log(n) - c n \log(2) + c_1 + c_2 n \leq c n \log(n)$$

$$c_1 + c_2 n \leq c n \log(2)$$

$$\frac{c_1}{n \log(2)} + \frac{c_2}{\log(2)} \leq c$$

True for all $n \geq n_0$, given any $n_0 \geq \frac{c_1}{\log(2)}$ and $c > \frac{c_2}{\log(2)}+1$
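
As a sanity check on the inequality, the recurrence can be evaluated numerically and compared against $c n \log(n)$. The sketch below assumes illustrative constants $c_0 = c_1 = c_2 = 1$ and $c = 3$ with a base-2 logarithm; none of these values come from the lecture. The check starts at $n = 2$ because $\log(1) = 0$ makes the bound zero at $n = 1$.

```scala
object MergeSortBound {
  // Hypothetical constants, chosen only for illustration.
  val c0 = 1.0
  val c1 = 1.0
  val c2 = 1.0
  val c = 3.0

  def log2(x: Double): Double = math.log(x) / math.log(2)

  // T(1) = c0; T(n) = 2 T(n/2) + c1 + c2 n, for n a power of two.
  def T(n: Int): Double =
    if (n <= 1) c0 else 2 * T(n / 2) + c1 + c2 * n

  // The claimed upper bound c * n * log2(n).
  def bound(n: Int): Double = c * n * log2(n)

  // Checks T(n) <= bound(n) for n = 2, 4, ..., 2^maxExp.
  def holdsUpTo(maxExp: Int): Boolean =
    (1 to maxExp).forall { e =>
      val n = 1 << e
      T(n) <= bound(n)
    }
}
```

With these constants, `MergeSortBound.holdsUpTo(20)` evaluates to `true`. A numeric check like this only covers the tested sizes; the induction above is what covers every $n$.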

All of the "work" is in the combine step.

Can we put the work in the divide step?

Idea 1: Partition the data on the median value.

Idea 2: Partition the data in-place.

#### QuickSort (Idealized) (Wrong)

To sort an array of size $n$:
1. Pick a $pivot$ value.
2. Swap values until...
• array elements at $[0, \frac{n}{2})$ are $\leq pivot$
• array elements at $[\frac{n}{2}, n)$ are $> pivot$
3. Recursively sort $low$
4. Recursively sort $high$

#### QuickSort (Wrong)

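    // Helper used by the partition loop below.
    def swap(arr: Array[Int], i: Int, j: Int): Unit = {
      val tmp = arr(i); arr(i) = arr(j); arr(j) = tmp
    }

    def idealizedQuickSort(arr: Array[Int], from: Int, until: Int): Unit = {
      // Ranges of size 0 or 1 are already sorted.
      if (until - from <= 1) { return }
      // Idealized: assumes the median of the range [from, until) is available.
      val pivot: Int = ???
      var low = from
      var high = until - 1

      while (low < high) {
        // Advance low past values that already belong in the lower half.
        while (arr(low) <= pivot && low < high) { low += 1 }
        if (low < high) {
          // Retreat high past values that already belong in the upper half.
          while (arr(high) > pivot && low < high) { high -= 1 }
          swap(arr, low, high)
        }
      }
      idealizedQuickSort(arr, from = from, until = low)
      idealizedQuickSort(arr, from = low, until = until)
    }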

#### QuickSort (Wrong)

If we can obtain a pivot in $O(1)$, what's the complexity?

$$T_{quicksort}(n) = \begin{cases} \Theta(1) & \textbf{if } n = 1\\ 2 \cdot T(\frac{n}{2}) + \Theta(n) + 0 & \textbf{otherwise} \end{cases}$$

Contrast with MergeSort: $$T_{mergesort}(n) = \begin{cases} \Theta(1) & \textbf{if } n = 1\\ 2 \cdot T(\frac{n}{2}) + \Theta(1) + \Theta(n) & \textbf{otherwise} \end{cases}$$

#### QuickSort

Problem: Finding the median value of an unsorted collection by sorting it first is $O(n\log(n))$, as expensive as the sort itself

(We'll talk about heaps later)

#### QuickSort

Idea: If we pick a value at random,
on average half the values will be lower.

#### QuickSort

1. Pick a value at random as a $pivot$.
2. Swap values until the array is subdivided into...
• $low$: array elements that are $\leq pivot$
• $pivot$
• $high$: array elements that are $> pivot$
3. Recursively sort $low$
4. Recursively sort $high$
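
The steps above can be sketched as an in-place Scala routine. This is one possible realization, not necessarily the one used in the course: the partition loop is the single-pass Lomuto scheme (a swapped-in technique, chosen for brevity), and the names `QuickSortSketch`, `boundary`, and the explicit `rng` parameter are assumptions made for the example.

```scala
import scala.util.Random

object QuickSortSketch {
  private def swap(arr: Array[Int], i: Int, j: Int): Unit = {
    val tmp = arr(i); arr(i) = arr(j); arr(j) = tmp
  }

  // Sorts the range [from, until) of arr in place.
  def quickSort(arr: Array[Int], from: Int, until: Int, rng: Random): Unit = {
    if (until - from > 1) {
      // 1. Pick a value at random as the pivot and park it at the end.
      val pivotIdx = from + rng.nextInt(until - from)
      swap(arr, pivotIdx, until - 1)
      val pivot = arr(until - 1)

      // 2. Partition: [from, boundary) collects the values <= pivot.
      var boundary = from
      var i = from
      while (i < until - 1) {
        if (arr(i) <= pivot) { swap(arr, i, boundary); boundary += 1 }
        i += 1
      }
      swap(arr, boundary, until - 1) // pivot lands between low and high

      quickSort(arr, from, boundary, rng)      // 3. Recursively sort low
      quickSort(arr, boundary + 1, until, rng) // 4. Recursively sort high
    }
  }
}
```

Passing a seeded generator, for example `new Random(42)`, makes a run repeatable; the sorted result is the same for any seed, only the sequence of pivots changes.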

What's the worst-case runtime?

#### QuickSort

What if we always pick the worst pivot?

[8, 7, 6, 5, 4, 3, 2, 1]

[7, 6, 5, 4, 3, 2, 1], 8, []

[6, 5, 4, 3, 2, 1], 7, [], 8

[5, 4, 3, 2, 1], 6, [], 7, 8

...

$$T_{quicksort}(n) \in O(n^2)$$
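
To see where the quadratic growth comes from, the sketch below counts comparisons when the pivot is always the largest remaining value. It assumes that partitioning $n$ elements costs exactly $n - 1$ comparisons (every other element against the pivot); that cost model and the name `WorstCasePivot` are assumptions made for illustration.

```scala
object WorstCasePivot {
  // Partitioning n elements costs n - 1 comparisons (assumed cost model),
  // then recursion continues on n - 1 elements because "high" is empty.
  @scala.annotation.tailrec
  def comparisons(n: Int, acc: Long = 0L): Long =
    if (n <= 1) acc
    else comparisons(n - 1, acc + (n - 1))
}
```

Under this model the total is $(n-1) + (n-2) + \dots + 1 = \frac{n(n-1)}{2}$, so the 8-element example above costs 28 comparisons.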

Is the worst case runtime representative?

No! (it'll almost always be faster)

Is there something we can say about the runtime?

#### QuickSort

Let's say we pick the $X$th largest element as the pivot.
What's the recursive runtime for $T(n)$?

$$T(n) = \begin{cases} T(0) + T(n-1) + \Theta(n) & \textbf{if } X = 1\\ T(1) + T(n-2) + \Theta(n) & \textbf{if } X = 2\\ T(2) + T(n-3) + \Theta(n) & \textbf{if } X = 3\\ \vdots\\ T(n-2) + T(1) + \Theta(n) & \textbf{if } X = n-1\\ T(n-1) + T(0) + \Theta(n) & \textbf{if } X = n \end{cases}$$

How likely are we to pick $X = k$ for any specific $k$?

$P[X = k] = \frac{1}{n}$

... a brief aside...

#### Probabilities and Expectations

If I roll d6 (a 6-sided die 🎲) $k$ times,
what is the average total over all possible outcomes?

#### k = 1

If I roll d6 (a 6-sided die 🎲) $1$ time...

| Roll | Probability | Contribution |
|---|---|---|
| ⚀ | $\frac{1}{6}$ | 1 |
| ⚁ | $\frac{1}{6}$ | 2 |
| ⚂ | $\frac{1}{6}$ | 3 |
| ⚃ | $\frac{1}{6}$ | 4 |
| ⚄ | $\frac{1}{6}$ | 5 |
| ⚅ | $\frac{1}{6}$ | 6 |

#### k = 1

$$\frac{1 + 2 + 3 + 4 + 5 + 6}{6}= 3.5$$
$$= \frac{1}{6}\cdot 1 + \frac{1}{6}\cdot 2 + \frac{1}{6}\cdot 3 + \frac{1}{6}\cdot 4 + \frac{1}{6}\cdot 5 + \frac{1}{6}\cdot 6$$
$$= \sum_{i} \texttt{Probability}_i \cdot \texttt{Contribution}_i$$

If $X$ is a random variable representing the outcome of the roll, we call this the expectation of $X$, or $E[X]$

$$E[X] = \sum_{i} P_i \cdot X_i$$
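
The formula translates directly into code. The sketch below represents a distribution as a sequence of (probability, contribution) pairs; the names `Expectation`, `expectation`, and `d6` are made up for this example.

```scala
object Expectation {
  // E[X] = sum over outcomes of probability * contribution.
  def expectation(outcomes: Seq[(Double, Double)]): Double =
    outcomes.map { case (p, x) => p * x }.sum

  // A fair d6: each face 1..6 has probability 1/6.
  val d6: Seq[(Double, Double)] =
    (1 to 6).map(face => (1.0 / 6, face.toDouble))
}
```

`Expectation.expectation(Expectation.d6)` evaluates to 3.5, up to floating-point rounding.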

#### k = 2

If I roll d6 (a 6-sided die 🎲) $2$ times...

Does the outcome of one roll affect the other?

No: Each roll is an independent event.

If $X$ and $Y$ are random variables representing the outcome of each roll (i.e., independent random variables), then

$$E[X + Y] = E[X] + E[Y] = 3.5 + 3.5 = 7$$
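
The same value can be checked by brute force: enumerate all 36 equally likely pairs of rolls and average their sums. The name `TwoDice` is an assumption made for this sketch.

```scala
object TwoDice {
  val faces: Range = 1 to 6

  // Every (first, second) pair has probability 1/36.
  val expectedSum: Double =
    (for (x <- faces; y <- faces) yield (x + y) / 36.0).sum
}
```

`TwoDice.expectedSum` comes out to 7, up to floating-point rounding, matching $E[X] + E[Y]$.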