Midterm Review
CSE-250 Fall 2022 - Section B
Oct 17, 2022
Scala Types
| Type | Description | Examples |
|---|---|---|
| Boolean | Binary value | true, false |
| Char | 16-bit unsigned integer | 'x', 'y' |
| Byte | 8-bit signed integer | 42.toByte |
| Short | 16-bit signed integer | 42.toShort |
| Int | 32-bit signed integer | 42 |
| Long | 64-bit signed integer | 42L |
| Float | Single-precision floating-point number | 42.0f |
| Double | Double-precision floating-point number | 42.0 |
| Unit | No value | () |
Mutable vs Immutable
Mutable
Something that can be changed
Immutable
Something that cannot be changed
val: A value that cannot be reassigned (immutable)
var: A variable that can be reassigned (mutable)
Mutable vs Immutable
scala> import scala.collection.mutable
scala> val s = mutable.Set(1, 2, 3)
scala> s += 4
scala> println(s.mkString(", "))
1, 2, 3, 4
If a val points to a mutable object, the mutable object can still be changed.
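To contrast the two kinds of mutability side by side, here is a minimal sketch. The object and method names (`ValVsVar`, `growMutableSet`, `rebindImmutableSet`) are illustrative and not part of the course code.

```scala
import scala.collection.mutable

object ValVsVar {
  // A val bound to a mutable object: the binding is fixed, the contents are not.
  def growMutableSet(): Set[Int] = {
    val s = mutable.Set(1, 2, 3)
    s += 4              // allowed: mutates the object that s points to
    // s = mutable.Set(9) would not compile: reassignment to val
    s.toSet             // immutable snapshot of the current contents
  }

  // A var bound to an immutable object: the contents are fixed, the binding is not.
  def rebindImmutableSet(): Set[Int] = {
    var t = Set(1, 2, 3)
    t = t + 4           // allowed: builds a new Set and rebinds t to it
    t
  }
}
```

Both methods end with the elements 1 through 4, but only the first one changed an existing object in place.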
Logarithms
Let $a, b, c, n > 0$
Exponent Rule : $\log(n^a) = a \log(n)$
Product Rule : $\log(an) = \log(a) + \log(n)$
Division Rule : $\log\left(\frac{n}{a}\right) = \log(n) - \log(a)$
Change of Base from $b$ to $c$ : $\log_b(n) = \frac{\log_c(n)}{\log_c(b)}$
Log/Exponent are Inverses : $b^{\log_b(n)} = \log_b(b^n) = n$
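The rules above can be spot-checked numerically. The sketch below uses illustrative values ($a = 3$, $n = 8$, $b = 2$) and a hypothetical helper `close` that compares doubles up to a small tolerance, since floating-point arithmetic is inexact.

```scala
object LogRules {
  // Compare doubles up to a small tolerance.
  def close(x: Double, y: Double, eps: Double = 1e-9): Boolean =
    math.abs(x - y) < eps

  // Logarithm of n in base b, computed by changing base to the natural log.
  def logBase(b: Double, n: Double): Double = math.log(n) / math.log(b)

  def main(args: Array[String]): Unit = {
    val (a, n, b) = (3.0, 8.0, 2.0)
    println(close(math.log(math.pow(n, a)), a * math.log(n)))    // exponent rule
    println(close(math.log(a * n), math.log(a) + math.log(n)))   // product rule
    println(close(math.log(n / a), math.log(n) - math.log(a)))   // division rule
    println(close(logBase(b, n), 3.0))                           // log_2(8) = 3
    println(close(math.pow(b, logBase(b, n)), n))                // log and exponent are inverses
  }
}
```

A numeric check like this is not a proof, but it is a quick way to catch a misremembered rule.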
Growth Functions
Assumptions about $f(n)$
Problem sizes are non-negative integers
$n \in \mathbb Z^+ \cup \{0\}$
We can't reverse time
$f(n) \geq 0$
Smaller problems aren't harder than bigger problems
For any $n_1 < n_2$, $f(n_1) \leq f(n_2)$
To make the math simpler, we'll allow fractional steps.
Asymptotic Analysis @ 5000 feet
Goal: Organize runtimes (growth functions) into different Complexity Classes.
Within a complexity class, runtimes "behave the same"
Big-Theta
The following are all saying the same thing
$\lim_{n\rightarrow \infty}\frac{f(n)}{g(n)} = $ some non-zero constant.
$f(n)$ and $g(n)$ have the same complexity.
$f(n)$ and $g(n)$ are in the same complexity class.
$f(n) \in \Theta(g(n))$
$f(n)$ is bounded from above and below by constant multiples of $g(n)$
Big-Theta (As a Bound)
$f(n) \in \Theta(g(n))$ iff...
$\exists c_{low}, n_{0}$ s.t. $\forall n > n_{0}$, $f(n) \geq c_{low}\cdot g(n)$
There is some $c_{low}$ that we can multiply $g(n)$ by so that $f(n)$ is always bigger than $c_{low}g(n)$ for values of $n$ above some $n_0$
$\exists c_{high}, n_{0}$ s.t. $\forall n > n_{0}$, $f(n) \leq c_{high}\cdot g(n)$
There is some $c_{high}$ that we can multiply $g(n)$ by so that $f(n)$ is always smaller than $c_{high}g(n)$ for values of $n$ above some $n_0$
Proving Big-Theta
Assume $f(n) \geq c_{low}g(n)$.
Rewrite the above formula to find a $c_{low}$ for which it holds (for big enough n).
Assume $f(n) \leq c_{high}g(n)$.
Rewrite the above formula to find a $c_{high}$ for which it holds (for big enough n).
Shortcut: Find the dominant term being summed, and remove constants.
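As a worked instance of these steps, take the growth function $f(n) = 3n^2 + 2n$ with $g(n) = n^2$. This function is chosen here purely for illustration and is not taken from the slides.

$$3n^2 + 2n \geq 3n^2 \quad\Rightarrow\quad c_{low} = 3 \text{ works for all } n \geq 0$$
$$3n^2 + 2n \leq 3n^2 + 2n^2 = 5n^2 \quad\Rightarrow\quad c_{high} = 5 \text{ works for all } n \geq 1$$

So $f(n) \in \Theta(n^2)$ with $n_0 = 0$. This agrees with the shortcut: the dominant term is $3n^2$, and dropping its constant leaves $n^2$.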
Common Runtimes
Constant Time: $\Theta(1)$
e.g., $T(n) = c$ (runtime is independent of $n$)
Logarithmic Time: $\Theta(\log(n))$
e.g., $T(n) = c\log(n)$ (for some constant $c$)
Linear Time: $\Theta(n)$
e.g., $T(n) = c_1n + c_0$ (for some constants $c_0, c_1$)
Quadratic Time: $\Theta(n^2)$
e.g., $T(n) = c_2n^2 + c_1n + c_0$
Polynomial Time: $\Theta(n^k)$ (for some $k \in \mathbb Z^+$)
e.g., $T(n) = c_kn^k + \ldots + c_1n + c_0$
Exponential Time: $\Theta(c^n)$ (for some $c > 1$)
Other Bounds
$f(n) \in \Theta(g(n))$ iff...
$\exists c_{low}, n_{0}$ s.t. $\forall n > n_{0}$, $f(n) \geq c_{low}\cdot g(n)$
There is some $c_{low}$ that we can multiply $g(n)$ by so that $f(n)$ is always bigger than $c_{low}g(n)$ for values of $n$ above some $n_0$
$\exists c_{high}, n_{0}$ s.t. $\forall n > n_{0}$, $f(n) \leq c_{high}\cdot g(n)$
There is some $c_{high}$ that we can multiply $g(n)$ by so that $f(n)$ is always smaller than $c_{high}g(n)$ for values of $n$ above some $n_0$
Other Bounds
$f(n) \in O(g(n))$ iff...
$\exists c_{high}, n_{0}$ s.t. $\forall n > n_{0}$, $f(n) \leq c_{high}\cdot g(n)$
There is some $c_{high}$ that we can multiply $g(n)$ by so that $f(n)$ is always smaller than $c_{high}g(n)$ for values of $n$ above some $n_0$
Other Bounds
$f(n) \in \Omega(g(n))$ iff...
$\exists c_{low}, n_{0}$ s.t. $\forall n > n_{0}$, $f(n) \geq c_{low}\cdot g(n)$
There is some $c_{low}$ that we can multiply $g(n)$ by so that $f(n)$ is always bigger than $c_{low}g(n)$ for values of $n$ above some $n_0$
Other Bounds
Big-O: "Worst Case" bound
The set of all functions in the same or a smaller complexity class (same or faster runtime).
Big-Ω: "Best Case" bound
The set of all functions in the same or a bigger complexity class (same or slower runtime).
Big-Θ: "Tight" bound
The set of all functions in the same complexity class (same runtime).
"Tight Worst Case"
$n^2 \in O(n^3)$ is true, but you can do better.
$\Theta(g(n)) = O(g(n)) \cap \Omega(g(n))$
$\Theta$ is the same as ($O$ and $\Omega$)
Analyzing Code
The growth function for a block of code...
One Line of Code
Sum the growth functions of every method the line invokes.
Lines of Sequential Code
Sum up the growth function of each line of code.
Loops
Use a summation ($\sum$) over the body of the loop.
(Shorthand: a loop of $k$ iterations with an $O(f(n))$ body is $O(k\cdot f(n))$)
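To illustrate the loop rule, the sketch below counts how often the body of a triangular nested loop runs. The names `LoopCost` and `nestedSteps` are hypothetical, and the counter increment stands in for a $\Theta(1)$ body. The summation is $\sum_{i=0}^{n-1} i = \frac{n(n-1)}{2}$, which is $\Theta(n^2)$.

```scala
object LoopCost {
  // Count how many times the innermost body runs for a triangular nested loop.
  def nestedSteps(n: Int): Long = {
    var steps = 0L
    for (i <- 0 until n) {      // outer loop: n iterations
      for (j <- 0 until i) {    // inner loop: i iterations on the i-th pass
        steps += 1              // stands in for a Theta(1) loop body
      }
    }
    steps
  }
}
```

For example, `LoopCost.nestedSteps(10)` returns 45, matching $\frac{10 \cdot 9}{2}$.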
The Seq ADT
apply(idx : Int): A
Get the element (of type A) at position idx.
iterator: Iterator[A]
Get access to view all elements in the seq, in order, once.
length: Int
Count the number of elements in the seq.
The mutable.Seq ADT
apply(idx : Int): A
Get the element (of type A) at position idx.
iterator: Iterator[A]
Get access to all elements in the seq, in order, once.
length: Int
Count the number of elements in the seq.
insert(idx : Int, elem : A): Unit
Insert an element at position idx with value elem.
remove(idx : Int): A
Remove the element at position idx and return the removed value.
Sequence Implementations
Linked List
$O(1)$ mutations by reference or head/tail, $O(n)$ otherwise
Array
$O(1)$ access/update, $O(n)$ insert/remove
ArrayBuffer
$O(1)$ access/update, $O(n)$ insert/remove
$O(1)$ amortized append
Amortized $O(1)$
Bounds for one call:
An amortized bound says nothing about a single call.
Bounds for $n$ calls:
The total cost is guaranteed to be at most $n\cdot O(1) = O(n)$.
Expected $O(1)$
No guarantees at all!
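To see why an append can be amortized $O(1)$ even though a single call may cost $O(n)$, the sketch below simulates a buffer that doubles its capacity when full and counts the element copies caused by resizing. The doubling policy, the starting capacity of 1, and the names `AmortizedAppend` and `copiesForAppends` are assumptions made for illustration, not a description of how Scala's `ArrayBuffer` is implemented.

```scala
object AmortizedAppend {
  // Simulate n appends into a buffer that doubles its capacity when full.
  // Returns the total number of element copies caused by resizing.
  def copiesForAppends(n: Int, initialCapacity: Int = 1): Long = {
    var capacity = initialCapacity
    var size = 0
    var copies = 0L
    for (_ <- 0 until n) {
      if (size == capacity) {
        copies += size      // a resize copies every existing element
        capacity *= 2
      }
      size += 1             // the append itself
    }
    copies
  }
}
```

Under these assumptions, 8 appends cause 1 + 2 + 4 = 7 copies in total. In general the copy count stays below $2n$, so $n$ appends cost $O(n)$ overall even though the individual appends that trigger a resize are expensive.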
Guarantees
Tight $\Theta(f(n))$
The cost of one call will always be a constant factor from $f(n)$.
Worst Case $O(f(n))$
The cost of one call will never be worse than a constant factor from $f(n)$.
Amortized Worst Case $O(f(n))$
The cost of n calls will never be worse than a constant factor from $n \cdot f(n)$.
Expected Worst Case $O(f(n))$
"Usually" not worse than a constant factor from $f(n)$, but no promises.
Sequences
| Operation | Array | ArrayBuffer | LinkedList by Index | LinkedList by Ref |
|---|---|---|---|---|
| apply | $O(1)$ | $O(1)$ | $O(n)$ or $O(i)$ | $O(1)$ |
| update | $O(1)$ | $O(1)$ | $O(n)$ or $O(i)$ | $O(1)$ |
| insert | $O(n)$ | $O(n)$ or Amortized $O(n-i)$ | $O(n)$ or $O(i)$ | $O(1)$ |
| remove | $O(n)$ | $O(n)$ or $O(n-i)$ | $O(n)$ or $O(i)$ | $O(1)$ |
| append | $O(n)$ | $O(n)$ or Amortized $O(1)$ | $O(n)$ or $O(i)$ | $O(1)$ |
Fibonacci
What's the complexity? (in terms of n )
def fibb(n: Int): Long =
  if(n < 2){ 1 }
  else { fibb(n-1) + fibb(n-2) }
Fibonacci
$$T(n) = \begin{cases}
\Theta(1) & \textbf{if } n < 2\\
T(n-1) + T(n-2) + \Theta(1) & \textbf{otherwise}
\end{cases}$$
Test Hypothesis: $T(n) \in O(2^n)$
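One way to build intuition for the hypothesis is to count how many calls the naive recursion makes and compare that count against $2^n$. The helper `calls` below is a hypothetical instrumented stand-in that follows the same recurrence as $T(n)$ with every $\Theta(1)$ term set to 1.

```scala
object FibCalls {
  // Number of calls made by the naive recursive fibb(n), including the top-level call.
  def calls(n: Int): Long =
    if (n < 2) 1L                              // base case: one call, no recursion
    else 1L + calls(n - 1) + calls(n - 2)      // this call plus both recursive subtrees
}
```

For instance, `FibCalls.calls(10)` is 177, comfortably below $2^{10} = 1024$. A check like this supports the upper bound but does not replace the inductive proof.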
Merge Sort
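import scala.collection.mutable.ArrayBuffer

// Merge two sorted sequences into a single sorted sequence.
def merge[A: Ordering](left: Seq[A], right: Seq[A]): Seq[A] = {
  val output = ArrayBuffer[A]()
  val leftItems = left.iterator.buffered
  val rightItems = right.iterator.buffered
  while(leftItems.hasNext || rightItems.hasNext) {
    // If one side is exhausted, drain the other.
    if(!leftItems.hasNext) { output.append(rightItems.next()) }
    else if(!rightItems.hasNext) { output.append(leftItems.next()) }
    // Otherwise take whichever head is smaller.
    else if(Ordering[A].lt( leftItems.head, rightItems.head ))
      { output.append(leftItems.next()) }
    else { output.append(rightItems.next()) }
  }
  output.toSeq
}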
Merge Sort
Each pass through the loop advances either left or right.
Total Runtime: $\Theta(|\texttt{left}| + |\texttt{right}|)$
Merge Sort
Observation: Merging two sorted arrays can be done in $O(n)$.
Idea: Split the input in half, sort each half, and merge.
Merge Sort
def sort[A: Ordering](data: Seq[A]): Seq[A] = {
  // Base case: zero or one element is already sorted.
  if(data.length <= 1) { data }
  else {
    // Divide, recursively sort each half, then combine.
    val (left, right) = data.splitAt(data.length / 2)
    merge(sort(left), sort(right))
  }
}
Merge Sort
Divide: Split the sequence in half
$D(n) = \Theta(n)$ (can do in $\Theta(1)$)
Conquer: Sort left and right halves
$a = 2$, $b = 2$, $c = 1$
Combine: Merge halves together
$C(n) = \Theta(n)$
Merge Sort
$$T(n) = \begin{cases}
\Theta(1) & \textbf{if } n \leq 1 \\
2\cdot T(\frac{n}{2}) + \Theta(1) + \Theta(n) & \textbf{otherwise}
\end{cases}$$
How can we find a closed-form hypothesis?
Idea: Draw out the cost of each level of recursion.
Merge Sort: Recursion Tree
$$T(n) = \begin{cases}
\Theta(1) & \textbf{if } n \leq 1 \\
2\cdot T(\frac{n}{2}) + \Theta(1) + \Theta(n) & \textbf{otherwise}
\end{cases}$$
Each node of the tree shows $D(n) + C(n)$
Hypothesis: $T(n) \in O(n \cdot \log(n))$
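The recursion tree can be checked numerically. Level $i$ of the tree has $2^i$ nodes, each costing about $\frac{n}{2^i}$, so every level sums to roughly $n$, and there are $\log_2(n) + 1$ levels. The sketch below evaluates the recurrence with all hidden constants set to 1 and the $\Theta(1)$ divide cost dropped. Both are simplifying assumptions, and the names `MergeSortRecurrence`, `t`, and `log2` are illustrative.

```scala
object MergeSortRecurrence {
  // T(n) = 2 T(n/2) + n with T(1) = 1, taking the hidden constants to be 1.
  def t(n: Long): Long =
    if (n <= 1) 1L
    else 2L * t(n / 2) + n

  // Integer log base 2, exact when n is a power of two.
  def log2(n: Long): Long = 63L - java.lang.Long.numberOfLeadingZeros(n)
}
```

For powers of two this recurrence evaluates to exactly $n \log_2(n) + n$. For example, `t(8)` is $8 \cdot 3 + 8 = 32$, which is consistent with the $n \log(n)$ hypothesis.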
Merge Sort: Proof By Induction
Now use induction to prove that there exist constants $c, n_0$ such that $T(n) \leq c \cdot n\log(n)$ for any $n > n_0$.
$$T(n) = \begin{cases}
c_0 & \textbf{if } n \leq 1 \\
2\cdot T(\frac{n}{2}) + c_1 + c_2\cdot n & \textbf{otherwise}
\end{cases}$$
Merge Sort: Proof By Induction
Base Case: $T(1) \leq c \cdot 1$
$$c_0 \leq c$$
True for any $c > c_0$
Merge Sort: Proof By Induction
Assume: $T(\frac{n}{2}) \leq c \frac{n}{2} \log\left(\frac{n}{2}\right)$
Show: $T(n) \leq c n \log\left(n\right)$
$$2\cdot T(\frac{n}{2}) + c_1 + c_2 n \leq c n \log(n)$$
By the assumption and transitivity, showing the following inequality suffices:
$$2 c \frac{n}{2} \log\left(\frac{n}{2}\right) + c_1 + c_2 n \leq c n \log(n)$$
$$c n \log(n) - c n \log(2) + c_1 + c_2 n \leq c n \log(n)$$
$$c_1 + c_2 n \leq c n \log(2)$$
$$\frac{c_1}{n \log(2)} + \frac{c_2}{\log(2)} \leq c$$
True for any $n_0 \geq \frac{c_1}{\log(2)}$ and $c > \frac{c_2}{\log(2)}+1$
Stacks vs Queues
Push
Put a new object on top of the stack
Pop
Remove the object on top of the stack
Top
Peek at what's on top of the stack
Enqueue
Put a new object at the end of the queue
Dequeue
Remove the next object in the queue
Head
Peek at the next object in the queue
Queues vs Stacks
Queue
First in, First out (FIFO)
Stack
Last in, First Out (LIFO / FILO)
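The difference in ordering can be seen with Scala's standard `mutable.Stack` and `mutable.Queue`. This is a small sketch assuming Scala 2.13 or later, and the wrapper names `StackVsQueue`, `stackOrder`, and `queueOrder` are illustrative.

```scala
import scala.collection.mutable

object StackVsQueue {
  // Push 1, 2, 3 onto a stack and pop everything: last in, first out.
  def stackOrder(): List[Int] = {
    val stack = mutable.Stack[Int]()
    stack.push(1); stack.push(2); stack.push(3)
    val out = mutable.ListBuffer[Int]()
    while (stack.nonEmpty) { out += stack.pop() }
    out.toList
  }

  // Enqueue 1, 2, 3 into a queue and dequeue everything: first in, first out.
  def queueOrder(): List[Int] = {
    val queue = mutable.Queue[Int]()
    queue.enqueue(1); queue.enqueue(2); queue.enqueue(3)
    val out = mutable.ListBuffer[Int]()
    while (queue.nonEmpty) { out += queue.dequeue() }
    out.toList
  }
}
```

The stack returns the elements as `List(3, 2, 1)`, while the queue returns them as `List(1, 2, 3)`.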
Graphs
A graph is a pair $(V, E)$ where
$V$ is a set of vertices
$E$ is a set of vertex pairs called edges
Edges and vertices may also store data (labels)
Edge Types
Directed Edge (e.g., transmit bandwidth)
Ordered pair of vertices $(u, v)$
origin ($u$) → destination ($v$)
Undirected edge (e.g., round-trip latency)
Unordered pair of vertices $\{u, v\}$
Directed Graph
All edges are directed
Undirected Graph
All edges are undirected
Terminology
Endpoints (end-vertices) of an edge
U, V are the endpoints of a
Edges incident on a vertex
a, b, d are incident on V
Adjacent Vertices
U, V are adjacent
Degree of a vertex (# of incident edges)
X has degree 5
Parallel Edges
h, i are parallel
Self-Loop
j is a self-loop
Simple Graph
A graph without parallel edges or self-loops
Notation
$n$
The number of vertices
$m$
The number of edges
$deg(v)$
The degree of vertex $v$
Graph Properties
$$\sum_{v} deg(v) = 2m$$
Proof: Each edge is counted twice
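The counting argument can be checked on a small example. The sketch below represents an undirected graph as a list of vertex-id pairs, which is an assumed representation chosen for brevity, and tallies the degree of each vertex. The names `DegreeSum` and `degrees` are hypothetical.

```scala
object DegreeSum {
  // Undirected edges as pairs of vertex ids; a self-loop (v, v) adds 2 to deg(v).
  def degrees(edges: Seq[(Int, Int)]): Map[Int, Int] =
    edges
      .flatMap { case (u, v) => Seq(u, v) }      // each edge contributes two endpoints
      .groupBy(identity)
      .map { case (v, hits) => v -> hits.size }  // deg(v) = number of endpoint hits
}
```

With the edges `Seq((1, 2), (2, 3), (3, 1), (3, 4))`, the degrees are 2, 2, 3, and 1, which sum to 8, twice the 4 edges.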
Graph Properties
In a directed graph with no self-loops and no parallel edges:
$$m \leq n(n-1)$$
No parallel edges: each pair connected at most once
No self loops: pick each vertex once
$n$ choices for the first vertex, $(n-1)$ choices for the second vertex
$$m \leq n(n-1)$$
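The bound is tight: enumerating every ordered pair of distinct vertices produces exactly $n(n-1)$ edges. A minimal sketch, with vertices numbered $0$ to $n-1$ and the names `MaxEdges` and `allDirectedEdges` chosen for illustration:

```scala
object MaxEdges {
  // Every ordered pair (u, v) with u != v: the largest edge set a simple directed graph can have.
  def allDirectedEdges(n: Int): Seq[(Int, Int)] =
    for {
      u <- 0 until n     // n choices for the origin
      v <- 0 until n     // n candidate destinations...
      if u != v          // ...minus the self-loop, leaving n - 1
    } yield (u, v)
}
```

For $n = 5$ this yields 20 edges, matching $5 \cdot 4$.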
A (Directed) Graph ADT
Two type parameters (Graph[V, E])
V : The vertex label type
E : The edge label type
Vertices
... are elements (like Linked List Nodes)
... store a value of type V
Edges
... are elements
... store a value of type E
A (Directed) Graph ADT
trait Graph[V, E] {
def vertices: Iterator[Vertex]
def edges: Iterator[Edge]
def addVertex(label: V): Vertex
def addEdge(orig: Vertex, dest: Vertex, label: E): Edge
def removeVertex(vertex: Vertex): Unit
def removeEdge(edge: Edge): Unit
}
A (Directed) Graph ADT
trait Vertex[V, E] {
def outEdges: Seq[Edge]
def inEdges: Seq[Edge]
def incidentEdges: Iterator[Edge] = outEdges.iterator ++ inEdges.iterator
def edgeTo(v: Vertex): Boolean
def label: V
}
trait Edge[V, E] {
def origin: Vertex
def destination: Vertex
def label: E
}
Edge List Summary
addEdge, addVertex: $O(1)$
removeEdge: $O(1)$
removeVertex: $O(m)$
vertex.incidentEdges: $O(m)$
vertex.edgeTo: $O(m)$
Space Used : $O(n) + O(m)$
Idea: Store the in/out edges for each vertex.
Attempt 3: Adjacency List
class DirectedGraphV3[V, E] extends Graph[V, E] {
  /* ... */
  class Vertex(label: V) {
    var node: DoublyLinkedList[Vertex].Node = null
    val inEdges = DoublyLinkedList[Edge]()
    val outEdges = DoublyLinkedList[Edge]()
    /* ... */
  }
  class Edge(orig: Vertex, dest: Vertex, label: E) {
    var node: DoublyLinkedList[Edge].Node = null
    var origNode: DoublyLinkedList[Edge].Node = null
    var destNode: DoublyLinkedList[Edge].Node = null
    /* ... */
  }
  /* ... */
}
Adjacency List Summary
addEdge, addVertex: $O(1)$
removeEdge: $O(1)$
vertex.incidentEdges: $O(deg(vertex))$
removeVertex: $O(deg(vertex))$
vertex.edgeTo: $O(deg(vertex))$
Space Used : $O(n) + O(m)$
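To make the idea of per-vertex edge storage concrete, here is a deliberately simplified sketch. It is not the course's `DirectedGraphV3`: vertices are plain `Int` ids, edges are unlabeled `(origin, destination)` pairs, and the per-vertex lists are `ArrayBuffer`s inside hash maps rather than doubly linked lists. With this swap the map lookups are expected rather than worst-case $O(1)$, and removal is omitted because it would not be $O(1)$. The class name `SimpleAdjacencyList` is hypothetical.

```scala
import scala.collection.mutable

// Hypothetical simplified adjacency list: Int vertex ids, (origin, destination) edges.
class SimpleAdjacencyList {
  private val outEdges = mutable.Map[Int, mutable.ArrayBuffer[(Int, Int)]]()
  private val inEdges  = mutable.Map[Int, mutable.ArrayBuffer[(Int, Int)]]()

  // Create empty in/out lists for the vertex if it is new.
  def addVertex(v: Int): Unit = {
    outEdges.getOrElseUpdate(v, mutable.ArrayBuffer[(Int, Int)]())
    inEdges.getOrElseUpdate(v, mutable.ArrayBuffer[(Int, Int)]())
  }

  // Record the edge at both of its endpoints.
  def addEdge(orig: Int, dest: Int): Unit = {
    addVertex(orig); addVertex(dest)
    outEdges(orig) += ((orig, dest))
    inEdges(dest) += ((orig, dest))
  }

  // O(deg(v)): only the edges stored at v are touched.
  def incidentEdges(v: Int): Iterator[(Int, Int)] =
    outEdges(v).iterator ++ inEdges(v).iterator

  // O(deg(orig)): scan the out-edges of orig for one ending at dest.
  def edgeTo(orig: Int, dest: Int): Boolean =
    outEdges(orig).exists(_._2 == dest)
}
```

The key point matches the summary above: `incidentEdges` and `edgeTo` only look at the edges stored at one vertex, so their cost depends on $deg(vertex)$ rather than on $m$.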
A few more definitions...
A subgraph $S$ of a graph $G$ is a graph where...
$S$'s vertices are a subset of $G$'s vertices
$S$'s edges are a subset of $G$'s edges
A spanning subgraph of $G$...
Is a subgraph of $G$
Contains all of $G$'s vertices.
A few more definitions...
A graph is connected ...
If there is a path between every pair of vertices
A connected component of $G$...
Is a maximal connected subgraph of $G$
"maximal" means you can't add any new vertex without breaking the property
Any subset of $G$'s edges that connects the subgraph is fine.
A few more definitions...
A free tree is an undirected graph $T$ such that:
There is exactly one simple path between any two nodes
T is connected
T has no cycles
A rooted tree is a directed graph $T$ such that:
One vertex of $T$ is the root
There is exactly one simple path from the root to every other vertex in the graph.
A (free/rooted) forest is a graph $F$ such that
Every connected component is a tree.
A few more definitions...
A spanning tree of a connected graph...
Is a spanning subgraph that is a tree.
Is not unique unless the graph is a tree.
Depth-First Search
Summing up...
Mark Vertices UNVISITED
$O(|V|)$
Mark Edges UNVISITED
$O(|E|)$
DFS Vertex Loop
$O(|V|)$
All Calls to DFSOne
$O(|E|)$
$O(|V| + |E|)$
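The pieces being summed can be seen in a compact sketch of depth-first search. It assumes a graph given as an adjacency map from `Int` vertex ids to neighbor lists rather than the Graph ADT above, and it tracks visited vertices only, not edge labels. The names `DepthFirst`, `dfsOrder`, and `dfsOne` are illustrative. Sorting the keys is only there to make the output deterministic and is not part of the $O(|V| + |E|)$ traversal cost.

```scala
import scala.collection.mutable

object DepthFirst {
  // Returns the vertices in the order DFS first visits them,
  // restarting from every vertex that is still unvisited.
  def dfsOrder(adj: Map[Int, Seq[Int]]): List[Int] = {
    val visited = mutable.LinkedHashSet[Int]()   // insertion-ordered VISITED marks

    def dfsOne(v: Int): Unit = {
      visited += v
      for (w <- adj.getOrElse(v, Seq.empty)) {   // each adjacency entry is examined once
        if (!visited.contains(w)) dfsOne(w)
      }
    }

    for (v <- adj.keys.toSeq.sorted) {           // the vertex loop
      if (!visited.contains(v)) dfsOne(v)
    }
    visited.toList
  }
}
```

On the graph `Map(1 -> Seq(2, 3), 2 -> Seq(1, 4), 3 -> Seq(1), 4 -> Seq(2), 5 -> Seq())`, the visit order is `List(1, 2, 4, 3, 5)`: the search goes deep along 1, 2, 4 before backing up to 3, and the isolated vertex 5 is picked up by the vertex loop.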
Breadth-First Search
Summing up...
Mark Vertices UNVISITED
$O(|V|)$
Mark Edges UNVISITED
$O(|E|)$
Add each vertex to the work queue
$O(|V|)$
Process each vertex
$O(|E|)$
$O(|V| + |E|)$
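The work-queue version of the traversal (breadth-first search) can be sketched the same way. The code below assumes an adjacency map from `Int` vertex ids to neighbor lists and searches from a single start vertex. The names `BreadthFirst` and `bfsDistances` are illustrative. Each vertex enters the queue at most once and each adjacency entry is examined once, which is where the $O(|V| + |E|)$ total comes from.

```scala
import scala.collection.mutable

object BreadthFirst {
  // Hop counts from start to every reachable vertex.
  def bfsDistances(adj: Map[Int, Seq[Int]], start: Int): Map[Int, Int] = {
    val dist = mutable.Map(start -> 0)        // a vertex is VISITED once it has a distance
    val queue = mutable.Queue(start)          // the work queue
    while (queue.nonEmpty) {
      val v = queue.dequeue()
      for (w <- adj.getOrElse(v, Seq.empty)) {
        if (!dist.contains(w)) {              // first time we reach w
          dist(w) = dist(v) + 1
          queue.enqueue(w)
        }
      }
    }
    dist.toMap
  }
}
```

On the graph `Map(1 -> Seq(2, 3), 2 -> Seq(1, 4), 3 -> Seq(1), 4 -> Seq(2), 5 -> Seq())` with start vertex 1, the result is `Map(1 -> 0, 2 -> 1, 3 -> 1, 4 -> 2)`. Vertex 5 is unreachable and therefore absent. Because vertices are processed in order of distance, these hop counts are shortest-path lengths in an unweighted graph.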
DFS vs BFS
| Application | DFS | BFS |
|---|---|---|
| Spanning Trees | ✔ | ✔ |
| Connected Components | ✔ | ✔ |
| Paths/Connectivity | ✔ | ✔ |
| Cycles | ✔ | ✔ |
| Shortest Paths | | ✔ |
| Articulation Points | ✔ | |