Algorithms for Queries

February 18, 2021

Garcia-Molina/Ullman/Widom: Ch. 15.1-15.5, 16.7

News

  • Homework 1 assigned last night, due Wednesday night.
  • Checkpoint 1 posted Sunday. Submissions open tonight.

Query Evaluation Styles

All-At-Once (Collections)
Evaluate bottom-up, fully materializing one operator's output at a time.
Volcano-Style (Iterators)
Operators "request" one tuple at a time from their children.
Push-Style (Buffers)
Operators continuously produce/consume tuples.
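
As a minimal sketch of the Volcano style, assuming a hypothetical operator interface with a single next() method that returns None at end-of-stream (the class and method names are illustrative, not from any particular engine):

    class Scan:
      """Leaf operator: pulls tuples from a stored table."""
      def __init__(self, table):
        self.source = iter(table)
      def next(self):
        return next(self.source, None)  # None signals end-of-stream

    class Select:
      """Filter operator: requests one tuple at a time from its child."""
      def __init__(self, pred, child):
        self.pred, self.child = pred, child
      def next(self):
        while (t := self.child.next()) is not None:
          if self.pred(t):
            return t
        return None

A plan is then a tree of such operators; calling next() on the root pulls tuples through the whole pipeline one at a time.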

Analyzing Volcano Operators

  • CPU Used
  • Memory Bounds
  • Disk IO Used

Databases are usually IO- or Memory-bound

Memory Bounds

  • Constant
  • Scales with output
  • Scales with part of the input
  • Worse

Core Question: Do we have enough memory to use this operator?

Disk IO

IO measured in:

  • Number of Tuples
  • Number of Data Pages (absolute size)

Accounting

Figure out the cost of each individual operator.

Only count the number of IOs added by each operator.

Note

So far, we've been pretending that each operator has one algorithm.

Often, there are many algorithms, some of which cover multiple operators.

This is why Spark has a PhysicalPlan.

In the suggested Iterator-based approach, the Iterators are your PhysicalPlan.

Table Scan ($R$)

Memory Required?
Constant!
IOs added?
$|R|$ tuples read

Select ($\sigma(R)$)

Memory Required?
Constant!
IOs added?
None! (Can "inline" into cost of $R$)

Example

Example, assume $R$ is 100 tuples.

How many IOs do we need to compute $Q := R$?

How many IOs do we need to compute $Q := \sigma(R)$?

Project ($\pi(R)$)

Memory Required?
Constant!
IOs added?
None!

Example

Example, assume $R$ is 100 tuples.

How many IOs do we need to compute $Q := \pi(R)$?

How many IOs do we need to compute $Q := \pi(\sigma(R))$?

Projection and Selection do not add IO.
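
To make this accounting concrete, here is a small hypothetical sketch that instruments the scan with an IO counter; wrapping the scan in select and project leaves the count untouched (all names here are illustrative):

    class CountingScan:
      """Table scan that counts every tuple it reads (its IOs)."""
      def __init__(self, table):
        self.table = table
        self.ios = 0
      def __iter__(self):
        for row in self.table:
          self.ios += 1
          yield row

    def select(pred, child):
      # Filters tuples as they stream past: constant memory, no IO.
      return (row for row in child if pred(row))

    def project(attrs, child):
      # Rewrites each tuple in flight: also adds no IO.
      return (tuple(row[a] for a in attrs) for row in child)

    R = CountingScan([(i, i * i) for i in range(100)])
    list(project([0], select(lambda r: r[1] % 2 == 0, R)))
    print(R.ios)  # 100 -- sigma and pi added nothing beyond the scan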

Union ($R \cup S$)

Memory Required?
Constant!
IOs added?
None!
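
In iterator terms, bag union just streams all of $R$ followed by all of $S$. A one-line Python sketch (assuming union without duplicate elimination, as the constant memory bound implies):

    from itertools import chain

    def union_all(r_iter, s_iter):
      # Exhaust R, then exhaust S: no buffering, no added IO.
      return chain(r_iter, s_iter)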

Cross ($R \times S$)

Memory Required?
It depends
IOs added?
It depends

Cross ($R \times S$)

How do you "reset" $S$?

"Materialize" S into memory
No extra IOs (but $O(|S|)$ memory)
Rerun the entire iterator
$(|R|-1) \cdot \texttt{cost}(S)$ extra tuples read
"Materialize" S onto disk
$|S|$ tuples written
$(|R|-1) \cdot |S|$ extra tuples read

This can get very expensive
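
For instance, the "rerun the entire iterator" strategy looks roughly like this, where rhs_open is a hypothetical callable that re-executes the child query and returns a fresh iterator:

    def cross_replay(lhs, rhs_open):
      # Every lhs tuple replays the rhs query from scratch, paying
      # cost(S) in reads each time: (|R|-1) * cost(S) beyond the first.
      for r in lhs:
        for s in rhs_open():
          yield r + s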

Example

Example, assume $R$ and $S$ are both 100 tuples.

How many IOs do we need to compute $Q := R \cup S$?

  1. Getting an Iterator on $R$: 100 tuples
  2. Getting an Iterator on $S$: 100 tuples
  3. Getting an Iterator on $R \cup S$ using the above iterators: 0 extra tuples

Example

Example, assume $R$ is 20 tuples and $S$ is 100 tuples.

How many IOs do we need to compute $Q := R \times S$?

  1. Getting an Iterator on $R$: 20 tuples
  2. Getting an Iterator on $S$: 100 tuples
  3. Getting an Iterator on $R \times S$ using the above iterators:
  • Memory: 0 extra tuples
  • Replay: $(|R|-1) \times \texttt{cost}(S) = 19 \times 100 = 1900$ extra tuples
  • Cache: $|R| \times |S| = 20 \times 100 = 2000$ extra tuples

Best Total Cost (using Replay): $100 + 20 + 1900 = 2020$

Example

Example, assume $R$ is 20 tuples and $S$ is 100 tuples,
and $c$ filters out 90% of tuples.

How many IOs do we need to compute $Q := R \times \sigma_c(R \times S)$?

  1. Getting an Iterator on $\sigma_c(R \times S)$: 2020 tuples
  2. Getting an Iterator on $R$: 20 tuples
  3. Getting an Iterator on $R \times \sigma_c(R \times S)$ using the above iterators:
  • Memory: 0 extra tuples
  • Replay: $(|R|-1) \times \texttt{cost}(\sigma_c(R \times S)) = 19 \times 2020 = 38380$ extra tuples
  • Cache: $|R| \times (0.1 \times (|R| \times |S|)) = 20 \times 200 = 4000$ extra tuples

Best Total Cost (using Cache): $2020 + 20 + 4000 = 6040$

Can we do better with Cartesian product (and joins)?

Nested-Loop Join

Problem: We need to evaluate the rhs iterator once per record in the lhs.

Preloading Data

Better Solution: Load both lhs and rhs records in blocks.


    def apply_cross(lhs, rhs):
      result = []

      # Read the left-hand side 100 tuples at a time.
      # (Assumes take() returns an empty, falsy list when exhausted.)
      while (r_block := lhs.take(100)):
        # For each left block, scan the entire right-hand side.
        while (s_block := rhs.take(100)):
          for r in r_block:
            for s in s_block:
              result += [r + s]
        rhs.reset()  # rewind S for the next block of R

      return result
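Each pass over rhs now serves an entire block of lhs tuples at once, so $S$ is rescanned only $\lceil |R| / 100 \rceil$ times rather than $|R|$ times.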

Block-Nested Loop Join

Block-Nested Loop ($R \times S$)

(with $\mathcal B$ as the block size for $R$)

(and with caching $S$ to disk)

Memory Required?
$O(\mathcal B)$
IOs added?
$|S|$ tuples written.
$(\frac{|R|}{\mathcal B} - 1) \cdot |S|$ tuples read.

In-memory caching is a special case of block-nested loop with $\mathcal B = |R|$
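
As a sanity check with the running numbers ($|R| = 20$, $|S| = 100$) and an assumed block size of $\mathcal B = 10$: the added IO is $|S| + (\frac{|R|}{\mathcal B} - 1) \cdot |S| = 100 + 1 \cdot 100 = 200$ tuples, versus $2000$ for tuple-at-a-time caching to disk.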

Does the block size for $S$ matter?

How big should the blocks be?

As big as possible!

... but more on that later.

Cross product is expensive!
Can we do better?

$\sigma_c(R\times S) \equiv R\bowtie_c S$

Cross Product

Problem: Naively, any tuple matches any other

Join Conditions

Solution: First organize the data

Strategies for Implementing $R \bowtie_{R.A = S.A} S$

In-Memory Index Join (1-pass Hash; Hash Join)
Build an in-memory index on one table, scan the other.
Partition Join (2-pass Hash; External Hash Join)
Partition both sides so that tuples don't join across partitions.
Sort/Merge Join
Sort all of the data upfront, then scan over both sides.

Hash Functions

  • A hash function maps a large data value to a small, fixed-size value
    • Typically deterministic & pseudorandom
  • Used in checksums, hash tables, partitioning, Bloom filters, caching, cryptography, password storage, …
  • Examples: MD5, SHA1, SHA2
    • MD5() is part of OpenSSL (on most OSX / Linux / Unix systems)
  • Can map $h(k)$ to the range $[0,N)$ with $h(k)\ \%\ N$ (modulus)

Hash Functions

$$h(X) \mod N$$

  • Pseudorandom output in the range $[0, N)$
  • Always the same output for a given $X$
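
A quick sketch of this mapping in Python using the standard hashlib module (the helper name h is just for illustration):

    import hashlib

    def h(key, N):
      # Hash the key with MD5, interpret the 16-byte digest as an
      # integer, and fold it into the range [0, N) with a modulus.
      digest = hashlib.md5(repr(key).encode("utf-8")).digest()
      return int.from_bytes(digest, "big") % N

    print(h("alice", 16))  # deterministic: same key, same bucket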

1-Pass Hash Join

Limited Queries
Only supports join conditions of the form $R.A = S.B$
Moderate-High Memory
Keeps 1 full relation in memory
Low Added IO Cost
Only requires 1 scan over each input.

Alternative: Build an in-memory tree (e.g., B+Tree) instead of a hash table!

Limited Queries
Also supports $R.A \geq S.B$, $R.A > S.B$
Moderate-High Memory
Keeps 1 full relation in memory
Low Added IO Cost
Only requires 1 scan over each input.
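
A minimal sketch of the hash-table variant, assuming the build side fits in memory and that tuples concatenate with + (function and parameter names are illustrative):

    def hash_join_1pass(build, probe, build_key, probe_key):
      # Pass 1: index the (smaller) build relation in memory.
      table = {}
      for r in build:
        table.setdefault(build_key(r), []).append(r)
      # Pass 2: stream the probe relation; each tuple joins only with
      # the bucket that shares its key -- one scan over each input.
      for s in probe:
        for r in table.get(probe_key(s), []):
          yield r + s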

2-Pass Hash Join

Limited Queries
Only supports join conditions of the form $R.A = S.B$
Low Memory
Never need more than 1 pair of partitions in memory
High IO Cost
$|R| + |S|$ tuples written out
$|R| + |S|$ tuples read in
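
A rough, self-contained sketch, with in-memory lists standing in for the partition files a real system would spill to disk (writing and then re-reading those files is the $|R| + |S|$ extra IO counted above):

    def hash_join_2pass(R, S, r_key, s_key, N=16):
      # Pass 1: partition both inputs by hash of the join key.
      r_parts = [[] for _ in range(N)]
      s_parts = [[] for _ in range(N)]
      for r in R:
        r_parts[hash(r_key(r)) % N].append(r)
      for s in S:
        s_parts[hash(s_key(s)) % N].append(s)
      # Pass 2: matching tuples always share a partition, so each pair
      # of partitions gets a small in-memory (1-pass) hash join.
      for r_part, s_part in zip(r_parts, s_parts):
        table = {}
        for r in r_part:
          table.setdefault(r_key(r), []).append(r)
        for s in s_part:
          for r in table.get(s_key(s), []):
            yield r + s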

Why is it important that the hash function is pseudorandom?

What if the data is already organized (e.g., sorted) in a useful way?

Sort/Merge Join

Limited Queries
Only supports join conditions of the form $R.A = S.B$
Low Memory
Only needs to keep ~2 rows in memory at a time (not counting sort).
Low Added IO Cost
No added IO! (not counting sort).
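
A sketch of the merge phase, assuming both inputs are lists already sorted on the join key; runs of duplicate keys are the only subtlety:

    def merge_join(R, S, r_key, s_key):
      i, j = 0, 0
      while i < len(R) and j < len(S):
        kr, ks = r_key(R[i]), s_key(S[j])
        if kr < ks:
          i += 1          # advance whichever side has the smaller key
        elif kr > ks:
          j += 1
        else:
          # Find the full run of tuples with this key on each side,
          # then emit their pairwise combinations.
          i2, j2 = i, j
          while i2 < len(R) and r_key(R[i2]) == kr:
            i2 += 1
          while j2 < len(S) and s_key(S[j2]) == kr:
            j2 += 1
          for r in R[i:i2]:
            for s in S[j:j2]:
              yield r + s
          i, j = i2, j2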

Recap: Joins

Block-Nested Join
Moderate Memory, Moderate IO, High CPU
In-Memory Index Join (e.g., 1-Pass Hash)
High Memory, Low IO
Partition Join (e.g., 2-Pass Hash)
High IO, Low Memory
Sort/Merge Join
Low IO, Low Memory (But need sorted data)

Next time...

Extended Relational Algebra