r = get_table("R")
s = get_table("S")
temp1 = apply_join(r, s, "R.B = S.B")
temp2 = apply_select(temp1, "S.C = 10")
result = apply_projection(temp2, "R.A")
def build_tree(operator):
if """ operator is a base table """:
return get_table(...)
elif """ operator is a selection """:
return apply_select(operator.child, operator.condition)
elif """ handle remaining cases similarly """:
$$\sigma_{A \neq 3} R$$
A | B |
---|---|
1 | 2 |
3 | 4 |
5 | 6 |
def apply_select(input, condition)
result = []
for row in input:
if condition(row):
result += [row]
return result;
(All-At-Once)
$$\sigma_{A \neq 3} R$$
A | B | |
---|---|---|
getNext() | for row in input: | |
1 | 2 | return row; |
getNext() | for row in input: | |
3 | 4 | X |
5 | 6 | return row; |
getNext() | for row in input: | |
None | return None; |
def apply_cross(lhs, rhs):
result = []
for r in lhs:
for s in rhs:
result += [r + s]
return result
What's the complexity of this cross-product algorithm?
... in terms of compute
... in terms of IOs
Nested-Loop Join
def apply_cross(lhs, rhs):
result = []
while r = lhs.next():
while s = rhs.next():
result += [r + s]
rhs.reset()
return result
Problem: We need to evaluate rhs
iterator
once per record in lhs
Naive Solution: Preload records from lhs
def apply_cross(lhs, rhs):
result = []
rhs_preloaded = []
while s = rhs.next():
rhs_preloaded += [s]
while r = lhs.next():
for s in rhs_preloaded:
result += [r + s]
return result
Any problems with this?
Better Solution: Load both lhs
and rhs
records in blocks.
def apply_cross(lhs, rhs):
result = []
while r_block = lhs.take(100):
while s_block = rhs.take(100):
for r in r_block:
for s in s_block:
result += [r + s]
rhs.reset()
return result
How big should the blocks be?
What is the IO complexity of the algorithm?
Problem: Naively, any tuple matches any other
Solution: First organize the data
$$h(X) \mod N$$
Can use other in-memory indexes to support other join conditions.
Can partition on data-values to support other types of queries.
Why is it important that the hash function is pseudorandom?
More operators, More algorithms