Those Marvelous Lifts and Exists (Part 1)

We've been talking for the past few weeks about optimization of AGCA expressions.  So far, most of our optimizations have made one extremely significant simplifying assumption: They ignore nested expressions.  I was going to talk this week about techniques for un-nesting expressions, but before I get to that, I'm going to cover the two sources of nesting in AGCA expressions that I haven't covered yet: Lift in its full glory, and Exists.

So far I've used Lift as a simple form of assignment.  

X ^= {Y}

Computes the value of Y and assigns it to X.  I've used it for more complex expressions too:

X ^= {2*Y + Z}

But again, it's a simple arithmetic expression being used in the assignment.  What if we want to do something more complex?  What if we want to express something like

SELECT SUM(A)
FROM R
WHERE R.B = (SELECT COUNT(*) FROM S)

This is an example of a nested aggregate query, and it poses a bit of a problem, both in terms of AGCA, and more generally for incremental computation in the sense of delta operations.  Up to this point, whenever we added (or deleted) a value from one relation, we'd need to add (or subtract) something to (from) the result we were trying to compute.  Nested subqueries are different.  Let's have a look with the following example database

_R_(_A__B_)____#_
   < 1, 1 > -> 1
   < 1, 2 > -> 1
   < 2, 2 > -> 1
_S_(_C_)____#_
   < 1 > -> 1

Let's evaluate the SQL query.  The COUNT(*) of S is 1, so we find all the rows of R where B = 1 (just the first one), and sum up their A columns for a total result of 1.

Now what happens if we add the tuple <1> to S?  Well, the value of the nested aggregate changes from 1 to 2, so now the query result is based on a completely different set of rows (in this case summing to 3).  The delta isn't just a simple addition; we need to delete the existing value (-1), and then add in an entirely new and unrelated value (+3).  

Put another way, conditionals are different.  They're not straight arithmetic, they actually trigger a different control flow.  This is part of why AGCA restricts conditionals to having only arithmetic expressions in them.  Yet, we still need a way to express these changes in control flow.  Lifts give us an ideal tool for this.  We can express the above query as:

AggSum([], (B ^= AggSum([], S(C))) * R(A,B) * A)

Read the first term of this expression as "Compute COUNT(*) of S and assign it to B."  

I originally said that the delta of a (simple) Lift was 0.  This is not true in the general case.  In particular, note that so far we've only been looking at lifts where the value being assigned is computed from a simple arithmetic expression.  As we've already covered, the delta of such an expression is always 0.  But what happens when you lift an expression that has a nonzero delta?  For example, what is the delta (with respect to an insertion into S) of:

(B ^= AggSum([], S(C)))

Let's consider this in terms of the example data above.  The initial aggregate value is 1, so the table for this expression would be

___B______#_
 < 1 > -> 1

After we add a tuple to S, the table becomes

___B______#_
 < 2 > -> 1

We can only do arithmetic on the multiplicity column; we can't just add 1 to B (remember, this is supposed to represent a control flow decision).  So... we actually have to delete the old tuple and put in the new one.  In other words, the value computed by delta for this expression should be

___B_______#_
 < 1 > -> -1
 < 2 > ->  1

The full delta rule for lifts reflects this insert/delete pair:

∂(X ^= A) = (X ^= A + ∂A) - (X ^= A)

If you're paying attention, you should notice something horribly wrong with this.  More on that next week.