The Viewlet Transform (Part 1: Deltas in AGCA)

Over the last few weeks, I've been covering various aspects of AGCA, the language for incremental processing behind DBToaster.  Now, I'm going to chat a bit about the heart of DBToaster: the viewlet transform.  

The basic idea behind viewlet transform is actually something that's been around for a very long time: delta queries, commonly used for Incremental View Maintenance.  Let's say you have a query Q. If you need to evaluate Q over and over again, it usually makes sense to evaluate it once and just store the results somewhere.  

That's great, but if the data that goes into Q changes, you need to update the stored results accordingly.  However, instead of re-evaluating the entire query from scratch, you can compute what's known as a delta query.  The Delta of Q (with respect to table T) is a simplified form of Q that tells you how the results of Q will change (when you apply some change ∂T to table T).  Put algebraically:

Q(T) = Q(T+∂T) + ∂Q(T, ∂T)

The idea is that computing ∂Q is more efficient, and generally faster than computing Q from scratch.  

Let's have a look at how AGCA interacts with this delta operation.  Recall that everything in AGCA is multiplicities.  We just need to figure out which rows change, and what the change in their multiplicities will be.  To start, we're going to work with updates to one table at a time, and updates to one row of that table at a time (note that this may still result in multiple rows changing in the result, but the input data will only change by one row at a time).  

Deltas of Tables

Concretely, what happens to the results of an AGCA query when we insert a single row, with values <X, Y, Z, …> into table T?

If the table is the one being updated, then the change is just the single row being inserted.  We can construct a singleton rows in AGCA by using the lift operation, just like I described last week.

∂T(A,B,C,…) = (A ^= X) * (B ^= Y) * (C ^= Z) * …

If the table isn't the one being updated, then there is no change at all.  In other words, we need a special table with every single row having a multiplicity of 0.  We know how to do that.

∂S(…) = {0}

The delta for a deletion is similar.  Again, remember that AGCA uses multiplicities for everything.  If we're deleting a row from table T, we want to reduce the multiplicity of that row by 1.  How do we do this?  We add a negative multiplicity

∂T(A,B,C,…) = {-1} * (A ^= X) * (B ^= Y) * (C ^= Z) * …

Deltas of Bag Unions

What about bag unions?  What if we have an expression like 

Q = Q1 + Q2

Well, let's assume that we can figure out an expression for computing the deltas of Q1 and Q2.  Then we know that 

Q + ∂Q = (Q1 + ∂Q1) + (Q2 + ∂Q2)

AGCA is a ring, so the normal rules (distributivity, associativity, commutativity, etc…) for + and * apply to bag union and natural join as well.  So, reshuffling a bit, we get

Q + ∂Q = (Q1 + Q2) + (∂Q1 + ∂Q2) = Q + (∂Q1 + ∂Q2)

In other words

∂(Q1 + Q2) = ∂Q1 + ∂Q2

Deltas of Natural Joins

We can do something similar for natural joins.

Q + ∂Q = (Q1 + ∂Q1) * (Q2 + ∂Q2) = (Q1*Q2) + (Q1*∂Q2) + (∂Q1*Q2) + (∂Q1*∂Q2)

So.

∂Q = (Q1*∂Q2) + (∂Q1*Q2) + (∂Q1*∂Q2)

This one's a bit stranger, so let's look at it in a bit more detail.  If only Q1 changes (but not Q2), then ∂Q2 = {0}.  So…

∂Q = (Q1*{0}) + (∂Q1 * Q2) + (∂Q1*{0})

Again, we benefit from AGCA being a ring.  {0} is AGCA's additive identity (a fancy way of saying that {0} + X = X, and also that {0} * X = {0}).  So...

∂Q = {0} + (∂Q1 * Q2) + {0} = ∂Q1 * Q2

This kinda makes sense.  A join takes every row of one table and matches it against every row of the other table.  If you insert a row into one of the two tables (for example, inserting ∂Q1 into Q1), the change to the final result comes from joining it against every row of the other table (Q2 in this case).  The exact same thing happens if you only insert into Q2, but not Q1.

So what if both Q1 and Q2 change.  For example, if query could be

Q = T(A,B) * T(B,C) 

This is also known as a self-join.  If we insert a row into T, then there'll be three parts to the update:

  1. The inserted row joined against all of the T(B,C)s (∂T(A,B) * T(B,C))
  2. The inserted row joined against all of the T(A,B)s (T(A,B) * ∂T(B,C))
  3. And there's a possibility that the inserted row might join against itself. (∂T(A,B) * ∂T(B,C))

Deltas of Special Tables

The delta of one of these special tables we use for constants, numerical formulas, or comparisons is always {0}.  This may seem a little unintuitive, but it actually makes sense.  Let's say we have the query 

Q = T(A) * {A^2}

If we insert a row <3> into A, that's going to change T, but it won't change the fact that (3^2 = 9).  The row <3>->9 is always present in the special table {A^2}, regardless of whether or not it's in T.  So we'd get

∂Q = {0} + (∂T(A) * {A^2}) + {0} = ∂T(A) * {A^2}

Which is precisely correct.

Deltas of AggSums

The delta of an AggSum is the AggSum of the delta.

∂AggSum([…], Q) = AggSum([…], ∂Q)

The reasoning behind this is identical to the reasoning behind the delta of bag union (+), since AggSum uses precisely the same mechanics.

In Summary

I'm going to punt on the delta of a Lift expression for now.  There's a bit of hidden complexity there that I'll return to in two weeks.  For now, the simplified form of the Lift operation that I've described so far always has a delta of {0}.

So the full list of delta rules (minus the lift) for the delta of an update to T is

∂T(A,B,C,…)       = {+/- 1} * (A ^= X) * (A ^= Y) * (A ^= Z) * …
∂S(…)             = {0}
∂(Q1 + Q2)        = ∂Q1 + ∂Q2
∂(Q1 * Q2)        = (Q1 * ∂Q2) + (∂Q1 * Q2) + (∂Q1 * ∂Q2)
∂({…})            = {0}
∂(AggSum([…], Q)) = AggSum([…], ∂Q)

Note two things about this.  First, the delta of a query expressed in AGCA is itself a query in AGCA.  This is a huge deal. Prior to AGCA, delta queries had special funny business that required special logic in the database to process.  In other words, if you don't need any special query-processing infrastructure to process the delta of an AGCA query, you just need support for AGCA itself.

The second distinction is that the delta query is simpler.  Roughly speaking, every time you take the delta of a query in AGCA, you remove one table from each join.  In other words, if you have an AGCA query, and you take its delta, and the delta of that, and so forth, eventually you'll end up with {0}.  

These two facts are critical to the Viewlet Transform, which I'll finally get to next week.


This page last updated 2019-06-28 15:47:51 -0400