The ODIn Lab - DBToaster and the Viewlet Transform

A big issue these days is large, rapidly changing data. Users often need to keep a close eye on this data. Algorithmic trading, scientific computing, network monitoring, and even things like data warehousing are all examples of areas that have lots of data, and that need to react very quickly to certain (potentially complex) conditions in that data.

The overarching goal of the DBToaster project is to produce a tool chain capable of effectively performing these monitoring tasks. Our latest paper (to be presented at VLDB 2012 this August) discusses one of the core ideas behind our approach: exploiting incrementality (through something we call the Viewlet Transform).

To get the basic idea across, let me use a common task as an analogy: the monthly report. If you're in a relatively stable business, the content of the report will probably be mostly the same from month to month. Instead of rewriting the report every month, you might just take last month's report and update it with any new facts, figures, and other changes from the past month (in fact, your boss might be interested in only the changes… but that's getting a little off-track). This still requires a lot of work. If the report has a lot of figures, you won't re-create the figures from scratch either. Instead, you'll probably have a spreadsheet (also from last month) that you can just punch the new numbers into.

Loosely speaking, you have a repeating task (writing your report) that is easier to perform if you only have to figure out what changed (the figures) since the last time you did it. This idea has been around for a long time in databases (since the 80s at least) in the form of something called Incremental View Maintenance (or IVM for short). Let's say you have a query that you want to repeatedly evaluate. If you're smart, you'll just evaluate the query result once and save the result for the next time you need the answer.

But of course, the data you're querying might change in the meantime. The core idea behind IVM is that you can evaluate what's known as a Delta Query, which is a simpler form of the original query. Instead of giving you the full query answer, it looks at the changes to the input data, and tells you what changes in the query results. This delta query is usually simpler and faster to evaluate than the original query, but can still be pretty slow (especially if the original query is a biggie), making IVM a poor choice for many realtime applications.
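
To make the delta query idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two tables R(a, b) and S(b, c), the query SUM(r.a * s.c) over their join on b, and the names full_query, delta_on_insert_R, delta_on_insert_S, and IncrementalView. It is not DBToaster's code, just classic IVM written out by hand for inserts only.

```python
def full_query(R, S):
    """SELECT SUM(r.a * s.c) FROM R r JOIN S s ON r.b = s.b, from scratch."""
    return sum(a * c for (a, b) in R for (b2, c) in S if b == b2)


def delta_on_insert_R(S, a, b):
    """Change in the query result caused by inserting (a, b) into R."""
    # Only the rows of S that join with the new row matter.
    return a * sum(c for (b2, c) in S if b2 == b)


def delta_on_insert_S(R, b, c):
    """Change in the query result caused by inserting (b, c) into S."""
    return c * sum(a for (a, b2) in R if b2 == b)


class IncrementalView:
    """Keeps the query result up to date by applying delta queries."""

    def __init__(self):
        self.R = []      # hypothetical base table R(a, b)
        self.S = []      # hypothetical base table S(b, c)
        self.result = 0  # stored answer to the original query

    def insert_R(self, a, b):
        # Evaluate the delta against the current S, then record the new row.
        self.result += delta_on_insert_R(self.S, a, b)
        self.R.append((a, b))

    def insert_S(self, b, c):
        self.result += delta_on_insert_S(self.R, b, c)
        self.S.append((b, c))
```

Note that each delta is a one-table scan instead of a two-table join, so it is simpler than the original query. It still has to walk through a whole table on every insert, though, which is the cost the rest of this post is about.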

Let's go back to our example of the monthly report that you update each month. Even though this is faster than creating the report from scratch, you still have to update all the figures as well. Of course, you don't create the figures from scratch either. If you're smart, you'll just edit last month's spreadsheets. The viewlet transform is based on exactly the same idea. The delta query is a query that you evaluate over and over and over again. We figure, why not just evaluate it once and then update it when the data changes?

So now you have your original query, and some delta queries. Instead of re-evaluating the delta queries every time your data changes, you evaluate each delta query once and store the result. Now, whenever your data changes, you only need to read the delta query result out of storage, instead of doing any expensive computations. Of course, now you need to keep your delta queries up to date as well. You do this by using delta queries of each of the delta queries (a "second-order delta"). This process continues (giving you third, fourth, fifth, etc… order deltas), with each delta query becoming progressively simpler than the last. Eventually you reach a point where the delta query is incredibly simple, and you stop.
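
Here is a rough Python sketch of what storing the deltas can look like. It assumes a hypothetical query SUM(r.a * s.c) over two tables R(a, b) and S(b, c) joined on b, and handles inserts only. The class name HigherOrderView and the map names are made up for this example; the code is hand-written to illustrate the idea and is not output from DBToaster.

```python
from collections import defaultdict


class HigherOrderView:
    """Maintains SUM(r.a * s.c) for R(a, b) JOIN S(b, c) ON b, inserts only."""

    def __init__(self):
        self.result = 0
        # Stored first-order deltas, keyed by the join column b.
        # sum_c_by_b[b] is what an insert into R with that b must be scaled by.
        self.sum_c_by_b = defaultdict(int)
        # sum_a_by_b[b] is what an insert into S with that b must be scaled by.
        self.sum_a_by_b = defaultdict(int)

    def insert_R(self, a, b):
        # First-order delta: a lookup instead of a scan over S.
        self.result += a * self.sum_c_by_b[b]
        # Second-order delta: the stored delta for S inserts grows by a.
        # It does not depend on any table, so the recursion stops here.
        self.sum_a_by_b[b] += a

    def insert_S(self, b, c):
        self.result += c * self.sum_a_by_b[b]
        self.sum_c_by_b[b] += c
```

In this sketch every insert costs one lookup and two additions, and the base tables never need to be stored or scanned. The price is the two extra maps that have to be kept up to date alongside the result.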

You might have a lot of these queries sitting around and needing to be kept up-to-date, but each of them reduces the cost of maintaining another query by enough that the extra bookkeeping is well worth it. Combined with other techniques, we've gotten a typical performance improvement of 3-4 orders of magnitude over several commercial data management systems.

And that's the basic idea of the viewlet transform. More soon.

-Oliver