The UB Interlocked logo with two nondescript birds perched on it.  The birds are symbolic of Odin's raven's Hugin and Munin

The Online Data Interactions Lab @ UB is about data. Databases, query compilers, data structures, algorithms, systems, implementation, and visualization. We’re interested in everything to do with storing, retrieving, and interpreting data. Our ultimate goal is to make it easier to build and understand data-driven systems.

Compilers

A modern compiler is a microcosm of database problems, from complex queries over hierarchical state, to incrementally maintained views. We explore ways to apply database tools and techniques to enable the creation of "declarative" compilers that can simultaneously be fast, powerful, scalable, and maintainable.

Workflows

Data preparation is a full time job, requiring understanding the full lifecycle of many datasets. We explore the use of techniques like provenance, uncertain data, and query compilation to make it easier to debug workflows, to manage data governance, and to safely re-use past efforts on new data.

See all of our projects


IVM Ordering Constraints Aren't Transitive

This has been simmering for a few months now, but there's a really weird phenomenon that Victoria and I noticed about ordering constraints in distributed IVM that deserves to be mentioned somewhere (whilst we plot a full publication). In a distributed IVM maintenance system, "happens before" constraints are not transitive. A slightly less bombastic way of phrasing this might be that in a distributed IVM system, all transitive "happens before" constraints are exposed as pairwise constraints.

Transactions

Let's start by being a bit clearer about what problem we're talking about. When reasoning about distributed transaction processing, it's customary to abstract away the precise nature of the data. Instead of talking about tables, rows, variables, etc..., you simply think of abstract values ($A$, $B$, $C$, etc...). Similarly, each transaction $T$ represents some abstract computation; all we know about it is the set of values it reads (the "read set" $R(T)$) and the set of values it writes (the "write set" $W(T)$). Values are versioned, typically with some timestamp assigned to the transaction. So if $A$ is in $T_{42}$s write set, after $T_{42}$ finishes, we have $A$ version 42, or $A_{42}$. If $A$ isn't in $T_{42}$s write set, then it inherits its value from the prior version (i.e., $A_{42} = A_{41}$).

Typically, we think of transactions as running sequentially. Given some collection of transactions $T_1, \ldots, T_N$, we execute them one at a time. I'll write this as $$(A_{i+1}, B_{i+1}, \ldots) = T_{i+1}(A_i, B_i, \ldots)$$ Of course this is boring, and not very distributed... but it gives us a baseline for correctness that we call a serial ordering. We have the flexibility to execute the transactions however we want, as long as we can guarantee that we'll end up with the same final state $A_N, B_N, \ldots$.

read more →

Clangd

read more →

Fixing Anubis

read more →