The ODIn Lab - Laasie: Building the next generation of collaborative applications

With the first Laasie paper (ever) being presented tomorrow at WebDB (part of SIGMOD), I thought it might be a good idea to explain the hubbub. What is Laasie?

The short version is that it's an incremental state replication and persistence infrastructure, targeted mostly at web applications. In particular, we're focusing on a class of collaborative applications, where multiple users interact with the same application state simultaneously. A well-known example is the Google Docs office suite, where multiple users can view and edit the same document at the same time.

For Developers

The goal of Laasie is to provide an infrastructure on which the next generation of collaborative applications can be built. For developers, this means that the infrastructure should fade into the background. The entire development process should proceed (almost) as if one were writing a single-site application. To use the MVC paradigm as a basis, Laasie acts as the M(odel): it persists your data, makes sure each client has a shared view of it, and makes sure that clients can revive themselves after the fact.

Not only does Laasie make it easier for you to get your collaborative application off the ground, it also provides a range of useful features. In addition to some fun access control, sanity checking, and sandboxing capabilities, our eventual goal is to support distributed Laasie instances. End users requiring offline support, added privacy, or similar features will be able to instantiate their own Laasie instances, which will "just work" with your application.

For Researchers

The primary challenge of providing such an infrastructure is the question of how we represent state updates. The more general you get, the harder it is to be efficient.

At one end of the spectrum, we could transfer the full state on every single update (this is roughly what Dropbox does). This is certainly quite general, and allows us to express any sort of state change that we like. On the other hand, it's hard to make efficient, since every update costs as much as the entire state. This is why you don't see many distributed applications that use Dropbox for this purpose (as a shared filesystem perhaps, but not for low-latency sharing).

At the other end of the spectrum, there is a whole range of optimizations you can implement. Knowing that two operations are commutative (or that there's an applicable operational transform) creates a simpler, leaner, more efficient consistency model. Being able to subdivide an application's state allows client instances to pull only relevant data, or changes to fragments of the state. Bulk changes to structured data (numbers, collections, matrices, images) can often be transmitted more efficiently as a description of the change (add 1 to every number in this collection). You could create an infrastructure that is super-optimized and tailored specifically to your application. Unfortunately, you've then tied the infrastructure to your application's semantics. If those semantics change (e.g., you add features), you need to change the infrastructure.
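To make the trade-off concrete, here is a minimal sketch in Python. It is not Laasie code: the message format and the names `full_state_message`, `delta_message`, and `apply_message` are all hypothetical, made up for illustration. It compares shipping the whole state against shipping a description of the change for the "add 1 to every number" example.

```python
import json

def full_state_message(state):
    """General approach: serialize and send the entire state on every update."""
    return json.dumps({"kind": "full", "state": state})

def delta_message(amount):
    """Tailored approach: send only a description of the change."""
    return json.dumps({"kind": "add_to_all", "amount": amount})

def apply_message(state, message):
    """Apply either kind of message to a replica's local copy of the state."""
    msg = json.loads(message)
    if msg["kind"] == "full":
        # The message carries the complete new state; the old one is discarded.
        return list(msg["state"])
    if msg["kind"] == "add_to_all":
        # The message carries only the change; the replica recomputes locally.
        return [x + msg["amount"] for x in state]
    raise ValueError(f"unknown message kind: {msg['kind']}")

# A replica holding a collection of 1000 numbers.
replica = list(range(1000))

# The same logical update, encoded both ways.
full = full_state_message([x + 1 for x in replica])
delta = delta_message(1)

# Both encodings bring the replica to the same state...
assert apply_message(replica, full) == apply_message(replica, delta)

# ...but the description of the change is far smaller on the wire.
print(len(full), len(delta))
```

Note the catch: `apply_message` only understands the kinds of change it was written for. Supporting a new kind of bulk update means editing the infrastructure itself, which is exactly the coupling described above.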

The core insight of Laasie is that functions (aka procedures, aka monads, etc.) are a way of representing state updates that is both general (not Turing-complete yet, but we're getting there) and still amenable to optimization. Because the full application semantics are expressed in the update, it is possible to analyze each update, assert properties about updates, and more generally, to restructure and optimize the overall state representation.
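As a rough illustration of the idea (and only that: this toy expression language is an assumption of mine for this post, not Laasie's actual update language), here is a Python sketch in which an update is a small function expressed as data. A generic interpreter applies it, and a simple analysis inspects it to decide whether two updates can be safely reordered. All names (`Const`, `Get`, `Add`, `Assign`, `independent`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Get:
    key: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Get, Add]

@dataclass(frozen=True)
class Assign:
    """An update: set state[key] to the value of expr in the old state."""
    key: str
    expr: Expr

def evaluate(expr, state):
    """Generic interpreter: knows the language, not the application."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Get):
        return state.get(expr.key, 0)  # missing keys default to 0
    if isinstance(expr, Add):
        return evaluate(expr.left, state) + evaluate(expr.right, state)
    raise TypeError(f"unknown expression: {expr!r}")

def apply(update, state):
    """Return a new state with the update applied; the input is not mutated."""
    new_state = dict(state)
    new_state[update.key] = evaluate(update.expr, state)
    return new_state

def reads(expr):
    """Static analysis: the set of keys an expression depends on."""
    if isinstance(expr, Const):
        return set()
    if isinstance(expr, Get):
        return {expr.key}
    if isinstance(expr, Add):
        return reads(expr.left) | reads(expr.right)
    raise TypeError(f"unknown expression: {expr!r}")

def independent(a, b):
    """Conservative check: True means the two updates commute.
    False only means we could not prove it (they may still commute)."""
    return (a.key != b.key
            and a.key not in reads(b.expr)
            and b.key not in reads(a.expr))

inc_x = Assign("x", Add(Get("x"), Const(1)))   # x := x + 1
inc_y = Assign("y", Add(Get("y"), Const(2)))   # y := y + 2
copy_x = Assign("y", Get("x"))                 # y := x

state = {"x": 0, "y": 0}

# inc_x and inc_y touch disjoint keys, so either order gives the same result.
assert independent(inc_x, inc_y)
assert apply(inc_y, apply(inc_x, state)) == apply(inc_x, apply(inc_y, state))

# copy_x reads what inc_x writes, so the analysis refuses to reorder them.
assert not independent(inc_x, copy_x)
```

The point of the sketch is that the interpreter and the analysis never change when the application adds a new kind of update. New behavior is just a new expression, and the infrastructure can still reason about it.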

More on this next week, when I introduce the Log as a Service state representation.