When designing a distributed system, one of the first questions anyone asks is what kind of consistency model to use. This is a fairly nuanced question, as there isn't really one right answer. Do you enforce strong consistency and accept the resulting latency and communication overhead? Do you use locking, and accept the resulting throughput limitations? Or do you just give up, use eventual consistency, and accept that sometimes you'll end up with results that are a little bit out of sync?
It's this last option that I'd like to chat about today, because it's actually quite common in a large number of applications. This model is present in everything from user-facing applications like Dropbox and version control systems like SVN and Git, to back-end infrastructure systems like Amazon's Dynamo and Yahoo's PNUTS. Often, especially in non-critical applications, latency and throughput are more important than dealing with the possibility that two simultaneous updates will conflict.
So what happens when this dreadful possibility does come to pass? Clearly the system can't grind to a halt, and often just randomly discarding one of these updates is the wrong thing to do. So what happens? The answer is common across most of these systems: They punt to the user.
Intuitively, this is the right thing to do. The user sees the big picture. The user knows best how to combine these operations. The user knows what to do, so on those rare occurrences where the system can't handle it, the user can.
But why is this the right thing to do? What does the user have that the infrastructure doesn't?
The answer is Semantics.
Each update does something with the data. It increments, it multiplies, it derives, it computes. It produces some new value of the data. It has specific semantics, and the systems I enumerated above (and those like them) make no effort to try to understand those semantics. I addressed this in part already, when I discussed intent vs effect a few weeks ago.
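To make this concrete, here's a small hypothetical sketch (the scenario and names are my own, not from any of the systems above) of why semantics matter: two replicas each apply a deposit to the same account balance. A system that only sees the resulting values has no good way to merge them, while a system that sees the operations can, because deposits commute.

```python
# Hypothetical example: two replicas both start from balance = 100.
# Replica A applies "deposit 20"; replica B concurrently applies "deposit 50".

base = 100

# Effect-only view: the system just sees two divergent values.
value_a = 120  # result of A's update
value_b = 150  # result of B's update
# Picking either value, or even max(value_a, value_b), silently loses a deposit.

# Intent-aware view: the system sees the operations themselves.
ops_a = [("deposit", 20)]
ops_b = [("deposit", 50)]

def apply_ops(balance, ops):
    """Replay a log of operations against a starting balance."""
    for op, amount in ops:
        if op == "deposit":
            balance += amount
    return balance

# Because deposits commute, replaying both logs preserves both updates.
merged = apply_ops(base, ops_a + ops_b)  # 170
```

The effect-only merge is lossy no matter what rule you pick; the intent-aware merge is trivially correct because the operations' semantics (commutativity, here) are known to the system.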
The user, conversely, does understand the semantics of an application. Given two updated values (and a suitable visualization tool, like diff), a user can usually infer the intent of the updates and merge their effects appropriately.
Sometimes this is the best way to do things. When writing source code or other text, where the user is directly modifying the files (i.e., the system never receives a representation of the intent in the first place), the overhead of manually merging periodically is typically lower than the overhead of having to encode edits in terms of intent (though this might be interesting if combined with bug/feature tracking systems).
Conversely, if an application is interacting with the data directly, the application can provide tools for resolution. This is indeed the case in Dynamo, where the application provides a merge function for resolving inconsistent updates. But this is only the first step. What can you do to avoid creating two inconsistent versions of the data in the first place? How do you infer the user/application's intent, while minimizing the burden of declaration placed on the user/app developer?
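The shape of Dynamo's approach can be sketched roughly as follows. The shopping-cart merge (take the union of all items seen) is the classic example from the Dynamo paper, but the class and method names here are hypothetical, a minimal sketch rather than Dynamo's actual API, and I've elided version vectors entirely.

```python
def merge_carts(versions):
    """Application-provided reconciliation: union of all items ever seen.
    This encodes the application's semantics ("never lose an added item")."""
    merged = set()
    for cart in versions:
        merged |= cart
    return merged

class EventuallyConsistentStore:
    """Toy store that collects divergent versions and punts merging
    to an application-supplied function, Dynamo-style."""

    def __init__(self, merge_fn):
        self.merge_fn = merge_fn
        self.versions = []  # concurrent, unreconciled versions

    def put(self, value):
        # In a real system, version vectors would tell us which writes
        # supersede which; here every put is treated as concurrent.
        self.versions.append(value)

    def get(self):
        if len(self.versions) > 1:
            # Conflict detected: resolve it with the application's semantics.
            self.versions = [self.merge_fn(self.versions)]
        return self.versions[0] if self.versions else None

store = EventuallyConsistentStore(merge_carts)
store.put({"book", "pen"})  # replica 1's view of the cart
store.put({"book", "mug"})  # replica 2's concurrent view
cart = store.get()          # union of both concurrent carts
```

Note that the store itself knows nothing about shopping carts; all the semantic knowledge lives in the merge function the application hands it, which is exactly the division of labor the questions above are probing.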
In short, what can you do to both detect and leverage an application's semantics to help the application stay consistent?