The ODIn Lab - Expressiveness vs Efficiency

There's an odd dichotomy that hit me recently. On the one hand, there's been a big recent push towards DSLs, or domain specific languages. Examples include Bloom (distributed computation for monotonic programs), the many DSLs implemented in Delite (which include things like matrix computations, ML algorithms, etc…), GraphLab, GraphChi, and so forth.

On the other hand, people continue to want more expressive languages. We keep adding more features to things like SQL (which has been turing-complete for the last few years). I understand this drive. We want to be able to efficiently capture more ideas. This idea of abstracting concepts is what computer science is all about.

As I was discussing the design of an indexing data-structure with one of my students the other day, the weight of dichotomy really hit me. We were discussing building more and more corner cases into the data-structure (or rather into objects that we were indexing). This struck me as a bad idea, since I really hate corner cases. On the other hand, a critical feature of the indexing data-structure was the ability to perform set-containment on the objects we were indexing.

As many of you know, if you allow a set description language to get too complex, set containment can easily work its way into NP or even intractability. So there it was: a conundrum. On the one hand, a complex language would give us more flexibility, and on the other, if we made it too complex, using the indexing structure would cost more time than it saved.

That got me thinking. Many problems that are intractable on a turing-complete language become feasible on certain well-defined subsets of the language. In fact, they may even be tractable on multiple well-defined subsets, potentially multiple non-overlapping subsets.

And that's where DSLs come in. A DSL allows you to specify a restricted form of a language that's far more amenable to optimization, analysis, and other useful features than a fully general language like C, Java, Python or Ruby.

Often, the DSL doesn't even need to live outside the confines of a general language. Bloom has a Ruby-based implementation that exposes the full (turing-complete) power of Ruby for those program fragments that can't be easily expressed in Bloom's framework. Scala has a Sql compatibility layer that transforms a specific fragment of Scala into equivalent relational operators (similar to VC++'s Linq, but more tightly coupled with the language).

This… this is super cool, because it suggests that different DSLs can live and cooperate in the same language (you see some of this in the Delite framework already). It also suggests that certain fragments of the language might translate naturally into a corresponding DSL's infrastructure. Why is this cool? Because it means you might be able to get the best of both worlds -- expressiveness and efficiency.

Imagine a language that could automatically analyze your program to identify the specific language fragment best suited to encoding it. Although there might be some cost-estimation factors to help decide between multiple different language fragments, this actually seems like it might be doable with pure static analysis. Such an analysis tool might also be able to identify trouble spots in your program -- point to specific operations that prevent it from descending into a specific program fragment.

Just a thought.