The ODIn Lab - Semantics as Data

Something I've been getting drawn to more and more is the idea of computation as data.

This is one of the core precepts in PL and computation: any sort of computation can be encoded as data. Yet, this doesn't fully capture the essence of what I've been seeing. Sure you can encode computation as data, but then what do you do with it? How do you make use of the fact that semantics can be encoded?

Let's take this question from another perspective. In Databases, we're used to imposing semantics on data. Data has meaning because we chose to give it meaning. The number 100,000 is meaningless, until I tell you that it's the average salary of an employee at BigCorporateCo. Nevertheless, we can still ask questions in the abstract. Whatever semantics you use, 100,000 < 120,000. We can create abstractions (query languages) that allow us to ask questions about data, regardless of their semantics.

By comparison, an encoded computation carries its own semantics. This makes it harder to analyze, as the nature of those semantics is limited only by the type of encoding used to store the computation. But this doesn't stop us from asking questions about the computation.

The Computation's Effects

The simplest thing we can do is to ask a question about what it will compute. These questions span the range from the trivial to the typically intractable. For example, we can ask about…

… what the computation will produce given a specific input, or a specific set of inputs.
… what inputs will produce a given (range of) output(s).
… whether a particular output is possible.
… whether two computations are equivalent.

One particularly fun example in this space is Oracle's Expression type [1]. An Expression stores (as a datatype) an arbitrary boolean expression with variables. The result of evaluating this expression on a given valuation of the variables can be injected into the WHERE clause of any SELECT statement. Notably, Expression objects can be indexed based on variable valuations. Given 3 such expressions: (A = 3), (A = 5), (A = 7), we can build an index to identify which expressions are satisfied for a particular valuation of A.

I find this beyond cool. Not only can Expression objects themselves be queried, it's actually possible to build index structures to accelerate those queries.

Those familiar with probabilistic databases will note some convenient parallels between the expression type and Condition Columns used in C-Tables. Indeed, the concepts are almost identical. A C-Table encodes the semantics of the queries that went into its construction. When we compute a confidence in a C-Table (or row), what we're effectively asking about is the fraction of the input space that the C-Table (row) produces an output for.

Inter-Computation Relationships

Another class of questions is how different computations, or computation fragments relate or interact. For example, we can ask about…

… what the algebraic properties of a computation are (i.e., do two computations commute)
… what the dependencies of a computation are.
… given a sequence of computations, what does the information flow graph look like
… given a sequence of computations, does a specific pattern exist, and if so on which computation fragments?

This is an area that has not been explored quite as extensively. Distributed computing has looked long and hard at some of these questions (i.e., when do operations commute), but almost always in a specific context. Probably the closest idea, spiritually, appears in systems like Delite [2]. These sorts of compiler generation tools allow users to establish semantic restrictions on a domain specific language that lead to powerful optimizations. In a sense, these kinds of queries regarding computation interactions are also a form of optimization... but more general.

Combining Computations

Ultimately, one of the biggest distinctions between computation and normal data, is that it's possible to easily combine computation. Computation representations such as Monads are explicitly designed for this, but even simple iterative programs can still be concatenated. Computations can be broken apart, stitched together, sliced, diced, and sorted every which way... and the result of each is still more computation.

Summary

Where is this leading? Nowhere specific. We have a variety of tools and techniques for expressing computation, and now we need some tricks and techniques for effectively querying them as well.

References

[1] Gawlick, D. et al. Applications for expression data in relational database systems. 609–620.

[2] Chafi, H. et al. 2010. Language virtualization for heterogeneous parallel computing. ACM Sigplan …. (2010).