Some great news for the Mimir project. After picking up a massive $2.7m grant this summer (in collaboration with NYU and IIT) to build an interactive data curation system, we just got notified of two new paper accepts.
Schemas are useful. They give you a common language to use when talking to your database, and they keep you from doing dumb things like putting data into the wrong column. Unfortunately, they're also hard... so many people avoid using them. In collaboration with IIT and Oracle, William Spoth and Ying Yang outlined a system for dynamically generating schemas from semi-structured data, and for letting systems flexibly evolve those schemas over time.
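To make the idea concrete, here is a minimal sketch of what generating a schema from semi-structured records, and letting it evolve as new records arrive, might look like. This is not the system from the paper: the `EvolvingSchema` class, its methods, and the sample records are all hypothetical, and the only thing tracked per field is which value types were seen and whether every record so far contained it.

```python
class EvolvingSchema:
    """Hypothetical helper: tracks field names and value types as records arrive."""

    def __init__(self):
        self.seen = 0      # number of records observed so far
        self.types = {}    # field name -> set of Python type names seen
        self.counts = {}   # field name -> number of records containing the field

    def observe(self, record):
        """Fold one dict-shaped record into the schema."""
        self.seen += 1
        for key, value in record.items():
            self.types.setdefault(key, set()).add(type(value).__name__)
            self.counts[key] = self.counts.get(key, 0) + 1

    def snapshot(self):
        """Return the schema as it stands right now."""
        return {
            key: {
                "types": sorted(self.types[key]),
                # A field is "required" only if every record so far had it.
                "required": self.counts[key] == self.seen,
            }
            for key in self.types
        }


# Made-up sample data: the third record stores `id` as a string,
# and only the second record has an `email`.
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace", "email": "grace@example.com"},
    {"id": "3", "name": "Edsger"},
]

schema = EvolvingSchema()
for record in records:
    schema.observe(record)

result = schema.snapshot()
```

Nothing has to be declared up front here: the schema is whatever the data has shown so far, and a new field or a conflicting type simply shows up in the next snapshot.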
Inference in graphical models requires a lot of hand tuning. Approximation algorithms are fast, but imprecise. Exact algorithms work well, until they don't. In this paper, Ying Yang described a new "Leaky Join" operator that allows for convergent-online inference. In short, a query plan consisting of leaky join operators behaves like an online algorithm in that it produces (high-quality) approximations prior to completion. However, unlike classical online algorithms, it is guaranteed to converge to the exact result, with only minimal overhead compared to a classical join.
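The snippet below is a toy illustration of the "approximate early, exact at the end" behavior, and it is emphatically not the Leaky Join operator itself. It assumes a tiny, made-up three-variable model (`rain`, `sprinkler`, `wet`), visits every joint assignment exactly once in random order, and yields a running estimate of one marginal after each step. The function names and probabilities are invented for the example.

```python
import itertools
import random


def online_marginal(weight, domains, target, seed=0):
    """Yield running estimates of P(target); the last estimate is exact.

    Every joint assignment is visited exactly once, in random order, so
    the partial sums are a sample-based approximation until the pass
    finishes, at which point they equal the full sums.
    """
    names = list(domains)
    rows = list(itertools.product(*(domains[n] for n in names)))
    random.Random(seed).shuffle(rows)
    idx = names.index(target)
    totals = {value: 0.0 for value in domains[target]}
    for row in rows:
        totals[row[idx]] += weight(dict(zip(names, row)))
        z = sum(totals.values())
        if z > 0:
            yield {value: w / z for value, w in totals.items()}


def weight(a):
    """Joint probability of one assignment in the made-up model."""
    p_rain = 0.2 if a["rain"] else 0.8
    p_sprinkler = 0.5
    p_wet_given = 0.9 if (a["rain"] or a["sprinkler"]) else 0.1
    p_wet = p_wet_given if a["wet"] else 1.0 - p_wet_given
    return p_rain * p_sprinkler * p_wet


domains = {"rain": [0, 1], "sprinkler": [0, 1], "wet": [0, 1]}
estimates = list(online_marginal(weight, domains, "rain"))
final = estimates[-1]
```

Early entries in `estimates` can be far off, but each one is available immediately, and the final entry matches the exact marginal because by then every assignment has been counted once.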