CIDR is a biennial conference focusing on new ideas and directions for the database community. Topics presented range from early- and mid-stage systems efforts to proposals for radical changes in the direction of database research. A highlight of CIDR is the Gong Show, a sequence of 5-minute talks about literally anything.
One theme running through this year's CIDR was a recognition that the scale and scope of database research and technology are moving further and further away from common use cases. This idea was especially evident in Jens Dittrich's Gong Show talk "The Case for Small Data Management", where he argued that the number of organizations actually dealing with petabytes of data in practice is incredibly small and that our efforts would have the biggest impact when targeted at realistic data sizes. Brown mirrored this vision in Tupleware, noting that increasingly the limiting factor for most small-scale users is computational complexity and expressiveness rather than data size.
There was a significant focus on areas where HCI and databases could unify their efforts. Trifacta presented work on using predictive modeling to simplify data transformation development, and Google presented their efforts to simplify data integration.
As always, abstractions for data management were quite popular, and we saw even more abstractions that treat probabilistic models as views. There was even an entire panel on managing and querying knowledge.
Reabsorption of Specialized DBs
A similar trend was the observation that specialized database systems are no longer needed. To paraphrase one attendee: we've realized that working with graph data is basically doing lots of self-joins and recursive queries, and having realized that, we can optimize general-purpose database engines to be just as good. This view manifested in several ways: EPFL's Vida was one of several efforts to create an overlay on top of specialized database systems, creating an abstract, uniform view of the data. Wisconsin made a case against specialized graph engines, and Oracle presented their approach to dynamically indexing semistructured data.
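To make the attendee's point concrete, here is a minimal sketch of two graph operations written as plain SQL against a general-purpose engine. It is my own illustration, not code from any of the talks: it uses SQLite through Python's standard library, and the `edge` table and the helper names `two_hop` and `reachable` are made up for the example.

```python
import sqlite3


def _load(edges):
    """Create an in-memory database holding a single edge(src, dst) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    return conn


def two_hop(edges):
    """Two-hop neighbours: a self-join of the edge table with itself."""
    conn = _load(edges)
    rows = conn.execute(
        """
        SELECT DISTINCT e1.src, e2.dst
        FROM edge AS e1
        JOIN edge AS e2 ON e1.dst = e2.src
        """
    ).fetchall()
    conn.close()
    return set(rows)


def reachable(edges, start):
    """Everything reachable from `start`: a recursive query over the same table."""
    conn = _load(edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
        )
        SELECT node FROM reach
        """,
        (start,),
    ).fetchall()
    conn.close()
    return {row[0] for row in rows}


if __name__ == "__main__":
    sample = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
    print(two_hop(sample))
    print(reachable(sample, "a"))
```

Whether a general-purpose engine runs queries like these as fast as a dedicated graph system is exactly the optimization question the talks were about; the sketch only shows that the workload itself is expressible as joins and recursion.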
Data 'Swamps' and Low-Quality Data
One subject that triggered quite a bit of discussion with the audience was the growing need to manage low-quality data. The term "Data Lake" proved particularly grating, as numerous attendees pointed out that without curation, a data lake can quickly turn into a data swamp. Numerous efforts to improve this curation process were presented.
A few other directions stuck out. Super-aggressive, bare-metal query compilation to raw hardware is becoming even more of a thing, and I noticed an increased interest in database security, access control, and trust.