The UBDB Seminar - Spring 2016

The UBDB seminar meets on Mondays at TBD, typically in TBD. Subscribe to cse-database-list for more details.

Query Explanations: A New Approach to Understanding Big Data
Sudeepa Roy
Feb. 3; 10:00 AM
Davis 338A

Abstract

In recent years, the availability of big data has resulted in a growing number of users who are interested in interpreting the trends and anomalies in large datasets. This creates a pressing need for sophisticated data analysis tools that can provide qualitative information based on query answers on such datasets. In this talk, I will describe my current research on developing a principled framework for explaining query answers inspired by the theory of causality and intervention from the area of Artificial Intelligence. I will present our solutions to core challenges in this task such as obtaining concise descriptions of explanations, handling inherent dependencies of database tuples, and achieving real-time efficiency in large explanation spaces. I will conclude the talk with several exciting future research directions spanning database theory and systems, algorithms, and user interactions with a graphical interface.

Bio

Sudeepa Roy has been an Assistant Professor in Computer Science at Duke University since Fall 2015. She works in the area of databases and data management, with a focus on foundational aspects of big data analysis, which includes causality and explanations for big data, data provenance, probabilistic databases, and applications of database techniques in other domains. Prior to Duke, she did a postdoc at the University of Washington, and obtained her Ph.D. from the University of Pennsylvania. She is a recipient of the NSF CAREER award and a Google PhD Fellowship.

Scalable Platforms for Lifecycle Management of Collaborative Data Science Workflows
Amol Deshpande
April 6; 11:30 AM (Lunch Talk)
Davis 113A

Abstract

For several decades now, the amount of data available to us has been growing far faster than our ability to process it; this trend has accelerated many-fold in recent years with the emergence of efficient and mass-produced scientific instruments, increasing ease of generating and publishing data, and proliferation of Internet-connected devices. In this talk, I will present an overview of our ongoing work on building a platform for enabling collaborative data science, where teams of data scientists can simultaneously analyze, modify, and share datasets, to understand trends and to extract actionable insights. While numerous solutions exist for specific data analysis tasks, underlying infrastructure and data management capabilities for supporting ad hoc collaboration pipelines are still largely missing. I will present our vision for a unified, dataset-centric platform for addressing these challenges, and present our recent work on: (a) efficiently managing a large number of versioned datasets, (b) designing and supporting a unified query language to seamlessly query versioning and provenance information, and (c) lifecycle management of complex machine learning models like deep neural networks.

Bio

Amol Deshpande is a Professor in the Department of Computer Science at the University of Maryland with a joint appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). He received his Ph.D. from the University of California at Berkeley in 2004. His research interests include uncertain data management, adaptive query processing, data streams, graph analytics, and sensor networks. He is a recipient of an NSF CAREER award, and has received best paper awards at the VLDB 2004, EWSN 2008, and VLDB 2009 conferences.

Software Synthesis for Networks
Hossein Hojjat
May 25; 2:30 PM
Davis 113A

Abstract

Although Software-Defined Networking (SDN) makes it possible to build rich applications in software, programmers are still forced to deal with numerous low-level details, such as debugging a faulty network configuration. Most existing approaches focus on diagnosing problems in networks. They can detect a bug in a configuration (e.g., the existence of a path to undesired entities), but they fail to offer repairs that bring the network back to safety. This talk will present highlights from our recent work using automated software repair to efficiently find a bug in a network and to suggest optimal repairs. In the first half of the talk I will discuss how various software verification problems and properties of interest can be modelled directly using Horn clauses. In the second half I will discuss a technique that uses our Horn clause solving techniques to help network operators fix buggy configurations. Our approach is guaranteed to find the best repairs: it constructs an optimization lattice representing the space of possible repairs and uses a novel local search technique to find the best solutions. (Joint work with Nate Foster (Cornell University), Pavol Cerny (University of Colorado at Boulder), Jedidiah McClurg (University of Colorado at Boulder), and Philipp Ruemmer (Uppsala University).)

Bio

Hossein Hojjat is an assistant professor in the Computer Science department at the Rochester Institute of Technology (RIT). Before joining RIT, he was a postdoctoral researcher at Cornell University. He earned a PhD in Computer Science from EPFL in 2013. His research interests center on program synthesis and computer-aided verification.

Software Development as a Writing Seminar
Walker White
June 16; 3:00 PM
Davis 113A

Abstract

The game design courses at Cornell put students together in interdisciplinary teams of software developers, artists, and other domain experts to produce a shippable game. As part of this process, the students develop professional skills such as writing and presenting for various audiences, the development and maintenance of highly functional teams, and proper project management. As a result, these courses are highly regarded by employers, even those outside of the games industry.
We have found that a core feature for developing student professional skills is an intense cycle of documentation. Cooperating with the Engineering Communications program, we have structured these courses as a writing seminar. While the students develop their games, they also produce multiple design-related documents. Furthermore, they revise these documents multiple times in response to instructor feedback.
In this talk, we provide an overview of this documentation process and show how it works to strengthen student teams and professional skills. This will include our design of the documents themselves, as we have to balance deep planning with agile development during these fast-moving courses. We also discuss the importance of shifting the document audience -- from developer, to designer, to investor, to player -- in helping the students communicate between the various disciplines on their team. Finally, we talk about our assessment process, and how we work together with the Engineering Communications program to provide the students with effective feedback.

Bio

Walker White is a Senior Lecturer and Stephen H. Weiss Teaching Fellow in the Department of Computer Science at Cornell University. Since 2007 he has been the Director of the Game Design Initiative at Cornell, which supports an interdisciplinary minor in game design and development. He has won several teaching awards for his work in this program.