Query Explanations: A New Approach to Understanding Big Data
Feb. 3; 10:00 AM
In recent years, the availability of big data has led to a growing
number of users who are interested in interpreting the trends and
anomalies in large datasets. This creates an urgent need for
sophisticated data analysis tools that can provide qualitative
insight into query answers over such datasets. In this talk, I
will describe my current research on developing a principled framework
for explaining query answers, inspired by the theory of causality and
intervention from Artificial Intelligence. I will present
our solutions to core challenges in this task, such as obtaining concise
descriptions of explanations, handling inherent dependencies among database
tuples, and achieving real-time efficiency in large explanation spaces.
I will conclude the talk with several exciting future research
directions spanning database theory and systems, algorithms, and user
interactions with a graphical interface.
Sudeepa Roy has been an Assistant Professor in Computer Science at Duke
University since Fall 2015. She works in the area of databases and data
management, with a focus on foundational aspects of big data analysis,
including causality and explanations for big data, data provenance,
probabilistic databases, and applications of database techniques in
other domains. Before joining Duke, she was a postdoctoral researcher at the University of
Washington, and she obtained her Ph.D. from the University of Pennsylvania.
She is a recipient of the NSF CAREER award and a Google PhD Fellowship.
Scalable Platforms for Lifecycle Management of Collaborative Data Science Workflows
April 6; 11:30 AM (Lunch Talk)
For several decades now, the amount of data available to us has been growing at a pace far higher than our ability to process it; this trend has accelerated many-fold in recent years with the emergence of efficient and mass-produced scientific instruments, the increasing ease of generating and publishing data, and the proliferation of Internet-connected devices. In this talk, I will present an overview of our ongoing work on building a platform for enabling collaborative data science, where teams of data scientists can simultaneously analyze, modify, and share datasets to understand trends and extract actionable insights. While numerous solutions exist for specific data analysis tasks, the underlying infrastructure and data management capabilities for supporting ad hoc collaboration pipelines are still largely missing. I will present our vision for a unified, dataset-centric platform for addressing these challenges, and present our recent work on: (a) efficiently managing a large number of versioned datasets, (b) designing and supporting a unified query language to seamlessly query versioning and provenance information, and (c) lifecycle management of complex machine learning models such as deep neural networks.
Amol Deshpande is a Professor in the Department of Computer Science at the University of Maryland with a joint appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). He received his Ph.D. from the University of California, Berkeley in 2004. His research interests include uncertain data management, adaptive query processing, data streams, graph analytics, and sensor networks. He is a recipient of an NSF CAREER award and has received best paper awards at the VLDB 2004, EWSN 2008, and VLDB 2009 conferences.
Software Synthesis for Networks
May 25; 2:30 PM
Although Software-Defined Networking (SDN) makes it possible to build rich applications in software, programmers are still forced to deal with numerous low-level details, such as debugging a faulty network configuration.
Most existing approaches focus on diagnosing problems in networks. They can detect a bug in a configuration (e.g., the existence of a path to undesired entities), but they fail to offer repairs that bring the network back to a safe state.
This talk will present highlights from our recent work using automated software repair to efficiently find bugs in a network and to suggest optimal repairs. In the first half of the talk, I will discuss how various software verification problems and properties of interest can be modeled directly using Horn clauses. In the second half, I will discuss a technique that uses our Horn clause solving techniques to help network operators fix buggy configurations. Our approach constructs an optimization lattice representing the space of possible repairs and uses a novel local search technique over it, and it is guaranteed to find the best repairs.
(Joint work with Nate Foster (Cornell University), Pavol Cerny (University of Colorado at Boulder), Jedidiah McClurg (University of Colorado at Boulder), Philipp Ruemmer (Uppsala University))
Hossein Hojjat is an assistant professor in the Computer Science department at the Rochester Institute of Technology (RIT). Before joining RIT, he was a postdoctoral researcher at Cornell University. He earned a PhD in Computer Science from EPFL in 2013. His research interests center on program synthesis and computer-aided verification.
Software Development as a Writing Seminar
June 16; 3:00 PM
The game design courses at Cornell put students together in interdisciplinary teams
of software developers, artists, and other domain experts to produce a shippable game.
As part of this process, the students develop professional skills such as writing and
presenting for various audiences, building and maintaining highly functional
teams, and managing projects properly. As a result, these courses are highly regarded by
employers, even those outside of the games industry.
We have found that the core feature for developing students' professional skills is an intense
cycle of documentation. In cooperation with the Engineering Communications Program, we have
structured these courses as a writing seminar. While the students develop their games,
they also produce multiple design-related documents. Furthermore, they revise these
documents multiple times in response to instructor feedback.
In this talk, we provide an overview of this documentation process and show how it
works to strengthen student teams and professional skills. This will include our design
of the documents themselves, as we have to balance deep planning with agile development
during these fast-moving courses. We also discuss the importance of shifting the document
audience -- from developer, to designer, to investor, to player -- in helping the students
communicate across the various disciplines on their team. Finally, we talk about our
assessment process and how we work together with the Engineering Communications Program
to provide the students with effective feedback.
Walker White is a Senior Lecturer and Stephen H. Weiss Teaching Fellow in the Department
of Computer Science at Cornell University. Since 2007 he has been the Director of the
Game Design Initiative at Cornell, which supports an interdisciplinary minor in game
design and development. He has won several teaching awards for his work in this program.