We're thrilled to invite you to the third annual Comp. Sci. & Eng. Fall Demo Day. Student groups from several CSE capstone classes will be presenting the culmination of three months of effort, hard work, (metaphorical) blood, sweat (well... caffeine really), and tears (see above).
This year's participating classes and projects include:
Languages and Runtimes for Big Data (CSE 662)
Vanir: Probabilistic Databases on Spark Nicholas Cellino, William Spoth
Most data is unusable as directly collected, due to either user or system error; converting this messy data into usable data takes time, money, and Advil. Probabilistic databases aim to streamline this cleaning process but often carry a large computation or data cost. Our solution is to leverage the distributed computing engine Apache Spark to mitigate the overhead associated with probabilistic databases.
Optimizer for Sampling Queries Vandit Aruldas, Shivang Aggarwal, Sneha Krishnamurthy, Rakshit Muthappa Padetira
Probabilistic database systems aim to produce all possible results. Query sampling generates samples of possible results. The database is split into a fixed number (N) of possible worlds and the query is run on all N possible worlds in parallel by representing data in one of several ways. Using a cost model to predict runtimes allows us to decide on the fastest sampling strategy.
Just-in-Time Datastructure Modeling Darshana Balakrishnan
The performance of a database management system is closely coupled to the index structures it uses, making index selection extremely important. This project supplements an existing generic data structure framework, called just-in-time data structures, with a simulation and cost-analysis-based approach to derive policies for data organization.
Multi-Dimensional Cracking Anna Jonet Joseph, Anand Sankar Bhagavandas
All the spatial indexing techniques in use today, including but not limited to the R-Tree, R+ Tree, Quad Tree, and Grid, require prior knowledge about the data and query workloads. Cracking involves physically reorganizing the data by dividing the database into manageable pieces based on the incoming query workload. A cracker index, an R+ Tree in our case, is created on the fly for the cracked pieces, and later queries are answered using it. Prior work on database cracking dealt purely with one-dimensional data; here we extend it to multi-dimensional databases.
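For readers new to cracking, here is a minimal sketch of the one-dimensional case that the abstract cites as prior work. It is not the project's multi-dimensional R+ Tree implementation; the class name `CrackedColumn` and its methods are hypothetical, chosen only for illustration. Each range query partitions the piece of the column that contains its bounds, so the data becomes more ordered as queries arrive.

```python
import bisect


class CrackedColumn:
    """Hypothetical one-dimensional cracked column (illustration only)."""

    def __init__(self, values):
        self.values = list(values)  # physically reorganized by queries
        self.pivots = []            # sorted pivot values cracked on so far
        self.positions = []         # positions[i]: index of first value >= pivots[i]

    def _crack(self, pivot):
        """Partition the piece containing `pivot`; return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        # Boundaries of the piece that the new pivot falls into.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.values)
        piece = self.values[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[lo:hi] = smaller + larger  # reorganize only this piece
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]


col = CrackedColumn([7, 2, 9, 4, 1, 8, 3])
first = col.range_query(3, 8)   # cracks the column on 3 and on 8
second = col.range_query(1, 3)  # reuses the existing crack on 3
```

The first query touches the whole column, while the second only partitions the small piece below 3. That is the sense in which the index is built on the fly from the workload.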
Cluster-Friendly Spatial Indexing Nikita Ganesh Konda
The increasing size and density of spatial data has led to the use of NoSQL databases. However, efficient indexing of spatial data in a NoSQL database is hard. Currently, space-filling curves are widely adopted for indexing. However, these partition the entire space uniformly, which may not be efficient for regions with sparsely populated data. The aim of this project is to eliminate empty spaces by building a two-tier index with an R-tree as the global index and a space-filling curve as the local index.
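As a rough illustration of how a space-filling curve turns two-dimensional points into one-dimensional keys that a key-value store can sort, here is a minimal sketch using the Z-order (Morton) curve. The abstract does not say which curve the project uses, so the choice of Z-order is an assumption, and the function name `z_order_key` is hypothetical.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key.

    Bit i of x goes to key bit 2*i, and bit i of y goes to key bit 2*i + 1.
    Assumes x and y are non-negative integer grid coordinates.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Sorting points by key groups spatially nearby grid cells together.
points = [(3, 3), (0, 1), (2, 0), (1, 1), (0, 0), (1, 0)]
ordered = sorted(points, key=lambda p: z_order_key(p[0], p[1]))
```

Because every cell of the grid gets a key whether or not it holds data, the curve covers sparse regions as densely as crowded ones, which is the inefficiency the two-tier index aims to avoid.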
Pattern Detection For Query Explanations Deepti Chavan, Shruti Parab, Sushmita Sinha
The high-level aim of this project is to find correlations in a dataset and to optimize the process of finding them. Using these correlations as constraints, we find possible reasons that justify or identify the presence of an outlier. We intend to determine the type of association between value attributes and dimension attributes.
Differential Privacy (CSE 660)
The Comparison of Two Algorithms on Private Causal Inference Liuyi Yao
Verification of Randomized Response in Coq Weihao Qu
Reconstruction of a Database Using Fourier Attack Sindhu Madhuri Morapakala
An Implementation of Differentially Private Bayesian Inference Jiawen Liu
An Implementation of the DualQuery Algorithm Shubham Shekhar Lagwankar, Muhammed Zaki Muhammed Husain Bakshi
Locally Differentially Private Protocols for Frequency Estimation Venkata Gayatri Pratyusha Gundugola, Manish Kasireddy
Differentially Private Sparse Inverse Covariance Estimation Mengdi Huai, Di Wang