
Don't Wrangle, Guess

One of the biggest costs in analytics is data wrangling: getting your messy, mislabeled, disorganized data together so you can actually ask your questions. All data wrangling tools force you to do all of this work up front, before you even know what you want to do with the data. Mimir lets you get at your data sooner by tracking your cleaning to-dos. Ask first, clean later, with Mimir.

Get Mimir

Mimir is about getting you to your analysis as fast as possible. It lets you harness the raw power of SQL, StackOverflow's second-most popular language for 4 years running. Mimir then adds a ton of powerful SQL extensions designed to make dealing with messy data easier:



Stop messing with data import and relational schema design. The versatile LOAD command allows you to quickly transform documents into relational tables without the muss and fuss of upfront schema design or defining complex transformation operators.
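
To make the idea of loading without upfront schema design concrete, here is a minimal Python sketch. It is not Mimir's LOAD command or its implementation; the `load_csv` helper, the table name, and the sample data are all hypothetical. It assumes a well-formed CSV with a header row, stores every column as TEXT, and leaves type decisions to query time.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, text):
    """Load CSV text into a new table without a hand-written schema.

    Hypothetical helper for illustration only (not part of Mimir).
    Column names come from the header row, every column is stored as
    TEXT, and empty fields become NULL so later queries can spot them.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    cleaned = [[value if value != "" else None for value in row] for row in data]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', cleaned)
    return header


# Made-up sample document: the second row has no price.
SAMPLE = "product,price\nwidget,4.50\ngadget,\n"

conn = sqlite3.connect(":memory:")
load_csv(conn, "products", SAMPLE)

# Types are decided when the question is asked, not when the data is loaded.
result = conn.execute(
    "SELECT product, CAST(price AS REAL) FROM products ORDER BY product"
).fetchall()
print(result)
```

The missing price survives as NULL rather than being silently dropped or rejected at load time, which is the kind of cleaning to-do that can be deferred until a query actually depends on it.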



Stop writing messy scripts to visualize your data. The PLOT command lets you take SQL queries and see their results directly – notebook style, PDF/PNG, or JavaScript, take your pick. Mimir even keeps track of unknowns in your data.



Mimir keeps track of your wrangling to-dos, marking query results that might have errors. When you need to be more precise, the ANALYZE command zeroes in on the specific wrangling you need right now.

Unlike most other SQL-based systems, Mimir lets you make decisions during and after data exploration. All of Mimir's functionality is based on three ideas: (1) Mimir provides sensible best-guess defaults, (2) Mimir warns you when one of its guesses is going to affect what it's telling you, and (3) Mimir lets you easily inspect what it's doing to your data with ANALYZE.
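
The three ideas above (guess, warn, inspect) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mimir's implementation or the behavior of ANALYZE: the `Guessed` class, the `fill_missing` and `total` functions, and the choice of the mean as the best guess are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Guessed:
    """A value plus the reasons (if any) it depends on a guess."""
    value: float
    reasons: list = field(default_factory=list)

    @property
    def uncertain(self):
        # Idea 2: a result is flagged whenever a guess affected it.
        return bool(self.reasons)


def fill_missing(values, column):
    """Idea 1: replace missing values with a best guess (here, the mean)."""
    known = [v for v in values if v is not None]
    guess = mean(known)
    cells = []
    for i, v in enumerate(values):
        if v is None:
            note = f"{column}[{i}] was missing; guessed mean {guess}"
            cells.append(Guessed(guess, [note]))
        else:
            cells.append(Guessed(v))
    return cells


def total(cells):
    """Sum the cells, carrying every guess that influenced the result."""
    reasons = [r for c in cells for r in c.reasons]
    return Guessed(sum(c.value for c in cells), reasons)


prices = fill_missing([4.0, None, 8.0], "price")
answer = total(prices)

print(answer.value)      # the best-guess answer
print(answer.uncertain)  # whether any guess affected it
for reason in answer.reasons:
    print(reason)        # Idea 3: inspect which guesses were involved
```

The answer is available immediately, and the attached reasons tell you which cleaning work is worth doing if you need the result to be exact.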

Better still, you don't need any new infrastructure. Mimir attaches to ordinary relational databases through JDBC (we currently support SQLite, with SparkSQL and Oracle support in progress). If you have no preference, Mimir just puts everything in a super portable SQLite database by default.


Who Are We?

The Team
Mike Brachmann, Poonam Kumari, William Spoth, Aaron Huber, Lisa Lu, Shivang Aggarwal, Olivia Alphonce
Research Advisors
Oliver Kennedy, Boris Glavic
Industry Advisors
Ronny Fehling (Airbus), Dieter Gawlick (Oracle), Zhen Hua Liu (Oracle), Beda Hammerschmidt (Oracle)
Vinayak Karuppasamy, Arindam Nandi, Niccolò Meneghetti, Ying Yang

Mimir is supported by gifts from Oracle, as well as grants from the NSF and the Naval Postgraduate School.


Video Demo (2015)
Overview Slides (2015)
Rant: What if Databases Could Answer Incorrectly (2015)


Beta Probabilistic Databases: A Scalable Approach to Belief Updating and Parameter Learning
Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer
Invited to submit an extended version as a 'Best-of-SIGMOD' paper for ACM-TODS
Convergent Inference with Leaky Joins
Adaptive Schema Databases
William Spoth, Bahareh Sadat Arab, Eric S. Chan, Dieter Gawlick, Adel Ghoneimy, Boris Glavic, Beda Hammerschmidt, Oliver Kennedy, Seokki Lee, Zhen Hua Liu, Xing Niu, Ying Yang
Communicating Data Quality in On-Demand Curation
The Exception That Improves The Rule
Juliana Freire, Boris Glavic, Oliver Kennedy, Heiko Mueller
Provenance-aware Versioned Dataworkspaces
Xing Niu, Bahareh Arab, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy, Oliver Kennedy, Boris Glavic
Lenses: An On-Demand Approach to ETL
Ying Yang, Niccolò Meneghetti, Ronny Fehling, Zhen Hua Liu, Dieter Gawlick, Oliver Kennedy
Detecting the Temporal Context of Queries
Oliver Kennedy, Ying Yang, Jan Chomicki, Ronny Fehling, Zhen Hua Liu, Dieter Gawlick
On-Demand Query Result Cleaning