Don't Wrangle, Guess

One of the biggest costs in analytics is data wrangling: getting your messy, mislabeled, disorganized data together so you can actually ask your questions. Most data wrangling tools force you to do all this work upfront, before you even know what you want to do with the data. Mimir lets you get at your data sooner by tracking your cleaning to-dos. Ask first, clean later, with Mimir.

Get Mimir

Mimir is about getting you to your analysis as fast as possible. It lets you harness the raw power of SQL, Stack Overflow's second-most popular language for 4 years running. Mimir then adds a ton of powerful SQL extensions designed to make dealing with messy data easier:

LOAD

Stop messing with data import and relational schema design. The versatile LOAD command allows you to quickly transform documents into relational tables without the muss and fuss of upfront schema design or defining complex transformation operators.

PLOT

Stop writing messy scripts to visualize your data. The (soon™ to be released) PLOT command lets you take SQL queries and see their results directly – notebook style, PDF/PNG, or JavaScript, take your pick. Mimir even keeps track of unknowns in your data.

ANALYZE

Mimir keeps track of your wrangling to-dos, marking query results that might have errors. When you need to be more precise, the ANALYZE command zeroes in on the specific wrangling you need right now.

Unlike most other SQL-based systems, Mimir lets you make decisions during and after data exploration. All of Mimir's functionality is based on three ideas: (1) Mimir provides sensible best-guess defaults, (2) Mimir warns you when one of its guesses is going to affect what it's telling you, and (3) Mimir lets you easily inspect what it's doing to your data with ANALYZE.
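
To make the first two ideas concrete, here is a minimal sketch in plain Python and SQLite. It is not Mimir's API or implementation: the table, the column names, the helper functions, and the choice of "mean of the known values" as the guess are all assumptions made up for this illustration. It fills in a missing value with a best guess, remembers which rows were guessed, and flags any query result that depends on a guess.

```python
import sqlite3

# Hypothetical raw data: one rating is missing (None).
RAW_ROWS = [
    ("widget", 4.0),
    ("gadget", None),
    ("gizmo", 2.0),
]


def load_with_guesses(conn, rows):
    """Store rows, replacing missing ratings with a best guess.

    The guess used here (mean of the known ratings) is an assumption for
    this sketch. Each row records whether its rating was guessed.
    """
    known = [rating for _, rating in rows if rating is not None]
    guess = sum(known) / len(known)
    conn.execute(
        "CREATE TABLE product (name TEXT, rating REAL, guessed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [
            (name, guess if rating is None else rating, int(rating is None))
            for name, rating in rows
        ],
    )


def query_with_warnings(conn, min_rating):
    """Return (name, rating, uncertain) for products at or above min_rating.

    'uncertain' is True when the row only qualifies on the strength of a
    guessed value, so the caller can warn the user about it.
    """
    cur = conn.execute(
        "SELECT name, rating, guessed FROM product "
        "WHERE rating >= ? ORDER BY name",
        (min_rating,),
    )
    return [(name, rating, bool(guessed)) for name, rating, guessed in cur]


conn = sqlite3.connect(":memory:")
load_with_guesses(conn, RAW_ROWS)
results = query_with_warnings(conn, 3)

for name, rating, uncertain in results:
    flag = "  (based on a guess; may be wrong)" if uncertain else ""
    print(f"{name}: {rating}{flag}")
```

In this sketch the query answers immediately, and only the flagged row needs attention if the exact answer matters. That is the "ask first, clean later" workflow in miniature.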

Better still, you don't need any new infrastructure. Mimir attaches to ordinary relational databases through JDBC (we currently support SQLite, with SparkSQL and Oracle support in progress). If you have no preference, Mimir just puts everything in a super portable SQLite database by default.


Documentation


Who Are We?

The Team
Mike Brachman, Poonam Kumari, William Spoth, Jon Logan, Aaron Huber, Lisa Lu, Shivang Aggarwal, Olivia Alphonce
Research Advisors
Oliver Kennedy, Boris Glavic
Industry Advisors
Ronny Fehling (Airbus), Dieter Gawlick (Oracle), Zhen Hua Liu (Oracle), Beda Hammerschmidt (Oracle)
Alumni
Vinayak Karuppasamy, Arindam Nandi, Niccolò Meneghetti, Ying Yang

Mimir is supported by gifts from Oracle, as well as grants from the NSF and the Naval Postgraduate School.


Presentations

Video Demo (2015)
Overview Slides (2015)
Rant: What if Databases Could Answer Incorrectly (2015)

Publications

  • Beta Probabilistic Databases: A Scalable Approach to Belief Updating and Parameter Learning

    Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer

  • Convergent Inference with Leaky Joins

  • Adaptive Schema Databases

    William Spoth, Bahareh Sadat Arab, Eric S. Chan, Dieter Gawlick, Adel Ghoneimy, Boris Glavic, Beda Hammerschmidt, Oliver Kennedy, Seokki Lee, Zhen Hua Liu, Xing Niu, Ying Yang

  • Communicating Data Quality in On-Demand Curation

  • The Exception That Improves The Rule

    Juliana Freire, Boris Glavic, Oliver Kennedy, Heiko Mueller

  • Provenance-aware Versioned Dataworkspaces

    Xing Niu, Bahareh Arab, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy, Oliver Kennedy, Boris Glavic

  • Lenses: An On-Demand Approach to ETL

    Ying Yang, Niccolò Meneghetti, Ronny Fehling, Zhen Hua Liu, Dieter Gawlick, Oliver Kennedy

  • Detecting the Temporal Context of Queries

    Oliver Kennedy, Ying Yang, Jan Chomicki, Ronny Fehling, Zhen Hua Liu, Dieter Gawlick

  • On-Demand Query Result Cleaning