Overlay Spreadsheets

Overlay Spreadsheets

Oliver Kennedy

University at Buffalo

Boris Glavic

Illinois Institute of Technology

Michael Brachmann

Breadcrumb Analytics

The Vizier Notebook

1. Vizier is a Workflow System

2. Vizier isn't Just Code

Including...

Spreadsheets in Workflows

(Wrangler: Interactive Visual Specification of Data Transformation Scripts; Kandel et al; CHI 2011)

Spreadsheets in Workflows

(trymito.io)

Spreadsheets" in Workflows

(The Exception that Improves the Rule; Freire et. al.; HILDA 2016)

Spreadsheets as Workflow Precursors

So why not just use a spreadsheet?

Goals

  • Build a spreadsheet...
  • ... that can replay edits on a new dataset.
  • ... while supporting "free-form" editing.

So we built a spreadsheet...

  1. Replaying Shape Updates
  2. Replaying Content Updates

Shape Changes

Mapping Functions

$row(n) = \begin{cases} n & \textbf{if } \color{#d40000}{\blacksquare}, \color{#6c5353}{\blacksquare}\\ n+1 & \textbf{if } \color{#008080}{\blacksquare}, \color{#a05a2c}{\blacksquare}\\ n-3 & \textbf{if } \color{#808000}{\blacksquare} \end{cases}$          $col(n) = \begin{cases} n & \textbf{if } \color{#d40000}{\blacksquare}, \color{#008080}{\blacksquare}\\ n-1 & \textbf{if } \color{#6c5353}{\blacksquare}, \color{#a05a2c}{\blacksquare}\\ \texttt{null} & \textbf{if } \color{#536c53}{\blacksquare}\\ \end{cases}$

Content Updates

Coping with Big Data

  1. Classical Spreadsheets (Excel, Google Sheets, etc...)
  2. DataSpread (Bendre et. al.; ICDE 2018)
  3. Ignore the Problem (This Talk)

Patterns

Coping with Big Data

Only materialize the subset of visible rows
(and/or columns)

Restricting the Source Data

Restricting the Overlay

Restricting the Overlay

We need to materialize the
transitive closure
of the cell dependencies.

Restricting the Overlay

Spreadsheets are optimized for reactive, interactive computation over cells.

... but here we have a big bulk computation.

Recursive Patterns

Recursive Patterns

$$H[0] = G[0]$$ $$H[n] = G[n] + H[n-1]$$
↓ $$H[n] = sum(G[0:n])$$

Many common patterns have an
equivalent closed-form representation.

Recursive Patterns

$$H[0:1] = G[0:1]$$ $$H[n] = G[n] + H[n-2]$$

  SELECT G + lag(H, 2) AS H OVER (ORDER BY row) FROM ...
		

Window queries work for the rest.

Recursive Patterns

Axis,60,600,6000,60000 Vizier,0.24214023499999998,0.416489697,4.6895117310000005,48.321868185 Vizier-Batch,0.292034878,0.291488883,0.31172266200000004,0.457365417 DataSpread,26.404142864,28.609918545,42.390664963,221.393066346
https://vizierdb.info
  • Decoupling edits from source data enables spreadsheets in workflow systems
  • Overlays support a (fast) hybrid batch/reactive execution model.

Open Challenges

  • Freeform edits don't usually map nicely to structured dataframes.
  • Content updates don't generalize through external changes in shape.
  • [Your challenge/comment here]