CSE 662 Fall 2019 - Scan Sharing (Crescando)


October 28

Amadeus

European travel booking service.

  • OLTP workload (Book, Check, etc...)
  • High throughput
  • Simple queries (few joins, some multi-table updates)

Excellent performance < Consistent performance

Let's design...

(focus on single-thread for the moment)

Naive Implementation

Process one query at a time

and so forth...

Can we do better?

Index the data
Pro: Lower query latency (= higher throughput).
Con: Updates more expensive; Poor IO bandwidth use.
Share work between queries
Pro: Better IO use; Higher throughput.
Con: Higher latency per query; Potentially more work per query.

Shared Scan (Elevator)

For each batch of queries, scan over all records and give each query a chance to access/manipulate the data
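
The batched shared scan can be sketched as follows; `Query` and the equality-match logic are illustrative names, not Crescando's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    """Illustrative equality query: SELECT ... WHERE attr = value."""
    attr: str
    value: int

def shared_scan(data, queries):
    """One pass over the data answers every query in the batch."""
    results = {q: [] for q in queries}
    for record in data:            # each record is loaded exactly once
        for q in queries:          # ...but every query inspects it
            if record.get(q.attr) == q.value:
                results[q].append(record)
    return results
```

Note how the cost structure matches the pros/cons below: one load per record per batch, but work proportional to (records × queries).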

Pros
Each record is only loaded once per batch.
Higher throughput
Cons
Queries must wait for the entire batch to finish
New queries are blocked until the current batch finishes
Every query processes every record

Can we do better?

Queries blocked until the current batch finishes.

Idea: Allow queries to "attach" to the scan at any point

(Detach each query after it has processed one full cycle of the data)

Challenge: Checking for new queries is expensive

Partition data into chunks, only check at chunk boundaries

Tradeoff: Bigger chunks mean better throughput, but worse attach latency.
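
A sketch of the attach/detach cycle under these assumptions (chunked storage, each query folding records via a hypothetical `fold` method):

```python
from collections import deque

class CountQuery:
    """Illustrative query that counts records matching attr = value."""
    def __init__(self, attr, value):
        self.attr, self.value, self.count = attr, value, 0
    def fold(self, record):
        if record.get(self.attr) == self.value:
            self.count += 1

def clock_scan(chunks, incoming):
    """Cycle over chunks; admit new queries only at chunk boundaries."""
    active = []                                  # [query, chunks_remaining]
    done = []
    while incoming or active:
        for chunk in chunks:
            while incoming:                      # attach at the boundary
                active.append([incoming.popleft(), len(chunks)])
            for record in chunk:
                for entry in active:
                    entry[0].fold(record)
            for entry in active:                 # detach after one full cycle
                entry[1] -= 1
            done += [e[0] for e in active if e[1] == 0]
            active = [e for e in active if e[1] > 0]
    return done
```

Checking the `incoming` queue only once per chunk is what keeps the attach check cheap; larger chunks amortize it further at the cost of attach latency.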

Pros
Each record is only loaded once per cycle.
Higher throughput
Queries can attach at any chunk boundary (no waiting for a batch to start)
Cons
Every query processes every record

Can we do better?

Problem: Can't index the data without giving up scan sharing.

Idea: Index the query

Classic Indexing: Which records might answer this query.

Query Indexing: Which queries might this record be an answer for.


                   Q1 = SELECT ... WHERE A = 12
                   Q2 = SELECT ... WHERE A = 17
                   Q3 = SELECT ... WHERE A = 17 
    
  • $R_1 = \left< A: 10, \ldots \right>$ : ignore
  • $R_2 = \left< A: 12, \ldots\right>$ : $Q_1.fold(R_2)$
  • $R_3 = \left< A: 14, \ldots\right>$ : ignore
  • $R_4 = \left< A: 15, \ldots\right>$ : ignore
  • $R_5 = \left< A: 17, \ldots\right>$ : $Q_2.fold(R_5)$; $Q_3.fold(R_5)$
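
One way to realize the example above (hash-based, with illustrative names): index queries by the constant in their equality predicate, then probe with each record's value.

```python
from collections import defaultdict

class EqualityQueryIndex:
    """Maps a predicate constant to the queries waiting on it."""
    def __init__(self):
        self.by_value = defaultdict(list)

    def attach(self, value, query):
        self.by_value[value].append(query)

    def probe(self, record_value):
        # All queries this record might answer, found in O(1)
        # regardless of how many queries are in the batch.
        return self.by_value.get(record_value, [])
```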

Range queries become range lookups: the index returns every query whose predicate range contains the record's value.


             Q1 = SELECT ... WHERE A >= 12 AND A < 15
             Q2 = SELECT ... WHERE A >= 15 AND A < 17
             Q3 = SELECT ... WHERE A >= 15 AND A < 22
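
A minimal interval-based sketch of a range query index (linear probe; a real implementation would use something like an interval tree):

```python
class RangeQueryIndex:
    """Maps half-open predicate ranges [lo, hi) to waiting queries."""
    def __init__(self):
        self.intervals = []                    # (lo, hi, query) triples

    def attach(self, lo, hi, query):
        self.intervals.append((lo, hi, query))

    def probe(self, v):
        # Every query whose predicate range contains this record's value.
        return [q for lo, hi, q in self.intervals if lo <= v < hi]
```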
    

Types of Operations

  • QUERY: Read in, do not mutate
  • INSERT: No read, just write to an empty spot
  • DELETE: Remove all matching records
  • UPDATE: Transform the values of all matching records

Naive Algorithm

For each "slot", apply operations in-order.

Indexed Algorithm

For each "slot", find matching operations and apply them in order.

Is there a problem?

Indexing Queries with Mutations

Problem: Updates and Deletes can invalidate the index lookup.

Problem: Inserts only applicable to empty slots.

Consideration: Critical for consistency that operations be applied in-order.


      # For each slot, replay every matching operation in timestamp order.
      for slot in data:
        t = 0
        while True:
          # Earliest matching operation with timestamp greater than t.
          next_op = index_lookup(slot, after=t)
          if next_op:
            next_op(slot)
            t = next_op.t
          else:
            break
    

Implicit Index: Inserts only apply to empty slots


      # Empty slots consume pending inserts; occupied slots use the query index.
      for slot in data:
        t = 0
        while True:
          if slot.empty:
            next_op = next_insert_after(t)
          else:
            next_op = index_lookup(slot, after=t)
          if next_op:
            next_op(slot)
            t = next_op.t
          else:
            break
    

Multiple Indices

  • Zero or more predicate indices
  • A table of "unindexed" operations
  • A list of Inserts (for empty slots)

Predictable Performance for Unpredictable Workloads (Unterbrunner et al.)

Optimization: Separate batches for mutations and queries. For each record, process mutations first, then queries.

On the Amadeus workload...

  • Cycling through a batch of records is fast (1s).
  • Index contents change on every attach/detach.
  • Attributes for indexing may not be known upfront.

Challenge: Need to create indexes fast.

Idea: Greedily pick the best attributes.

Compute the most valuable attribute $$\texttt{argmax}_A\left( \sum_q (1 - \texttt{selectivity}(q, A)) \right)$$ where $\texttt{selectivity}(q, A)$ is the fraction of records matching $q$'s predicate on $A$.

Index and drop all queries conditioned on the attribute (1 index per query)

Repeat until most valuable attribute < threshold
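
The greedy loop above can be sketched as follows. All names here (`Pred`, `pick_indexes`, the `selectivity` function) are illustrative, and the scoring assumes `selectivity(q, a)` is the fraction of records matched, so `1 - selectivity` is the fraction filtered out:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pred:
    """Illustrative query stub: just the set of attributes it filters on."""
    name: str
    attrs: frozenset

def pick_indexes(queries, attributes, selectivity, threshold):
    """Greedy attribute selection: repeatedly index the attribute with the
    highest total benefit, drop the queries it covers, stop at threshold."""
    chosen, remaining = [], set(queries)
    while remaining:
        def benefit(a):
            # Total fraction of records this attribute would filter out,
            # summed over the still-unindexed queries that condition on it.
            return sum(1 - selectivity(q, a)
                       for q in remaining if a in q.attrs)
        best = max(attributes, key=benefit)
        if benefit(best) < threshold:
            break
        chosen.append(best)
        remaining = {q for q in remaining if best not in q.attrs}
    return chosen
```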

Multithreading Challenges

  • Cache locality is separate for each CPU core.
  • Lock contention kills throughput.

On the Amadeus workload...

  • Few cross-table reads
  • Mostly deterministic writes

Idea: Give each thread a separate data partition (shared nothing).
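
A minimal sketch of the routing step, assuming operations carry a partitioning key (the function names and dict-shaped operations are illustrative):

```python
from collections import defaultdict

def route(operations, n_partitions, key_of):
    """Assign each operation to the one partition that owns its key,
    so each worker thread scans its own partition with no shared locks."""
    queues = defaultdict(list)
    for op in operations:
        queues[hash(key_of(op)) % n_partitions].append(op)
    return queues
```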

Amadeus/Crescando on ACID

Atomicity

Strict atomicity not needed for Amadeus workload!
(All operations can be isolated to a single thread partition)

... but could implement with an undo log.

Consistency

No constraint enforcement required.

Isolation

In-order execution sufficient (batch arrival order is "good enough").

... could also build snapshot isolation.

Durability

Snapshot query periodically copies records out to a buffer.

Buffer asynchronously checkpointed to disk.

Replay log recovers operations since last checkpoint.
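
The recovery path above can be sketched as checkpoint-plus-replay; the key/value shape of the state and log entries is an assumption for illustration:

```python
def recover(checkpoint, log, checkpoint_t):
    """Rebuild state: start from the last snapshot, then replay every
    logged operation newer than the checkpoint, in arrival order."""
    state = dict(checkpoint)                   # copy of the snapshot
    for op in log:                             # log is in arrival order
        if op["t"] > checkpoint_t:             # skip ops already captured
            state[op["key"]] = op["value"]
    return state
```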