CSE 662 Fall 2019 - Scan Sharing (Crescando)


October 28

Amadeus

European travel booking service.

  • OLTP workload (Book, Check, etc...)
  • High throughput
  • Simple queries (few joins, some multi-table updates)

Excellent performance < Consistent performance

Let's design...

(focus on single-thread for the moment)

Naive Implementation

Process one query at a time

and so forth...

Can we do better?

Index the data
Pro: Lower query latency (= higher throughput).
Con: Updates more expensive; Poor IO bandwidth use.
Share work between queries
Pro: Better IO use; Higher throughput.
Con: Higher latency per query; Potentially more work per query.

Shared Scan (Elevator)

For each batch of queries, scan over all records and give each query a chance to access/manipulate the data
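
The batched shared scan can be sketched as follows; `Query` and the equality-match logic are illustrative names, not Crescando's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    """Illustrative equality query: SELECT ... WHERE attr = value."""
    attr: str
    value: int

def shared_scan(data, queries):
    """One pass over the data answers every query in the batch."""
    results = {q: [] for q in queries}
    for record in data:            # each record is loaded exactly once
        for q in queries:          # ...but every query inspects it
            if record.get(q.attr) == q.value:
                results[q].append(record)
    return results
```

Note how the cost structure matches the pros/cons below: one load per record per batch, but work proportional to (records × queries).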

Pros
Each record is only loaded once per batch.
Higher throughput
Cons
Queries must wait for the entire batch to finish
New queries are blocked until the current batch finishes
Every query processes every record

Can we do better?

Queries blocked until the current batch finishes.

Idea: Allow queries to "attach" to the scan at any point

(Detach each query after it has processed one full cycle of the data)

Challenge: Checking for new queries is expensive

Partition data into chunks, only check at chunk boundaries

Tradeoff: Bigger chunks mean better throughput, but worse attach latency.
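
A sketch of the attach/detach cycle under these assumptions (chunked storage, each query folding records via a hypothetical `fold` method):

```python
from collections import deque

class CountQuery:
    """Illustrative query that counts records matching attr = value."""
    def __init__(self, attr, value):
        self.attr, self.value, self.count = attr, value, 0
    def fold(self, record):
        if record.get(self.attr) == self.value:
            self.count += 1

def clock_scan(chunks, incoming):
    """Cycle over chunks; admit new queries only at chunk boundaries."""
    active = []                                  # [query, chunks_remaining]
    done = []
    while incoming or active:
        for chunk in chunks:
            while incoming:                      # attach at the boundary
                active.append([incoming.popleft(), len(chunks)])
            for record in chunk:
                for entry in active:
                    entry[0].fold(record)
            for entry in active:                 # detach after one full cycle
                entry[1] -= 1
            done += [e[0] for e in active if e[1] == 0]
            active = [e for e in active if e[1] > 0]
    return done
```

Checking the `incoming` queue only once per chunk is what keeps the attach check cheap; larger chunks amortize it further at the cost of attach latency.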

Pros
Each record is only loaded once per cycle.
Higher throughput
Queries can attach at any chunk boundary (no waiting for a batch to start)
Cons
Every query processes every record

Can we do better?

Problem: Can't index the data without giving up scan sharing.

Idea: Index the query

Classic Indexing: Which records might answer this query.

Query Indexing: Which queries might this record be an answer for.


                   Q1 = SELECT ... WHERE A = 12
                   Q2 = SELECT ... WHERE A = 17
                   Q3 = SELECT ... WHERE A = 17 
    
  • $R_1 = \left< A: 10, \ldots \right>$ : ignore
  • $R_2 = \left< A: 12, \ldots\right>$ : $Q_1.fold(R_2)$
  • $R_3 = \left< A: 14, \ldots\right>$ : ignore
  • $R_4 = \left< A: 15, \ldots\right>$ : ignore
  • $R_5 = \left< A: 17, \ldots\right>$ : $Q_2.fold(R_5)$; $Q_3.fold(R_5)$
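
One way to realize the example above (hash-based, with illustrative names): index queries by the constant in their equality predicate, then probe with each record's value.

```python
from collections import defaultdict

class EqualityQueryIndex:
    """Maps a predicate constant to the queries waiting on it."""
    def __init__(self):
        self.by_value = defaultdict(list)

    def attach(self, value, query):
        self.by_value[value].append(query)

    def probe(self, record_value):
        # All queries this record might answer, found in O(1)
        # regardless of how many queries are in the batch.
        return self.by_value.get(record_value, [])
```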

Range queries become range lookups: the index returns every query whose predicate range contains the record's value.


             Q1 = SELECT ... WHERE A >= 12 AND A < 15
             Q2 = SELECT ... WHERE A >= 15 AND A < 17
             Q3 = SELECT ... WHERE A >= 15 AND A < 22
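
A minimal interval-based sketch of a range query index (linear probe; a real implementation would use something like an interval tree):

```python
class RangeQueryIndex:
    """Maps half-open predicate ranges [lo, hi) to waiting queries."""
    def __init__(self):
        self.intervals = []                    # (lo, hi, query) triples

    def attach(self, lo, hi, query):
        self.intervals.append((lo, hi, query))

    def probe(self, v):
        # Every query whose predicate range contains this record's value.
        return [q for lo, hi, q in self.intervals if lo <= v < hi]
```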
    

Types of Operations

  • QUERY: Read in, do not mutate
  • INSERT: No read, just write to an empty spot
  • DELETE: Remove all matching records
  • UPDATE: Transform the values of all matching records

Naive Algorithm

For each "slot", apply operations in-order.

Indexed Algorithm

For each "slot", find matching operations and apply them in order.

Is there a problem?

Indexing Queries with Mutations

Problem: Updates and Deletes can invalidate the index lookup.

Problem: Inserts only applicable to empty slots.

Consideration: Critical for consistency that operations be applied in-order.


      # For each slot, replay every matching operation in timestamp order.
      for slot in data:
        t = 0
        while True:
          # Earliest matching operation with timestamp greater than t.
          next_op = index_lookup(slot, after=t)
          if next_op:
            next_op(slot)
            t = next_op.t
          else:
            break
    

Implicit Index: Inserts only apply to empty slots


      # Empty slots consume pending inserts; occupied slots use the query index.
      for slot in data:
        t = 0
        while True:
          if slot.empty:
            next_op = next_insert_after(t)
          else:
            next_op = index_lookup(slot, after=t)
          if next_op:
            next_op(slot)
            t = next_op.t
          else:
            break
    

Multiple Indices

  • Zero or more predicate indices
  • A table of "unindexed" operations
  • A list of Inserts (for empty slots)

Predictable Performance for Unpredictable Workloads (Unterbrunner et al.)

Optimization: Separate batches for mutations and queries. For each record, process mutations first, then queries.

On the Amadeus workload...

  • Cycling through a batch of records is fast (1s).
  • Index contents change on every attach/detach.
  • Attributes for indexing may not be known upfront.

Challenge: Need to create indexes fast.

Idea: Greedily pick the best attributes.

Compute the most valuable attribute $$\texttt{argmax}_A\left( \sum_q (1 - \texttt{selectivity}(q, A)) \right)$$ where $\texttt{selectivity}(q, A)$ is the fraction of records matching $q$'s predicate on $A$.

Index and drop all queries conditioned on the attribute (1 index per query)

Repeat until most valuable attribute < threshold
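
The greedy loop above can be sketched as follows. All names here (`Pred`, `pick_indexes`, the `selectivity` function) are illustrative, and the scoring assumes `selectivity(q, a)` is the fraction of records matched, so `1 - selectivity` is the fraction filtered out:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pred:
    """Illustrative query stub: just the set of attributes it filters on."""
    name: str
    attrs: frozenset

def pick_indexes(queries, attributes, selectivity, threshold):
    """Greedy attribute selection: repeatedly index the attribute with the
    highest total benefit, drop the queries it covers, stop at threshold."""
    chosen, remaining = [], set(queries)
    while remaining:
        def benefit(a):
            # Total fraction of records this attribute would filter out,
            # summed over the still-unindexed queries that condition on it.
            return sum(1 - selectivity(q, a)
                       for q in remaining if a in q.attrs)
        best = max(attributes, key=benefit)
        if benefit(best) < threshold:
            break
        chosen.append(best)
        remaining = {q for q in remaining if best not in q.attrs}
    return chosen
```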

Multithreading Challenges

  • Cache locality is separate for each CPU core.
  • Lock contention kills throughput.

On the Amadeus workload...

  • Few cross-table reads
  • Mostly deterministic writes

Idea: Give each thread a separate data partition (shared nothing).
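
A minimal sketch of the routing step, assuming operations carry a partitioning key (the function names and dict-shaped operations are illustrative):

```python
from collections import defaultdict

def route(operations, n_partitions, key_of):
    """Assign each operation to the one partition that owns its key,
    so each worker thread scans its own partition with no shared locks."""
    queues = defaultdict(list)
    for op in operations:
        queues[hash(key_of(op)) % n_partitions].append(op)
    return queues
```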

Amadeus/Crescando on ACID

Atomicity

Strict atomicity not needed for Amadeus workload!
(All operations can be isolated to a single thread partition)

... but could implement with an undo log.

Consistency

No constraint enforcement required.

Isolation

In-order execution sufficient (batch arrival order is "good enough").

... could also build snapshot isolation.

Durability

Snapshot query periodically copies records out to a buffer.

Buffer asynchronously checkpointed to disk.

Replay log recovers operations since last checkpoint.
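
The recovery path above can be sketched as checkpoint-plus-replay; the key/value shape of the state and log entries is an assumption for illustration:

```python
def recover(checkpoint, log, checkpoint_t):
    """Rebuild state: start from the last snapshot, then replay every
    logged operation newer than the checkpoint, in arrival order."""
    state = dict(checkpoint)                   # copy of the snapshot
    for op in log:                             # log is in arrival order
        if op["t"] > checkpoint_t:             # skip ops already captured
            state[op["key"]] = op["value"]
    return state
```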