October 28
European travel booking service.
Excellent performance < Consistent performance
Let's design...
(focus on single-thread for the moment)
Process one query at a time
and so forth...
For each batch of queries, scan over all records and give each query a chance to access/manipulate the data
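A minimal sketch of this shared scan, assuming each query exposes `matches(record)` and `process(record)` and accumulates its own result (all names illustrative):

```python
def run_batch(data, batch):
    """One shared scan: every query in the batch gets a chance at every record."""
    for record in data:
        for query in batch:
            if query.matches(record):     # the query's WHERE clause
                query.process(record)     # fold the record into that query's result
    return [query.result() for query in batch]
```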
Can we do better?
Problem: Newly arriving queries are blocked until the current batch's scan finishes.
Idea: Allow queries to "attach" to the scan at any point
(Detach each query after it's processed one full cycle of the data)
Challenge: Checking for new queries is expensive
Partition data into chunks, only check at chunk boundaries
Tradeoff: Bigger chunks = better throughput + worse latency
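A sketch of the chunked attach/detach scan, assuming the data is already split into chunks and `incoming` is a deque of newly arrived queries (interface names are illustrative):

```python
from collections import deque

def clock_scan(chunks, incoming):
    """Cycle over the data chunks forever; queries attach at chunk boundaries
    and detach after seeing one full cycle of the data."""
    active = []                                  # (query, chunks_left_to_see)
    while True:
        for chunk in chunks:
            # Attach new queries only at chunk boundaries (keeps the check cheap).
            while incoming:
                active.append((incoming.popleft(), len(chunks)))
            for record in chunk:
                for query, _ in active:
                    if query.matches(record):    # the query's WHERE clause
                        query.process(record)
            # Detach queries that have now seen one full cycle of the data.
            finished = [q for q, left in active if left == 1]
            for query in finished:
                query.finish()                   # emit this query's result
            active = [(q, left - 1) for q, left in active if left > 1]
```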
Can we do better?
Problem: Can't index the data without giving up scan sharing.
Idea: Index the query
Classic Indexing: Which records might answer this query.
Query Indexing: Which queries might this record be an answer for.
Q1 = SELECT ... WHERE A = 12
Q2 = SELECT ... WHERE A = 17
Q3 = SELECT ... WHERE A = 17
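For these equality predicates, the query index can be a plain hash map from attribute value to the queries waiting on it; each record then does one probe instead of being tested against every query (a minimal sketch, names illustrative):

```python
from collections import defaultdict

# Build: value of A -> queries whose predicate is A = value.
query_index = defaultdict(list)
query_index[12].append("Q1")   # Q1: WHERE A = 12
query_index[17].append("Q2")   # Q2: WHERE A = 17
query_index[17].append("Q3")   # Q3: WHERE A = 17

def matching_queries(record):
    """O(1) probe per record: which queries might this record be an answer for?"""
    return query_index.get(record["A"], [])

# e.g. matching_queries({"A": 17}) -> ["Q2", "Q3"]
```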
Range queries become range lookups in the query index: find every query whose range contains the record's value.
Q1 = SELECT ... WHERE A >= 12 AND A < 15
Q2 = SELECT ... WHERE A >= 15 AND A < 17
Q3 = SELECT ... WHERE A >= 15 AND A < 22
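A sketch of such a range lookup using sorted interval start points and `bisect`; a real implementation would use an interval or segment tree, but the idea is the same (names illustrative):

```python
import bisect

# Each query contributes an interval [lo, hi) over attribute A.
intervals = [
    (12, 15, "Q1"),   # Q1: WHERE A >= 12 AND A < 15
    (15, 17, "Q2"),   # Q2: WHERE A >= 15 AND A < 17
    (15, 22, "Q3"),   # Q3: WHERE A >= 15 AND A < 22
]
intervals.sort()                        # sort by interval start
starts = [lo for lo, _, _ in intervals]

def matching_queries(value):
    """Return every query whose range contains `value`.
    bisect skips intervals that start after `value`; the hi check filters the rest."""
    cut = bisect.bisect_right(starts, value)
    return [q for lo, hi, q in intervals[:cut] if value < hi]

# e.g. matching_queries(16) -> ["Q2", "Q3"]
```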
For each "slot", apply operations in-order.
↓
For each "slot", find matching operations and appy in order.
Is there a problem?
Problem: Updates and Deletes can invalidate the index lookup.
Problem: Inserts only applicable to empty slots.
Consideration: Critical for consistency that operations be applied in-order.
for slot in data:
    t = 0
    while True:
        # Earliest operation with timestamp > t whose predicate matches this slot.
        next_op = index_lookup(slot, after=t)
        if next_op:
            next_op(slot)        # apply the query/update/delete to the record
            t = next_op.t        # strictly advance in time: never re-apply or reorder
        else:
            break
Implicit Index: Inserts only apply to empty slots
for slot in data:
    t = 0
    while True:
        if slot.empty:
            # Empty slots have no indexed predicates: take the next pending insert.
            next_op = next_insert_after(t)
        else:
            # Earliest matching query/update/delete with timestamp > t.
            next_op = index_lookup(slot, after=t)
        if next_op:
            next_op(slot)
            t = next_op.t
        else:
            break
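For concreteness, the two helpers used above might look like this (purely illustrative; the notes leave them open). Assume a single query index on attribute A with operations kept in timestamp order, assume `slot.record` holds the record's attributes, and assume an applied insert is removed from the pending list so it fills only one slot:

```python
import bisect

ops_by_value = {}        # value of A -> list of matching ops, sorted by op.t
pending_inserts = []     # inserts waiting for an empty slot, sorted by op.t

def index_lookup(slot, after):
    """Earliest indexed operation with timestamp > `after` that matches this record."""
    candidates = ops_by_value.get(slot.record["A"], [])
    i = bisect.bisect_right([op.t for op in candidates], after)
    return candidates[i] if i < len(candidates) else None

def next_insert_after(after):
    """Earliest pending insert with timestamp > `after`."""
    i = bisect.bisect_right([op.t for op in pending_inserts], after)
    return pending_inserts[i] if i < len(pending_inserts) else None
```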
Optimization: Separate batches for mutations and queries. For each record, process mutations first, then queries.
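Sketch of that ordering, reusing the per-slot loop above with hypothetical `matching_mutations` / `matching_queries` index lookups that return ops in timestamp order:

```python
for slot in data:
    # All matching inserts/updates/deletes first, in timestamp order ...
    for op in matching_mutations(slot):
        op(slot)
    # ... then the queries, so every query sees the fully updated record.
    for q in matching_queries(slot):
        q(slot)
```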
Challenge: Need to create indexes fast.
Idea: Greedily pick the best attributes.
Compute the most valuable attribute: $$\texttt{argmax}_A\left( \sum_q \left(1 - \texttt{selectivity}(q, A)\right) \right)$$
Build an index on that attribute; index every query with a predicate on it and drop those queries from further consideration (each query ends up in exactly one index).
Repeat until the value of the most valuable remaining attribute falls below a threshold.
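A sketch of the greedy loop, taking `selectivity(q, a)` as given and treating `q.references(a)` as a hypothetical "query q has a predicate on attribute a" test:

```python
def choose_indexes(queries, attributes, selectivity, threshold):
    """Greedy index selection following the scoring above.
    Returns {attribute: [queries indexed under it]}; each query lands in at most one index."""
    remaining = list(queries)
    indexes = {}
    while remaining:
        # Value of indexing attribute a = summed benefit over still-unindexed queries.
        value = {
            a: sum(1 - selectivity(q, a) for q in remaining if q.references(a))
            for a in attributes
        }
        best = max(value, key=value.get)
        if value[best] < threshold:          # no attribute is valuable enough: stop
            break
        indexes[best] = [q for q in remaining if q.references(best)]
        remaining = [q for q in remaining if not q.references(best)]
    return indexes
```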
Idea: Give each thread a separate data partition (shared nothing).
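One way to picture the shared-nothing layout: hash each record's key to a partition, and route every operation to the one thread that owns that partition (a sketch with illustrative names):

```python
NUM_PARTITIONS = 8                     # one partition per scan thread (illustrative)

def partition_of(key):
    """Each record key lives in exactly one partition, owned by exactly one thread."""
    return hash(key) % NUM_PARTITIONS

def route(op, partition_queues):
    """An operation touches a single key, so the owning thread can run it alone:
    no cross-partition locks or coordination."""
    partition_queues[partition_of(op.key)].append(op)
```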
Strict atomicity not needed for Amadeus workload!
(All operations can be isolated to a single thread partition)
... but could implement with an undo log.
No constraint enforcement required.
In-order execution sufficient (batching in arrival order is "good enough").
... could also build snapshot isolation.
Snapshot query periodically copies records out to a buffer.
Buffer asynchronously checkpointed to disk.
Replay log recovers operations since last checkpoint.
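A rough sketch of that checkpoint/replay scheme, assuming records are plain dicts and `log.entries_after(...)` / `op.apply(...)` are illustrative log-replay hooks:

```python
import json

def snapshot(data, buffer):
    """Periodic snapshot query: copy records out into an in-memory buffer."""
    buffer.clear()
    buffer.extend(record.copy() for record in data)

def checkpoint(buffer, log_position, path):
    """Persist the buffered snapshot plus the log position it reflects
    (per the notes, this runs asynchronously, off the scan threads)."""
    with open(path, "w") as f:
        json.dump({"log_position": log_position, "records": buffer}, f)

def recover(path, log):
    """Load the last checkpoint, then replay every logged operation after it."""
    with open(path) as f:
        state = json.load(f)
    data = state["records"]
    for op in log.entries_after(state["log_position"]):
        op.apply(data)
    return data
```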