Once again, we will be tightening performance constraints. You will be expected to complete queries in seconds, rather than tens of seconds as before. This time, however, you will be given a few minutes alone with the data before we start timing you.
Concretely, you will be given a period of up to 5 minutes that we'll call the Load Phase. During the Load Phase, you will have access to the data, as well as to a database directory that will not be erased between runs of your application. Example uses for this time include building indexes and gathering statistics about the data for use in cost-based estimation.
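As a rough illustration of the statistics-gathering idea, here is a minimal Python sketch. It assumes rows arrive already parsed into Python values (ints, floats, strings) in column order; the function names are hypothetical. It makes one pass over a table to record the row count, exact distinct count, minimum and maximum of each column, and uses the textbook 1/distinct estimate for the selectivity of an equality predicate. The result is a plain dictionary, so for int/float/str values it can be serialized (for example as JSON) into the database directory and reloaded later.

```python
from typing import Any, Dict, Iterable, List, Sequence


def gather_column_stats(rows: Iterable[Sequence[Any]],
                        columns: List[str]) -> Dict[str, Dict[str, Any]]:
    """One pass over a table: row count, distinct count, min and max per column.

    Assumes values are already parsed (e.g. int, float, str), so min/max
    compare numbers numerically rather than as strings.
    """
    seen = {name: set() for name in columns}       # exact distinct values
    lo: Dict[str, Any] = {name: None for name in columns}
    hi: Dict[str, Any] = {name: None for name in columns}
    count = 0
    for row in rows:
        count += 1
        for name, value in zip(columns, row):
            seen[name].add(value)
            if lo[name] is None or value < lo[name]:
                lo[name] = value
            if hi[name] is None or value > hi[name]:
                hi[name] = value
    return {
        name: {"count": count, "distinct": len(seen[name]),
               "min": lo[name], "max": hi[name]}
        for name in columns
    }


def eq_selectivity(stats: Dict[str, Dict[str, Any]], column: str) -> float:
    """Estimate the fraction of rows matching `column = constant` as 1 / distinct."""
    return 1.0 / max(stats[column]["distinct"], 1)
```

Keeping exact distinct sets costs memory proportional to the number of distinct values; for large key columns, sampling or an approximate distinct counter may be a better trade-off.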
Additionally, CREATE TABLE statements will be annotated with PRIMARY KEY and INDEX attributes. You may also hardcode index selections for the TPC-H benchmark based on your own experimentation.
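One way to combine the annotations with hardcoded choices is sketched below in Python, assuming each table is held as a list of parsed rows and column names are upper-case. The primary keys listed are those defined by the TPC-H specification; the `SECONDARY_INDEXES` entries are purely hypothetical placeholders to be replaced with whatever your own experiments favor. The index itself is a simple hash index mapping a key tuple to row positions, which helps equality lookups and joins but not range predicates (such as date ranges), which would need an ordered index instead.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

# Primary keys as defined by the TPC-H specification.
TPCH_PRIMARY_KEYS: Dict[str, List[str]] = {
    "REGION": ["R_REGIONKEY"],
    "NATION": ["N_NATIONKEY"],
    "SUPPLIER": ["S_SUPPKEY"],
    "CUSTOMER": ["C_CUSTKEY"],
    "PART": ["P_PARTKEY"],
    "PARTSUPP": ["PS_PARTKEY", "PS_SUPPKEY"],
    "ORDERS": ["O_ORDERKEY"],
    "LINEITEM": ["L_ORDERKEY", "L_LINENUMBER"],
}

# HYPOTHETICAL secondary-index choices (foreign-key join columns); tune by experiment.
SECONDARY_INDEXES: Dict[str, List[List[str]]] = {
    "ORDERS": [["O_CUSTKEY"]],
    "LINEITEM": [["L_PARTKEY"]],
}


def indexes_for(table: str) -> List[List[str]]:
    """Key-column lists to index for `table`: primary key first, then secondaries."""
    name = table.upper()
    chosen: List[List[str]] = []
    if name in TPCH_PRIMARY_KEYS:
        chosen.append(TPCH_PRIMARY_KEYS[name])
    chosen.extend(SECONDARY_INDEXES.get(name, []))
    return chosen


def build_hash_index(rows: Sequence[Sequence[Any]], columns: List[str],
                     key_columns: List[str]) -> Dict[Tuple[Any, ...], List[int]]:
    """Map each key tuple to the positions of the rows that carry it."""
    positions = [columns.index(c) for c in key_columns]
    index: Dict[Tuple[Any, ...], List[int]] = defaultdict(list)
    for rid, row in enumerate(rows):
        index[tuple(row[p] for p in positions)].append(rid)
    return dict(index)
```

Key lists taken from the PRIMARY KEY and INDEX annotations could be merged into the same structure, and the built indexes written to the database directory so they survive a restart.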
Your code will be evaluated in nearly the same way as in Projects 1 and 2. It will be presented with a 1000 MB (scale factor 1) TPC-H dataset. You will get a cumulative 5 minutes to process all of the CREATE TABLE statements (this is the Load Phase); this time will not count towards your overall time. Taking more than 5 minutes will result in a grade of 0 for the submission.
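Because exceeding the limit is fatal, it can help to treat the 5 minutes as one shared budget: do the mandatory work first, then attempt optional work (extra indexes, statistics) only while its estimated cost still fits. Below is a minimal Python sketch, assuming you can estimate each optional task's duration in advance, for example from your own timing runs; the 30-second margin and all names are hypothetical. Since the limit is cumulative across all CREATE TABLE statements, the budget object would be created once, before the first statement is processed.

```python
import time
from typing import Callable, List, Tuple

LOAD_BUDGET_SECONDS = 5 * 60   # cumulative limit across all CREATE TABLE statements
SAFETY_MARGIN_SECONDS = 30     # ASSUMPTION: headroom for slower grading machines


class LoadBudget:
    """Tracks a single shared deadline for the whole Load Phase."""

    def __init__(self, budget: float = LOAD_BUDGET_SECONDS,
                 margin: float = SAFETY_MARGIN_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + budget - margin

    def remaining(self) -> float:
        """Seconds left before the (margin-adjusted) deadline."""
        return self._deadline - self._clock()


def run_optional_tasks(budget: LoadBudget,
                       tasks: List[Tuple[str, float, Callable[[], None]]]) -> List[str]:
    """Run (name, estimated_seconds, fn) tasks in order, skipping any whose
    estimate exceeds the time left. A task that has started is not interrupted."""
    completed: List[str] = []
    for name, estimate, fn in tasks:
        if estimate > budget.remaining():
            continue
        fn()
        completed.append(name)
    return completed
```

Ordering the optional tasks by expected benefit means that, if time runs short, only the least valuable ones are dropped. The estimates are only as good as your measurements, which is why the sketch keeps a margin.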
Phase 1 begins immediately after the CREATE TABLE statements have been processed. You will receive a series of random queries drawn from the TPC-H benchmark and should answer them as quickly as possible.
After the Phase 1 queries have been processed, your code will be terminated and then restarted in the same directory. In this second phase, you will receive a similar series of random queries drawn from the TPC-H benchmark. However, unlike Phase 1, there will be no CREATE TABLE statements at all; you are expected to preserve any and all state associated with the created tables on your own.
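A minimal Python sketch of one way to preserve that state, assuming the schema (column names, types, key and index annotations) is kept in a JSON-serializable dictionary: each CREATE TABLE updates the catalog and writes it to the database directory, and at startup the presence of that file tells the program it has been restarted. The file name, catalog layout and function names are hypothetical. Writing to a temporary file and renaming it with `os.replace` avoids leaving a half-written catalog if the process is killed mid-write.

```python
import json
import os
import tempfile
from typing import Any, Dict, Tuple

CATALOG_FILE = "catalog.json"  # HYPOTHETICAL file name inside the database directory


def save_catalog(db_dir: str, catalog: Dict[str, Any]) -> None:
    """Write the schema catalog atomically: write a temp file, then rename it."""
    os.makedirs(db_dir, exist_ok=True)
    path = os.path.join(db_dir, CATALOG_FILE)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(catalog, f)
        f.flush()
        os.fsync(f.fileno())       # make sure the bytes reach the disk
    os.replace(tmp_path, path)     # atomic rename over any previous catalog


def open_database(db_dir: str) -> Tuple[Dict[str, Any], bool]:
    """Return (catalog, restarted). `restarted` is True when a catalog from an
    earlier run exists, i.e. no CREATE TABLE statements should be expected."""
    path = os.path.join(db_dir, CATALOG_FILE)
    if not os.path.exists(path):
        return {}, False
    with open(path) as f:
        return json.load(f), True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as db_dir:
        catalog, restarted = open_database(db_dir)  # first run: nothing saved yet
        catalog["ORDERS"] = {"columns": ["O_ORDERKEY", "O_CUSTKEY"],
                             "primary_key": ["O_ORDERKEY"]}
        save_catalog(db_dir, catalog)
        print(open_database(db_dir))                # second run: catalog restored
```

The same pattern applies to indexes and statistics built during the Load Phase: anything that exists only in memory is lost when the process is terminated after Phase 1.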
Your code will be subjected to a sequence of test cases and evaluated on speed and correctness.