CSE 562

Feb 16 - Checkpoint 1

  • Requirements Overview
  • CREATE TABLE
  • Volcano Evaluation
  • EvalLib

Checkpoint 1

sif$ javac -cp build:jsqlparser.jar:evallib.jar -D build {all .java files} sif$ ls data/ R.csv      S.csv     T.csv sif$ head -n 2 data/R.csv 1|3|5 2|9|1 sif$ java -cp build:jsqlparser.jar:evallib.jar dubstep.Main
 - $> CREATE TABLE R(A int, B int, C int); $> SELECT A, C FROM R WHERE B < 5; 1|5 ... $>

Checkpoint 1

  • Your code compiled just the same as in Checkpoint 0.
  • Print a prompt '$>' at the start and after each command.
  • CREATE TABLE statements tell you the schema of each table.
  • Data lives in a '|'-separated file in 'data/[tablename].csv'
  • Print query results '|'-separated
→ SQL
CREATE TABLE PLAYERS(
  ID string, 
  FIRSTNAME string, 
  LASTNAME string, 
  FIRSTSEASON int, 
  LASTSEASON int, 
  WEIGHT int, 
  BIRTHDATE date
);

SELECT FIRSTNAME, LASTNAME, 
       WEIGHT, BIRTHDATE 
FROM PLAYERS 
WHERE WEIGHT>200;
→ SQL CREATE TABLE SELECT
CREATE TABLE PLAYERS(
  ID string, 
  FIRSTNAME string, 
  LASTNAME string, 
  FIRSTSEASON int, 
  LASTSEASON int, 
  WEIGHT int, 
  BIRTHDATE date
);

There is a table named "PLAYERS"...

  • ... with 7 attributes
  • ... who's attributes have the given types
  • ... with data in the file "data/PLAYERS.csv"

→ SQL CREATE TABLE Saved Schema .csv SELECT
ABDELAL01|Alaa|Abdelnaby|1990|1994|240|1968-06-24
ABDULKA01|Kareem|Abdul-jabbar|1969|1988|225|1947-04-16
ABDULMA01|Mahmo|Abdul-rauf|1990|2000|162|1969-03-09
ABDULTA01|Tariq|Abdul-wahad|1997|2002|223|1974-11-03
ABDURSH01|Shareef|Abdur-rahim|1996|2007|225|1976-12-11
ABERNTO01|Tom|Abernethy|1976|1980|220|1954-05-06
ABRAMJO01|John|Abramovic|1946|1947|195|1919-02-09
ACKERAL01|Alex|Acker|2005|2008|185|1983-01-21
ACKERDO01|Donald|Ackerman|1953|1953|183|1930-09-04
ACRESMA01|Mark|Acres|1987|1992|220|1962-11-15
ACTONCH01|Charles|Acton|1967|1967|210|1942-01-11
...		
→ SQL CREATE TABLE Saved Schema .csv SELECT Results

Example Queries

  1. SELECT A, B, ... FROM R (Project)
  2. SELECT A, B, ... FROM R WHERE ... (Project+Filter)
  3. SELECT A+B AS C, ... FROM R (Map)
  4. SELECT A+B AS C, ... FROM R WHERE ... (Map+Filter)
  5. SELECT SUM(A+B) AS C, ... FROM R (Aggregate)
  6. SELECT SUM(A+B) AS C, ... FROM R WHERE ... (Aggregate+Filter)
→ SQL CREATE TABLE Saved Schema .csv SELECT Results
    if(stmt instanceof Select){
      SelectBody bodyBase = ((Select)stmt).getSelectBody();
      if(bodyBase instanceof PlainSelect){
        PlainSelect body = (PlainSelect)bodyBase;
        ...
        body.getFromItem()
        body.getWhere()
        body.getSelectItems()
        ...
      }
    }
→ SQL CREATE TABLE Saved Schema .csv SELECT Iterators Results

Example Query Patterns

  1. SELECT A, B, ... FROM R (Project)
  2. SELECT A, B, ... FROM R WHERE ... (Project+Filter)
  3. SELECT A+B AS C, ... FROM R (Map)
  4. SELECT A+B AS C, ... FROM R WHERE ... (Map+Filter)
  5. SELECT SUM(A+B) AS C, ... FROM R (Aggregate)
  6. SELECT SUM(A+B) AS C, ... FROM R WHERE ... (Aggregate+Filter)

Iterators

There are a number of data transformations that appear in more than one pattern. For example:

  1. Loading the CSV file in as data
  2. Filtering rows out of data
  3. Transforming (mapping) data into a new structure
  4. Summarizing (aggregating) data
  5. Printing output data

Suggestion: Abstract these steps out

Idea: Functions

  1. loadCSV(...) returns a table
  2. filter(condition, table) returns a table
  3. map(expressions, table) returns a table
  4. aggregate(aggregates, table) returns a row
  5. print(table)

Problem: A "table" can get very very big.

Better Idea: Iterators

hasNext()
Returns true if there are more rows to return
next()
Returns the next row

All "functions" can be implemented as iterators that use constant space

Example: Filter

'|'-separated Value File Suggestions

  • Use BufferedReader's readline() method
  • Precompile your '|' splitter using java.util.regexp.Pattern.compile()
  • Parse everything upfront
  • Keep each row as an Array of PrimitiveValues
  • For Codd's sake, don't store entire tables in memory
SQL TypePrimitiveValue
stringStringValue
varcharStringValue
charStringValue
intLongValue
decimalDoubleValue
dateDateValue

Aggregate Functions

  • SUM(A)
  • COUNT(*)
  • MIN(A)
  • MAX(A)

EvalLib

How do you evaluate the A > 5 in
SELECT B FROM R WHERE A > 5?

More generally, how do you evaluate a JSqlParser Expression?


Eval eval = new Eval(){ /* we'll get what goes here shortly */ }
		

// Evaluate "1 + 2.0"
PrimitiveValue result;
result = 
  eval.eval(
    new Addition(
      new LongPrimitive(1),
      new DoublePrimitive(2.0)
    )
  ); 
System.out.println("Result: "+result); // "Result: 3.0"
		

// Evaluate "R.A >= 5"
result =
  eval.eval(
    new GreaterThanEquals(
      new Column(new Table(null, "R"), "A"),
      new LongPrimitive(5)
    )
  );
		

Problem: What value should EvalLib give to R.A?

Your code needs to tell it...


Eval eval = new Eval(){
  public PrimitiveValue eval(Column c){ 
    String colName = c.getColumnName();
    String tableName = c.getTable().getName();
    PrimitiveValue ret = /* look up colName.tableName */
    return ret;
  }
}
		

Questions?