February 8, 2019
CREATE TABLE PLAYERS(
ID string,
FIRSTNAME string,
LASTNAME string,
FIRSTSEASON int,
LASTSEASON int,
WEIGHT int,
BIRTHDATE date
);
SELECT FIRSTNAME, LASTNAME,
WEIGHT, BIRTHDATE
FROM PLAYERS
WHERE WEIGHT>200;
CREATE TABLE PLAYERS(
ID string,
FIRSTNAME string,
LASTNAME string,
FIRSTSEASON int,
LASTSEASON int,
WEIGHT int,
BIRTHDATE date
);
↓
There is a table named "PLAYERS"...
ABDELAL01|Alaa|Abdelnaby|1990|1994|240|1968-06-24 ABDULKA01|Kareem|Abdul-jabbar|1969|1988|225|1947-04-16 ABDULMA01|Mahmo|Abdul-rauf|1990|2000|162|1969-03-09 ABDULTA01|Tariq|Abdul-wahad|1997|2002|223|1974-11-03 ABDURSH01|Shareef|Abdur-rahim|1996|2007|225|1976-12-11 ABERNTO01|Tom|Abernethy|1976|1980|220|1954-05-06 ABRAMJO01|John|Abramovic|1946|1947|195|1919-02-09 ACKERAL01|Alex|Acker|2005|2008|185|1983-01-21 ACKERDO01|Donald|Ackerman|1953|1953|183|1930-09-04 ACRESMA01|Mark|Acres|1987|1992|220|1962-11-15 ACTONCH01|Charles|Acton|1967|1967|210|1942-01-11 ...
if(stmt instanceof Select){
SelectBody bodyBase = ((Select)stmt).getSelectBody();
if(bodyBase instanceof PlainSelect){
PlainSelect body = (PlainSelect)bodyBase;
...
body.getFromItem()
body.getWhere()
body.getSelectItems()
...
}
}
There are a number of data transformations that appear in more than one pattern. For example:
Suggestion: Abstract these steps out
r = get_table("R")
s = get_table("S")
temp1 = apply_join(r, s, "R.B = S.B")
temp2 = apply_select(temp1, "S.C = 10")
result = apply_projection(temp2, "R.A")
$$\sigma_{A \neq 3} R$$
A | B |
---|---|
1 | 2 |
3 | 4 |
5 | 6 |
def apply_select(input, condition)
result = []
for row in input:
if condition(row):
result += [row]
return result;
(All-At-Once)
Problem: A "table" can get very very big.
All "functions" can be implemented as iterators that use constant space
$$\sigma_{A \neq 3} R$$
A | B | |
---|---|---|
getNext() | for row in input: | |
1 | 2 | return row; |
getNext() | for row in input: | |
3 | 4 | X |
5 | 6 | return row; |
getNext() | for row in input: | |
None | return None; |
for r in R:
for s in S:
emit(merge(r, s))
class NestedLoopJoinIterator implements RAIterator
{
RAIterator R, S;
Row current_r = null;
public NestedLoopJoinIterator(RAIterator R, RAIterator S) {
this.child = child; this.condition = condition;
}
public Row next(){
if(!S.hasNext()) { S.reset(); current_r = null; }
if(current_r == null){ current_r = R.next(); }
return merge(S.next(), current_r);
}
/* ...? */
}
java.util.regexp.Pattern.compile()
Also, Use EvalLib's PrimitiveValue
SQL Type | PrimitiveValue |
---|---|
string | StringValue |
varchar | StringValue |
char | StringValue |
int | LongValue |
decimal | DoubleValue |
date | DateValue |
How do you evaluate the A > 5
in
SELECT B FROM R WHERE A > 5
?
More generally, how do you evaluate a JSqlParser Expression
?
Eval eval = new Eval(){ /* we'll get what goes here shortly */ }
// Evaluate "1 + 2.0"
PrimitiveValue result;
result =
eval.eval(
new Addition(
new LongPrimitive(1),
new DoublePrimitive(2.0)
)
);
System.out.println("Result: "+result); // "Result: 3.0"
// Evaluate "R.A >= 5"
result =
eval.eval(
new GreaterThanEquals(
new Column(new Table(null, "R"), "A"), // `R.A`
new LongPrimitive(5)
)
);
Problem: What value should EvalLib give to R.A
?
Your code needs to tell it...
Eval eval = new Eval(){
public PrimitiveValue eval(Column c){
String colName = c.getColumnName();
String tableName = c.getTable().getName();
PrimitiveValue ret = /* look up colName.tableName */
return ret;
}
}
|
RAIterator result = new TableIterator(plainSelect.getFromItem());
if(plainSelect.getJoins() != null){
for(Join join : plainSelect.getJoins()){
/** Make sure Join is a simple join **/
result = new NestedLoopJoinIterator(result, join.getFromItem());
}
}
if(plainSelect.getWhere() != null){
result = new FilterIterator(plainSelect.getWhere(), result);
}