Languages and Runtimes

(for Big Data)

Oliver Kennedy

okennedy@buffalo.edu

Capen 212

Logistics

Email

Always add [CSE662] to the subject line of your emails

  • This helps me to reply to your email faster
  • The tag is mandatory for assignments

Academic Integrity

Each group will have a separate project. I don't expect cheating to be an issue, but to be clear...

  • I encourage you to talk to your classmates about ideas and papers out of class.
  • I expect you to work with your team on the project.
  • You should use outside tools/code/libraries (with attribution) if they're useful.
  • If you/your team submits something as your work... it had better be your work.

DB ≈ PL

Databases                            | Programming Languages
Indexes                              | Data Structures
Transactions/Logging                 | Software Transactional Memory
Incremental Views                    | Self-Adapting Computation
Query Rewriting / Performance Models | Compiler Optimization / Program Analysis
Probabilistic DBs                    | Probabilistic Programming

DB ≈ PL

Data-Centric Programs
Turing-Complete Programs

Course Structure

  • Data Structures, Indexes, Adaptive Indexing
  • Uncertainty in Data
  • Transactions, Concurrency, and Synchronizing Actors
  • High-Throughput Data Processing

Course Structure

Monday Wednesday Friday
Classical Lecture
(Paper of the Week)
Group Presentations and Meetings

Paper Discussion

  • One paper every week (assigned by Wednesday night).
  • I will be calling on random people to answer questions about the paper.
  • Every group will be asked to present one paper pertinent to its project.
  • Class participation is 20% of your grade.

Group Presentations

  • Present background, work-in-progress, your design choices, algorithms, information, code, performance metrics and/or analysis.
  • Defend your ideas and design choices in a public setting.
  • Everyone must attend.
  • I will be calling on random people to ask questions of the presenters.
  • Class participation is 20% of your grade.

Class Participation

I use a 3 point system:

  • 0 points: You're not here when I call on you
  • 2 points: You have a meaningful comment/question about the project/paper
  • 1 point: Everything else

You get 2 excused absences (guarantees I won't call on you) for the term.

Project Seeds

  • Decentralized IoT Plumbing
  • Uncertainty-Aware Machine Learning
  • Web-of-Trust for Crowdsourced Data
  • Sensitivity Analysis in Mimir
  • Sandboxed Python

Paper Assignment 1

The Case for Learned Index Structures
Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis

Be ready to intelligently discuss the paper's contents on Monday, Sept. 3.

Class Introductions

  • What is your name?
  • What did you do over the summer?
  • Why did you take this class?
  • What is your favorite Sci-Fi TV/Movie/Book?