CSE-462 Spring 2016

Data Management Systems (including Relational Databases, Non-Relational Databases, and NoSQL storage systems) are the basis for any big data project.  A data management system is responsible for storing data, enabling efficient access to that data, as well as mediating concurrent modifications.  This class teaches data management systems both in terms of the basic principles of their design, and the practical challenges of implementing them. The course is built around a term-long programming assignment, in which you will build a system for answering SQL queries efficiently.  Course lectures will focus on the conceptual basis for this system and how the techniques that you implement in the project generalize (e.g., to the use of NoSQL systems)

In this course, you will learn...

• ... how to efficiently store and retrieve data programatically.
• ... how to optimize big-data computations.
• ... how to use index structures to accelerate computations.
• ... how to safely and efficiently manipulate data concurrently.
• ... how to recover state after software and hardware failures.
• ... how to query and update distributed data consistently.

Course Details

• Class: T/Th, 12:30-1:50 PM in NSC 220
• Class Forum: Piazza
• Textbook: "Database Systems, The Complete Book" 2nd Edition
by Garcia-Molina, Ullman, and Widom.
• Optional Readings:
• Instructor: Oliver Kennedy (Davis 338H, Office Hours Weds 10 AM-11 AM or by appointment; okennedy at buffalo)
• TA: Jun Chu (TA Lounge; Office Hours Tu 3:30-5:30; jchu6 at buffalo)
• Recitations: Tuesday 9 AM, Wednesdays 12 Noon
• Project Submission: http://dubstep.odin.cse.buffalo.edu
• Project Groups: 1-4 people
• Grading:
• 10% Homework (Lowest 2 grades dropped)
• Due ~1/week on Thursdays
• 40% Exams
• 10% Midterm 1 on March 1 (in class)
• 10% Midterm 2 on April 12 (in class)
• 20% Comprehensive Final on Thu May 14 (4:00-6:30)
• OR 5%/10%/25% OR 5%/5%/30% (whichever is most advantageous to you)
• 50% Projects

Lecture Schedule

• Jan. 26:   Introduction and Outline ( slides )
• Jan. 28:   Relational Algebra ( slides | video )
• Feb. 2:   SQL ( slides | video )
• Feb. 4:   SQL to RA and Evaluation ( slides | video )
• Feb. 9:   Project 1 Overview, Generalized RA ( slides | video )
• Feb. 11:   Data Layout (Serialization, Paging, Columnar) ( slides | video )
• Feb. 16:   Data Organization (Sorting, Tree Indexes) ( slides | video )
• Feb. 18:   Hash Indexes, Joins, and Optimization ( slides )
• Feb. 23:   Joins and Optimization ( slides )
• Feb. 25:   Midterm 1 Review ( slides | video )
• Mar. 1:   Midterm 1
• Mar. 3:   External Algorithms ( slides | video )
• Mar. 8:   Project 2 Review ( code | video )
• Mar. 10:   Data Modeling (E/R and Constraints) ( slides | video )
• Mar. 15:   Spring Break!
• Mar. 17:   Spring Break!
• Mar. 22:   Cost-Based-Optimization ( slides | video )
• Mar. 24:   Transaction Correctness ( slides | videos )
• Mar. 29:   Transactions-Locking ( slides | video )
• Mar. 31:   Transactions-OCC, Versioning ( slides | video )
• Apr. 5:   ARIES (Write-Ahead Logging, Undo-Logging, Recovery) ( slides | video )
• Apr. 7:   Midterm 2 Review ( slides | video )
• Apr. 12:   Midterm 2
• Apr. 14:   Project 3 Review ( video )
• Apr. 19:   Views ( slides | video )
• Apr. 21:   Stream Queries ( slides )
• Apr. 26:   Parallel Data ( slides )
• Apr. 28:   Parallel Joins ( slides )
• May 3:   Parallel Updates ( slides )
• May 5:   Final Review

• Content Outline

• Project 0 - Basic Setup
• Project 1 - Infrastructure & Query Evaluation
• Relational Algebra (Ch 2.4, 5.1)
• SQL (Ch 2.3, 6.1-6.4 and 16.1)
• Query Compilers (Ch 15.1-15.3, 16.1, 16.3)
• Data Modeling (Ch 2.1-2.2)
• Project 2 - Optimization & External Algorithms
• Algebraic Query Optimization (Ch 16.2)
• Join Algorithms (Ch 15.4, 15.5)
• Extended Relational Algebra (Ch 5.2)
• Buffering & External Algorithms (Ch 15.7-15.8)
• Physical Plans (Ch 16.7)
• Project 3 - Indexing & Physical Layout
• The Memory Hierarchy (Ch 13.1-13.3)
• Physical Design (Ch 13.5-13.7)
• Indexing (Ch 8.3, 14.1-14.4)
• Materialized Views (Ch 8.1-8.2, 8.5)
• Cost-Based Optimization (Ch 8.4, 16.4-16.6)
• Concepts (No Project)
• Failure Recovery (Ch 13.4, 19.1, 19.3)
• Updating Data (Ch 6.5, 13.8)
• Transactions (Ch 6.6, 18.1-18.2)
• Locking (Ch 18.3-18.7)
• Deadlocks (Ch 19.2)
• Lock-free Concurrency (Ch 18.8-18.9)
• Distributed Data Management (Ch 20)
• Uncertain Data Management
• Time permitting, other subjects will also be covered.

Assignment Submission

Homeworks will be collected in recitation on the day they are due, or by email to the instructor and TA(s) before class begins if you are unable to attend class. Late homeworks will not be accepted and will receive a grade of 0. Note that your lowest 2 homework grades will be dropped. Projects are submitted through the online submission system using GIT. You may submit your assignments to be graded many times as you like. Late submissions for Checkpoint 0 will not be accepted. Submissions for checkpoints 1-3 will receive a 3/15 point penalty per day late. Late penalties are per-submission. Your group's grade is the highest of all grades received by your group for the project, so your grade can never decrease from additional submissions (even if they're late). Don't be afraid to experiment with new ideas once you've gotten a grade you're happy with!

Letter Grades

You will get a(n)... A A- B+ B B- C+ C C- D F
If your number grade is at least... 93 90 87 83 80 77 73 70 60 $-\infty$
But less than... $\infty$ 93 90 87 83 80 77 73 70 59

Attendance

I see attendance as a privilege rather than a requirement: You (or your parents, or your scholarship, etc...) are paying me so that you can interrupt me with a question if when I say something stupid or confusing in class. There is no mandatory attendance. However, if you ask me a question by email or during office hours that I already addressed to the class's satisfaction, I reserve the right to tease you mercilessly.

That all being said, I will try to post lecture videos online so that you can catch up on what you missed.

Academic Integrity

Students may discuss and advise one another on their projects, but groups are expected to turn in their own work.  Discussing concepts is permitted.  Referencing code (e.g., from another group, or from stack overflow or github) is not. When in doubt, ask a TA or the instructor. Violations typically result in an F grade in the course for all students involved.  The Department's policy on academic integrity can be reviewed at:

UB-CSE's Academic Integrity Policy

Medical Emergencies

Accommodations for medical emergencies will be made on a case-by-case basis.  Requests for extensions based on medical emergencies must be accompanied by documentation of the emergency from student health services:

Student Health Services

Accessibility Resources

If you have a diagnosed disability (physical, learning, or psychological) that will make it difficult for you to carry out the course work as outlined, or that requires accommodations such as recruiting note-takers, readers, or extended time on exams or assignments, please advise the instructor and contact the Office of Accessibility Resources during the first two weeks of the course so that we may review possible arrangements for reasonable accommodations. The Office of Accessibility Resources can be reached at:

The Office of Accessibility Resources | (716) 645-2608 | accessibility@buffalo.edu

Note that OAR gets extremely busy towards the end of the term. If you anticipate needing special testing accomodations, please make sure to contact them 2-4 weeks in advance.

This page last updated 2019-06-28 15:47:51 -0400