Spring 2015
Data Management Systems (including Relational Databases, Non-Relational Databases, and NoSQL storage systems) form the basis of the Big Data Economy we now live in. A data management system is responsible for storing data, enabling efficient access to that data, and mediating concurrent modifications. This class approaches the challenges of designing a data management system from a standpoint that is both principled and practical. The course revolves around a term-long programming assignment, in which you will build a system that answers SQL queries efficiently. Course lectures will focus on the conceptual basis for this system, and will discuss how the techniques you learn generalize (e.g., to the use of NoSQL systems).
In this course, you will learn...
- ... how to efficiently store and retrieve data programmatically.
- ... how to optimize big-data computations.
- ... how to use index structures to accelerate computations.
- ... how to safely and efficiently manipulate data concurrently.
- ... how to recover state after software and hardware failures.
- ... how to query and update distributed data consistently.
Course Details
- Class: M/W/F, 12:00-12:50 PM in NSC 201
- Class Forum: Piazza
- Textbook: "Database Systems: The Complete Book" 2nd Edition by Garcia-Molina, Ullman, and Widom.
- Instructor: Oliver Kennedy (Davis 338H, Wed 1:00-3:00)
- TAs
- Vishrawas Gopalakrishnan (TA Lounge, Mon 2:00-4:00)
- Ning Deng (TA Lounge, Tue 9:00-11:00)
- Project Submission: http://dubstep.odin.cse.buffalo.edu
- Project Groups: 1-3 people
- Grading:
- 50% exams
- 15% Midterm 1 on Mar. 4 (in class)
- 15% Midterm 2 on Apr. 8 (in class)
- 20% Comprehensive Final on Thu May 14 (4:00-6:30)
- 50% projects
Library Documentation
Lecture Schedule
- Jan. 26: Intro and Outline (Slides, Video)
- Jan. 28: Relational Algebra 1/2 (Slides, Video, Example DB)
- Jan. 30: Relational Algebra 2/2 (Slides, Video)
- Feb. 2: SQL (Slides, Video)
- Feb. 4: Translating SQL to Relational Algebra (Slides)
- Feb. 6: Evaluating Relational Algebra (Slides, Video)
- Feb. 9: Extended Relational Algebra (Slides, Video)
- Feb. 11: Project 1 Review (Slides, Video)
- Feb. 13: Data Modeling - The E/R Model (Slides, Video)
- Feb. 16: Data Modeling - Constraints (Slides, Video)
- Feb. 18: Query Optimization (Slides, Video)
- Feb. 20: Physical Design (Slides, Video)
- Feb. 23: Indexes (Slides, Video)
- Feb. 25: Join Algorithms (Slides, Video)
- Feb. 27: Out-of-Core Algorithms (Slides, Video)
- Mar 2: Midterm 1 Review (Slides, No Video)
- Mar 4: Midterm 1 (Solutions)
- Mar 6: Project 2 Review (Slides, Video)
- Mar 9: Cost-Based Optimization (Slides, Video)
- Mar 11: Cost-Based Optimization (Slides, No Video)
- Mar 13: Storage/Serialization (Slides, Video)
- Mar 16-20: Spring Break!
- Mar 23: Transactional Correctness (Slides)
- Mar 25: Locking (Slides)
- Mar 27: Deadlock Management (Slides, Video)
- Mar 30: Optimistic Concurrency Control (Slides)
- Apr 1: Project 3 Review (Slides)
- Apr 3: Logging (Slides)
- Apr 6: Midterm 2 Content Review (Slides)
- Apr. 8: Midterm 2 (Solutions)
- Apr 10: The ARIES Protocol (Slides)
- Apr 13: Views (Slides)
- Apr 15: Incremental View Maintenance (Slides)
- Apr 17: Parallelism Fundamentals (Slides)
- Apr 20: Semi-Join & Bloom Join (Slides)
- Apr 22: Bloom, Updates & CAP (Slides)
- Apr 24: 2-Phase Commit (Slides)
- Apr 27: Replica Consistency (Slides)
- Apr 29: Data Warehousing (Slides)
- May 1: JITDs (Slides, More Info)
- May 4: Final Review 1 (Slides)
- May 6: Final Review 2 (Slides)
- May 8: Final Review 3 (Slides)
- May 14: Final Exam 4 PM
Content Outline
- Project 0 - Basic Setup
- Project 1 - Infrastructure & Evaluation
- Relational Algebra (Ch 2.4, 5.1)
- SQL (Ch 2.3 and 6.1-6.4)
- Query Compilers (Ch 15.1-15.3, 16.1, 16.3)
- Data Modeling (Ch 2.1-2.2)
- Project 2 - Optimization & External Algorithms
- Algebraic Query Optimization (Ch 16.2)
- Join Algorithms (Ch 15.4, 15.5)
- Extended Relational Algebra (Ch 5.2)
- Buffering & External Algorithms (Ch 15.7-15.8)
- Physical Plans (Ch 16.7)
- Project 3 - Indexing & Physical Layout
- The Memory Hierarchy (Ch 13.1-13.3)
- Physical Design (Ch 13.5-13.7)
- Indexing (Ch 8.3, 14.1-14.4)
- Materialized Views (Ch 8.1-8.2, 8.5)
- Cost-Based Optimization (Ch 8.4, 16.4-16.6)
- Concepts (No Project)
- Failure Recovery (Ch 13.4, 19.1, 19.3)
- Updating Data (Ch 6.5, 13.8)
- Transactions (Ch 6.6, 18.1-18.2)
- Locking (Ch 18.3-18.7)
- Deadlocks (Ch 19.2)
- Lock-free Concurrency (Ch 18.8-18.9)
- Distributed Data Management (Ch 20)
- Uncertain Data Management
- Time permitting, other subjects will also be covered.
Academic Integrity
Students may discuss and advise one another on their lab projects, but groups are expected to turn in their own work. Discussing concepts is permitted. Referencing another group's code is not. Cheating on any course deliverable will result in an automatic grade of F in the course. The University's policy on academic integrity can be reviewed at:
The Graduate School Academic Integrity Policy
Medical Emergencies
Accommodations for medical emergencies will be made on a case-by-case basis. Requests for extensions based on medical emergencies must be accompanied by documentation of the emergency from student health services:
Student Health Services
Accessibility Resources
If you have a diagnosed disability (physical, learning, or psychological) that will make it difficult for you to carry out the course work as outlined, or that requires accommodations such as recruiting note-takers, readers, or extended time on exams or assignments, please advise the instructor during the first two weeks of the course so that we may review possible arrangements for reasonable accommodations. In addition, if you have not yet done so, contact:
The Office of Accessibility Resources.