CSE-4/562 Database Systems (Spring 2021)
Data Management Systems (including Relational Databases, Non-Relational Databases, and NoSQL storage systems) form the basis of the Big Data Economy we now live in. A data management system is responsible for storing data, enabling efficient access to that data, and mediating concurrent modifications. This class approaches the challenges of designing a data management system from a standpoint that is both principled and practical. The course revolves around a term-long programming assignment, in which you will build a system that answers SQL queries efficiently. Course lectures will focus on the conceptual basis for this system, and will discuss how the techniques you learn generalize (e.g., to the use of NoSQL systems).
In this course, you will learn...
- ... how to efficiently store and retrieve data programmatically.
- ... how to optimize big-data computations.
- ... how to use index structures to accelerate computations.
- ... how to safely and efficiently manipulate data concurrently.
- ... how to recover state after software and hardware failures.
- ... how to query and update distributed data consistently.
- Class T/R 12:45-2:00 PM on YouTube
- Course Discussions: Piazza
- No Required Textbook
- Optional Database Concepts References:
- Optional SQL References:
- "SAMS Teach Yourself SQL in 10 Minutes" 4e. by Ben Forta
- Optional Scala References:
- Homework Submission: Autolab
- Project Submission: Autolab
- Git Repository Management: Microbase
- 50% theory
- 20% Group Homeworks (Group size: 1-4; Skip any 4 submissions for any reason)
- 20% Comprehensive Final (see HUB for time/location)
- Extra 10% for the better of the above two
- 50% projects (Group size: 1)
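As a rough illustration of how the weights above could combine, the sketch below assumes that every component is scored from 0 to 100 and that the extra 10% goes to whichever of the homework or final score is higher. The function name `course_grade` and its parameters are hypothetical, and the official calculation may differ.

```python
def course_grade(homework: float, final: float, project: float) -> float:
    """Combine component scores (each 0-100) into one overall score.

    Assumed weighting, read off the grading list:
      20% homework + 20% final + 10% for the better of the two + 50% projects.
    """
    for score in (homework, final, project):
        if not 0.0 <= score <= 100.0:
            raise ValueError("scores must be between 0 and 100")

    # Theory half: the better of homework/final effectively counts for 30%.
    theory = 0.20 * homework + 0.20 * final + 0.10 * max(homework, final)

    # Project half: a flat 50% of the overall score.
    return theory + 0.50 * project
```

Under these assumptions, a homework score of 80, a final score of 90, and a project score of 70 work out to 16 + 18 + 9 + 35 = 78.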
Extended Relational Algebra
Physical Data Layout
Indexes: Tree-Based, Hash
Indexes: View-Based, Modern
Due: Checkpoint 1
Spark's Optimizer + Checkpoint 2
Cost-Based Optimization (contd.)
Distributed Queries: Challenges + Partitioning
Distributed Queries: Semi + Bloom Join
Due: Checkpoint 2
Aggregation + Checkpoint 3
Online Aggregation/Approximate Query Processing
Data Updates + Incremental View Maintenance
Due: Checkpoint 3
Indexing Review + Checkpoint 4
Transactions: Intro + Concepts
Logging + Recovery
Distributed Commit (contd.)
I expect students in this class to show respect for each other and themselves. This includes, but is not limited to, the following forms of respect:
- Respect each other's humanity
- Especially with us not meeting in person, it's easy to lose track of the fact that the folks you're interacting with (fellow students, TAs, and everyone else) are humans too. Think about how what you're saying will be interpreted before you speak/write. Avoid insulting language, and focus on the merits of the ideas being discussed. Avoid dismissing ideas outright (if you can't come up with a good counterargument, maybe it's not actually a bad idea?).
- Respect each other's intent
- Especially given how bad text is at communicating emotion, try to avoid assuming that others are attacking you personally. Try to view what others are saying to you in the best possible light. Always ask for clarification before you get angry.
- Respect yourself and your limits
- Most students in 4/562 put in a lot of work on this course, so unsurprisingly, occasionally students decide that the course is too much work. If and when this happens to you, talk to me or another member of the course staff. It may be something as simple as you spacing out and missing a critical bit of some lecture, and armed with this information you can proceed to ace the class! Maybe the course actually is too much work for you this semester, in which case we'll still be able to come up with some strategy that lets you move forward with your education. Whatever the case, talk to me or course staff and we will figure something out.
- Respect each other's effort
- The flip side of this is that since your colleagues are putting in the effort, you should do the same. Do not unilaterally decide that you do not have to do the same work that they are doing. Do not copy code/answers from the internet or other students. If you do not complete the same work as your classmates, do not expect to earn the same grade.
Students may discuss and advise one another on their lab projects, but groups are expected to turn in their own work. Discussing concepts is permitted. Referencing another group's code is not. Cheating on an exam or project submission will result in a grade of F in the course for all involved. It is the CSE department's policy not to provide financial support to any student disciplined for plagiarism. University policies on academic integrity can be reviewed at:
CSE Departmental Policy on Academic Integrity
UB's University-Wide Undergraduate Academic Integrity Policy
The Graduate School Policy Library
Accommodations for medical emergencies will be made on a case-by-case basis. Requests for extensions based on medical emergencies must be accompanied by documentation of the emergency from student health services:
Student Health Services
If you have a diagnosed disability (physical, learning, or psychological) that will make it difficult for you to carry out the course work as outlined, or that requires accommodations such as recruiting note-takers, readers, or extended time on exams or assignments, please advise the instructor during the first two weeks of the course so that we may review possible arrangements for reasonable accommodations. In addition, if you have not yet done so, contact:
The Office of Accessibility Resources.