CSE-4/562 Spring 2019 - Physical Layout & Memory Management

Physical Layout & Memory Management

CSE-4/562 Spring 2019

February 15, 2019

Textbook: Ch. 13.1-13.7, 15.7, 16.7
'08/27/2015'180683348711'POINT (-73.84421521958048 40.723091773924274)'30'OnCurb''Alive''Fair''Acer rubrum''red maple''None''None''NoDamage''TreesCount Staff''None''No''No''No''No''No''No''No''No''No''108-005 70 AVENUE''11375''Forest Hills'4064'Queens'292816'QN17''Forest Hills'4073900'New York'40.72309177-73.844215221027431.14821202756.768749
'09/03/2015'200540315986'POINT (-73.81867945834878 40.79411066708779)'210'OnCurb''Alive''Fair''Quercus palustris''pin oak''None''None''Damage''TreesCount Staff''Stones''Yes''No''No''No''No''No''No''No''No''147-074 7 AVENUE''11357''Whitestone'4074'Queens'192711'QN49''Whitestone'4097300'New York'40.79411067-73.818679461034455.70109228644.837379
'09/05/2015'204026218365'POINT (-73.93660770459083 40.717580740099116)'30'OnCurb''Alive''Good''Gleditsia triacanthos var. inermis''honeylocust''1or2''None''Damage''Volunteer''None''No''No''No''No''No''No''No''No''No''390 MORGAN AVENUE''11211''Brooklyn'3013'Brooklyn'345018'BK90''East Williamsburg'3044900'New York'40.71758074-73.93660771001822.83131200716.891267
'09/05/2015'204337217969'POINT (-73.93445615919741 40.713537494833226)'100'OnCurb''Alive''Good''Gleditsia triacanthos var. inermis''honeylocust''None''None''Damage''Volunteer''Stones''Yes''No''No''No''No''No''No''No''No''1027 GRAND STREET''11211''Brooklyn'3013'Brooklyn'345318'BK90''East Williamsburg'3044900'New York'40.71353749-73.934456161002420.35833199244.253136
'08/30/2015'189565223043'POINT (-73.97597938483258 40.66677775537875)'210'OnCurb''Alive''Good''Tilia americana''American linden''None''None''Damage''Volunteer''Stones''Yes''No''No''No''No''No''No''No''No''603 6 STREET''11215''Brooklyn'3063'Brooklyn'394421'BK37''Park Slope-Gowanus'3016500'New York'40.66677776-73.97597938990913.775046182202.425999


Record Layouts

How is data stored?

Problem 1: How should you encode one tuple?

Record Layout 1: Fixed

Record Layout 2: Delimiters

Record Layout 3: Headers

Record Formats

Constant-size fields. Field $i$ at byte $\sum_{j < i} |Field_j|$
Special character or string (e.g., ,) between fields
Fixed-size header points to start of each field

Problem 2: How should you encode a file of tuples?

File Formats

Constant-size records. Record $i$ at byte $|Record| \times i$
Special character or string (e.g., \r\n) at record end
Index in file points to start of each record
Align records to paging boundaries
A collection of pages (or records)
A fixed-size collection of records
Page size is usually dictated by hardware.
Mem Page $\approx$ 4KB   Cache Line $\approx$ 64B
One or more fields (for now)
A primitive value (for now)

Problem 2.b: How should you store records in a page?

Goal 1: Where is record $X$?

Goal 2: Support updates/deletions

Fixed size records

What about variable-size records?

Why store the key and records from opposite ends?

Problem 3: How should you organize pages in a file?

Key question: What happens when all records on a page are deleted?

Idea: Track empty pages.

An Alternative Layout

Row-Wise Layouts

Column-Wise Layouts

Each file stores 2-tuples $\left< RowID, Value\right >$.

Values only for one attribute.


  • Only one attribute to sort per file.
  • No IO cost for unused attributes ($\pi$-pushdown!)


  • Result attributes must be stitched back together ($\bowtie$)

Great for wide, rarely-updated tables where only a few attributes are used per-query

Example Column Stores

Cassandra logo Vertica pos blk rgb.svg MonetDB logo

By Apache Software Foundation - https://svn.apache.org/repos/asf/cassandra/logo/cassandra.svg, Apache License 2.0, Link
By Ariolica - Own work, CC BY-SA 4.0, Link
By Source (WP:NFCC#4), Fair use, Link

Buffer Manager

Abstract the messy details of File-IO

A "slot" managed by the buffer manager that holds one page.
Pinned Page
A page currently in use by part of the database. Must stay in its current frame until unpinned. (A page may be pinned multiple times)
Dirty Page
A page that has been modified since it was last read in.

When a page is requested

Is the page in the buffer pool?

  • Yes? Pin the page (again) and return the address.
  • No?
    • Pick a frame for replacement with your favorite algorithm (e.g., LRU)...
    • If the frame is dirty, write it to disk
    • Read requested page into chosen frame
    • Pin the page and return its address

Does this all sound familiar?

Isn't this just Virtual Memory?


(Many databases use memory-mapped files as a buffer manager)

Why Re-implement VMem?

Databases can predict the future!

SELECT * FROM R WHERE A > 500 AND A < 2000   →    Pages 10-12

How do we decide which pages hold the query results?

Answers next class!