Pocket Data

(What's in your pocket?)

Jerry Ajay
Oliver Kennedy, Geoff Challen, Luke Ziarek

Big Data!

  • GB, TB, or PB of Data!
  • Hundreds of thousands of updates per second!
  • Thousands of nodes computing together!
  • "Virtually" infinite resources!

Everyone Loves Big Data!

Everyone Loves Measuring Big Data!


TPC-C TPC-H SSB TPC-DS TPC-DI PolePosition YCSB TPC-E http://openclipart.org

Big Data is Awesome!

The Lego Movie (Warner Brothers)

This talk is not about big data.

The average smartphone evaluates
about 180 queries per day

That's about 2 queries every second.

  • Is this big data? No
  • Is this important? Yes!
    • Multi-Tenency: The phone is more than just a DB.
    • Power: 1-2 days of battery life under ideal circumstances.
    • It’s Everywhere: Odds are that your phone is running some queries right now!

We need to understand Pocket-Scale Data!

SQLite

  • Embedded: SQLite is a library
  • Un-shared: SQLite DBs are specific to one client “app”.
  • Lightweight: Entire SQLite DB is backed to one file.
  • Universal: SQLite client library is available by default in nearly all major OSes.
  • “Easy”: Duck Typing, Relaxed SQL Syntax, One Big Lock (file)

How do smartphone devs use SQLite?

PhoneLab

A Smartphone Platform Testbed

~200 UB students, faculty, and staff using instrumented LG Nexus 5 smartphones in exchange for discounted service.

PhoneLab

A Smartphone Platform Testbed
  • Preliminary Trial: 11 phones for ~1 month (254 phone/days)
  • Instrumented SQLite logs all statements (~45 mil statements)
    • ~33.5 million SELECT statements
    • ~9.4 million INSERT statements
    • ~1 million UPDATE statements
    • ~1.2 million DELETE statements
    • 179 distinct ‘apps’ issuing statements

https://phone-lab.org/experiment/request

What did we see in the data?


http://wrathofzombie.wordpress.com/

No, seriously... what did you see?

  • SELECTs ranging the complexity spectrum
  • ORMs being stupid
  • Lots of aggregation
  • Workloads that were a little too read heavy
  • ... And some very odd inter-query timings
  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

SELECT Complexity

30 million simple "SPA" queries Infrequent, but extremely complex queries

SELECT Complexity

24 / 179 apps using SQLite only as a K/V Store

SELECT Complexity


	  INSERT OR REPLACE INTO 
	      properties(property_key,property_value)
	      VALUES (?,?);
				

	  SELECT property_value 
	      FROM properties 
	      WHERE property_key=?;
				
(These are actual queries from the trace)

SELECT Complexity

80% of selects return one row Small % of SELECTs return 100s of rows

SELECT Complexity

Google+, GMail, Facebook, Contacts
  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

pers = Persons.get(10)
name = pers.firstName()
				

SELECT first_name
FROM Persons
WHERE id = 10;
				

SQL DB persists objects between app runs


pers = Persons.get(10)
org = pers.employer()
name = org.name()
				

SELECT employer_id
FROM Persons
WHERE id = 10;
				

SELECT name
FROM Organizations
WHERE id = ?;
				

ORMs are not always efficient


pers = Persons.get(10)
pers.setSalary(
  pers.salary() * 1.1
)				

SELECT salary
FROM Persons
WHERE id = 10;
				

UPDATE Persons
SET salary = ?
WHERE id = 10;
				

We saw NO update value computations in SQL


pers = Persons.get(10)
pers.setSalary(
  pers.salary() * 1.1
)				

SELECT salary
FROM Persons
WHERE id = 10;
				

INSERT OR REPLACE Persons
SET salary = ?
WHERE id = 10;
				

"Insert or Replace" used very frequently

  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

Aggregate Functions

FunctionCall Sites
GROUP_CONCAT583,474
SUM321,387
MAX314,970
COUNT173,031
MIN19,566
AVG15

Aggregates by-far the most common function type

Aggregate Functions

FunctionCall Sites
GROUP_CONCAT583,474
SUM321,387
MAX314,970
COUNT173,031
MIN19,566
AVG15

Non-algebraic column-wise string concatenation

Other Functions


Mostly string manipulation (length, substr)


Some Android-Specific (phone_numbers_equal)


NO UDFs at all

  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

Reads vs Writes

~15% of apps do not perform a single write! only ~15% of apps write more than they read!

Read-Only Workloads


JuiceSSH, Key Chain

  • Credential store, infrequent writes


Google Play Newsstand, Eventbrite, ...

  • Frequent queries over changing data
  • Data bulk updated by copying entire SQLite DB
  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

Query Arrival Frequency

15-20% of queries arrive ~10ms after last query.

... By Query Type

70% of inserts come less than 0.1 ms before another query Most sequences consist of INSERTs and SELECTs

... By App

Google Play Services and Media Storage are bursty Same 10 ms periodicity evident across all apps

Overview

  • Mobile phones process ~2 queries/second
    • DB performance important for power, latency, ...
  • Embedded DBs used differently than Server DBs.
    • We need to understand these access patterns before we can optimize for them.

What's Next?

  • Bigger Sample Sizes - All 200 participants?
  • Better Instrumentation - Application Code, Filesystems?
  • Semantic Analysis - What are the queries actually asking?
  • ORM Integration - Opportunities for improvement?
  • The PocketData Benchmark

The PocketData Benchmark

  • Modular: Apps as bags of persistent datastructures.
  • Testing under normal load rather than saturation.
  • Metrics like power & cpu/memory resource use.
  • Accounting for background resource use.
  • Mixed-bag workloads (Key/Value+OLAP+OLTP).
  • Evaluate Best Execution of Programmer Intent.

You Can Help!

Let us know how you would use PocketData!

okennedy@buffalo.edu