Ying could not be here today. If you like her ideas, get in touch with her.
(If you don't, blame my presentation)
(Also, Ying is on the job market)
(not immediately relevant to the talk, but you should check it out)
Ying: To implement {cool feature in Mimir}, we'll need to be able to perform inference on Graphical Models, but we will not know how complex they are.
Joint probability distributions are expensive to store
$$p(D, I, G, S, J)$$
The chain rule of probability lets us break apart the distribution
$$= p(D, I, G, S) \cdot p(J | D, I, G, S)$$
And conditional independence lets us further simplify
$$= p(D, I, G, S) \cdot p(J | G, S)$$
This is the basis for a type of graphical model called a "Bayes Net"
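Written out for the five-variable network in this example (the same factors that reappear as joins below), the Bayes Net factorization is:

$$p(D, I, S, G, J) = p(D) \cdot p(I) \cdot p(S|I) \cdot p(G|D,I) \cdot p(J|G,S)$$

The worked example that follows multiplies one entry from each of these five factors.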
$$p(D=1, I=0, S=0, G=2, J=1) = 0.5 \cdot 0.7 \cdot 0.95 \cdot 0.25 \cdot 0.8 = 0.0665$$
$$p(D,I,S,G,J) = p(D) \bowtie p(I) \bowtie p(S|I) \bowtie p(G|D,I) \bowtie p(J|G,S)$$
$p(J) = \sum_{D,I,S,G}p(D,I,S,G,J)$
(a.k.a. computing the marginal probability)
Key Challenge: For {really cool feature}, we don't know whether we should use exact or approximate inference.
Can we gracefully degrade from exact to approximate inference?
```sql
-- Each CPT is stored as a table (D, I, S, G, J), with its probability in column p.
-- Schematic: assumes the NATURAL JOINs match only on the shared variable columns.
SELECT J.J, SUM(D.p * I.p * S.p * G.p * J.p) AS p
FROM D NATURAL JOIN I NATURAL JOIN S
     NATURAL JOIN G NATURAL JOIN J
GROUP BY J.J
```
(Inference is essentially a big group-by aggregate join query)
(Variable elimination is Aggregate Pushdown + Join Ordering)
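Concretely, pushing the aggregate down means pushing the sums inside the product. For the marginal $p(J)$ above, one possible elimination order is (a sketch; the optimizer may pick a different order):

$$p(J) = \sum_{G,S} p(J|G,S) \sum_{I} p(I)\,p(S|I) \sum_{D} p(D)\,p(G|D,I)$$

Each inner sum materializes a small intermediate factor; the partially filled $\psi$ tables later in the talk are intermediates of this kind, built incrementally.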
$Avg(3,6,10,9,1,3,9,7,9,4,7,9,2,1,2,4,10,8,9,7) = 6$
$Avg(3,6,10,9,1) = 5.8$ $\approx 6$
$Sum\left(\frac{k}{N} Samples\right) \cdot \frac{N}{k} \approx Sum(*)$
Sampling lets you approximate aggregate values with orders of magnitude less data.
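A toy illustration with the numbers above, taking the first five values as the sample as on the slide (`scaled_sum` is just an illustrative helper name):

```python
data = [3, 6, 10, 9, 1, 3, 9, 7, 9, 4, 7, 9, 2, 1, 2, 4, 10, 8, 9, 7]

def scaled_sum(sample, N):
    """Estimate the full Sum(*) from a k-element sample, scaled by N/k."""
    return sum(sample) * N / len(sample)

N, k = len(data), 5
sample = data[:k]                 # the five leading values from the slide
print(sum(sample) / k)            # 5.8   -- close to the true average of 6
print(scaled_sum(sample, N))      # 116.0 -- close to the true Sum(*) of 120
print(sum(data))                  # 120
```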
Classical OLA (online aggregation) techniques aren't entirely appropriate here: we want estimates that gracefully converge to the exact answer.
If you pick $a$, $b$, and $N$ correctly, then the sequence:
$K_i = (a \cdot K_{i-1} + b) \bmod N$
will produce $N$ distinct, pseudorandom integers $K_i \in [0, N)$
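For instance, a minimal Python sketch: the constants below just satisfy the Hull-Dobell full-period conditions for an illustrative $N = 10{,}000$ and are not the parameters of any particular system; `cyclic_order` is a hypothetical helper name.

```python
def cyclic_order(N, a, b, seed=0):
    """Yield every integer in [0, N) exactly once, in pseudorandom order."""
    k = seed % N
    for _ in range(N):
        yield k
        k = (a * k + b) % N

# Full period requires: gcd(b, N) = 1; a-1 divisible by every prime factor of N;
# and a-1 divisible by 4 if N is divisible by 4 (Hull-Dobell theorem).
N, a, b = 10_000, 21, 17
order = list(cyclic_order(N, a, b))
assert sorted(order) == list(range(N))   # N distinct values: a permutation of [0, N)
```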
To marginalize $p(\{X_i\})$...
Make Cyclic Sampling into a composable operator. (The $\psi_1$ and $\psi_2$ tables below fill in as samples arrive; see the code sketch after the tables.)
| $G$ | # | $\sum p_{\psi_2}$ |
|---|---|---|
| 1 | 1 | 0.126 |

| $I$ | $G$ | # | $\sum p_{\psi_1}$ |
|---|---|---|---|
| 0 | 1 | 1 | 0.18 |

| $G$ | # | $\sum p_{\psi_2}$ |
|---|---|---|
| 1 | 3 | 0.348 |
| 2 | 4 | 0.288 |
| 3 | 4 | 0.350 |

| $I$ | $G$ | # | $\sum p_{\psi_1}$ |
|---|---|---|---|
| 0 | 1 | 2 | 0.140 |
| 1 | 1 | 2 | 0.222 |
| 0 | 2 | 2 | 0.238 |
| 1 | 2 | 2 | 0.050 |
| 0 | 3 | 2 | 0.322 |
| 1 | 3 | 2 | 0.028 |

| $G$ | # | $\sum p_{\psi_2}$ |
|---|---|---|
| 1 | 4 | 0.362 |
| 2 | 4 | 0.288 |
| 3 | 4 | 0.350 |

| $I$ | $G$ | # | $\sum p_{\psi_1}$ |
|---|---|---|---|
| 0 | 1 | 2 | 0.140 |
| 1 | 1 | 2 | 0.222 |
| 0 | 2 | 2 | 0.238 |
| 1 | 2 | 2 | 0.050 |
| 0 | 3 | 2 | 0.322 |
| 1 | 3 | 2 | 0.028 |
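A minimal sketch (in Python, not Mimir's actual code) of what a composable incremental aggregate could look like: each operator absorbs input tuples in the cyclic-sampling order, keeps a per-group count (the # column above) and running sum, and can hand a snapshot of its partial groups to a downstream operator at any time. Class and method names here are illustrative.

```python
from collections import defaultdict

class IncrementalSumAggregate:
    def __init__(self, group_cols, prob_col="p"):
        self.group_cols = group_cols
        self.prob_col = prob_col
        # group key -> [count of tuples absorbed ("#"), running sum of p]
        self.groups = defaultdict(lambda: [0, 0.0])

    def absorb(self, row):
        """Fold one input tuple (a dict) into the running aggregate."""
        key = tuple(row[c] for c in self.group_cols)
        acc = self.groups[key]
        acc[0] += 1
        acc[1] += row[self.prob_col]

    def snapshot(self):
        """Current partial result, shaped like the psi tables above."""
        return [dict(zip(self.group_cols, key), **{"#": n, self.prob_col: s})
                for key, (n, s) in self.groups.items()]

# Usage sketch: psi_1 groups by (I, G); its snapshots can be fed to a second
# operator that groups by G alone, mirroring psi_2 above.
psi_1 = IncrementalSumAggregate(group_cols=["I", "G"])
psi_1.absorb({"I": 0, "G": 1, "p": 0.18})   # illustrative tuple
print(psi_1.snapshot())   # [{'I': 0, 'G': 1, '#': 1, 'p': 0.18}]
```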
There's a bit of extra math to compute $\epsilon$-$\delta$ bounds by adapting Serfling's results. It's in the paper.
Student: A common benchmark graph.
VE (Variable Elimination) is binary: it completes, or it doesn't.
CS (Cyclic Sampling) gets early results faster, but is overtaken by LJ.
LJ is only 3-5× slower than VE.
LJ converges to an exact result before Gibbs sampling produces an approximate one.
On some graphs Gibbs is better, but only marginally.
Questions?