Which Algorithms Really Matter?

©MapR Technologies 2013


Me, Us


Ted Dunning, Chief Application Architect, MapR
Committer and PMC member: Mahout, ZooKeeper, Drill
Bought the beer at the first HUG



MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry standard APIs



Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR



Topic For Today


What is important? What is not?



Why?



What is the difference from academic research?



Some examples



What is Important?


Deployable
– Clever prototypes don’t count if they can’t be standardized

Robust
– Mishandling is common

Transparent
– Will degradation be obvious?

Skillset and mindset matched?
– How long will your fancy data scientist enjoy doing standard ops tasks?

Proportionate
– Where is the highest value per minute of effort?

Academic Goals vs Pragmatics


Academic goals
– Reproducible
– Isolate theoretically important aspects
– Work on novel problems

Pragmatics
– Highest net value
– Available data is constantly changing
– Diligence and consistency have larger impact than cleverness
– Many systems feed themselves, exploration and exploitation are both important
– Engineering constraints on budget and schedule

Example 1:
Making Recommendations Better



Recommendation Advances


What are the most important algorithmic advances in
recommendations over the last 10 years?



Cooccurrence analysis?



Matrix completion via factorization?



Latent factor log-linear models?



Temporal dynamics?



The Winner – None of the Above


What are the most important algorithmic advances in
recommendations over the last 10 years?

1. Result dithering
2. Anti-flood



The Real Issues


Exploration



Diversity



Speed



Not the last fraction of a percent



Result Dithering


Dithering is used to re-order recommendation results
– Re-ordering is done randomly

Dithering is guaranteed to make off-line performance worse

Dithering also has a near perfect record of making actual performance much better

“Made more difference than any other change”

Simple Dithering Algorithm


Generate synthetic score from log rank plus Gaussian

$s = \log r + N(0, \varepsilon)$


Pick noise scale to provide desired level of mixing

$\Delta r \propto r \exp \varepsilon$


Typically

$\varepsilon \in [0.4, 0.8]$


Oh… use floor(t/T) as seed
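
The recipe above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `dither`, the `user_key` and `period` parameters, and the choice to treat ε as the standard deviation of the Gaussian are all assumptions. The seed is built from floor(t/T), so the ordering stays fixed inside each time window of length T and changes when the window rolls over.

```python
import math
import random


def dither(ranked_items, epsilon=0.5, t=0.0, period=300.0, user_key=""):
    """Re-order ranked_items using the score log(rank) + Gaussian noise.

    ranked_items: items in their original order, best first.
    epsilon: noise scale (assumed here to be the standard deviation).
    t, period: current time and window length T; floor(t / T) seeds the noise.
    user_key: hypothetical per-user salt so users get different orderings.
    """
    # Same seed for every call inside one window, so a page reload
    # does not reshuffle the results.
    seed = f"{user_key}:{math.floor(t / period)}"
    rng = random.Random(seed)

    scored = []
    for rank, item in enumerate(ranked_items, start=1):
        score = math.log(rank) + rng.gauss(0.0, epsilon)
        scored.append((score, rank, item))

    # Smallest synthetic score first; ties fall back to the original rank.
    scored.sort()
    return [item for _, _, item in scored]


if __name__ == "__main__":
    original = list(range(1, 21))
    print(dither(original, epsilon=0.5, t=1000.0, user_key="alice"))
```

Because the noise is added to log rank, the top few results move only a little while deeper results can jump many positions, which is the mixing behavior the slide describes.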



Example … ε = 0.5

Each row is one dithered result list; entries are the original ranks of the items shown in the first eight positions.

  1  2  6  5  3  4 13 16
  1  2  3  8  5  7  6 34
  1  4  3  2  6  7 11 10
  1  2  4  3 15  7 13 19
  1  6  2  3  4 16  9  5
  1  2  3  5 24  7 17 13
  1  2  3  4  6 12  5 14
  2  1  3  5  7  6  4 17
  4  1  2  7  3  9  8  5
  2  1  5  3  4  7 13  6
  3  1  5  4  2  7  8  6
  2  1  3  4  7 12 17 16

Example … ε = log 2 = 0.69

Each row is one dithered result list; entries are the original ranks of the items shown in the first eight positions.

  1  2  8  3  9 15  7  6
  1  8 14 15  3  2 22 10
  1  3  8  2 10  5  7  4
  1  2 10  7  3  8  6 14
  1  5 33 15  2  9 11 29
  1  2  7  3  5  4 19  6
  1  3  5 23  9  7  4  2
  2  4 11  8  3  1 44  9
  2  3  1  4  6  7  8 33
  3  4  1  2 10 11 15 14
 11  1  2  4  5  7  3 14
  1  8  7  3 22 11  2 33

Exploring The Second Page



Lesson 1:
Exploration is good



Example 2:
Bayesian Bandits



Bayesian Bandits


Based on Thompson sampling



Very general sequential test



Near optimal regret



Trade-off exploration and exploitation



Possibly best known solution for exploration/exploitation



Incredibly simple



Thompson Sampling


Select each shell according to the probability that it is the best



Probability that it is the best can be computed using posterior

$P(i \text{ is best}) = \int I\left[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \right] P(\theta \mid D) \, d\theta$


But I promised a simple answer



Thompson Sampling – Take 2


Sample θ

$\theta \sim P(\theta \mid D)$


Pick i to maximize reward

$i = \arg\max_j E[r_j \mid \theta]$



Record result from using i
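
The three steps above fit in a short loop. The sketch below assumes the simplest case, arms with 0/1 rewards and a Beta(1, 1) prior on each arm's success rate, rather than the Gamma-Normal model used elsewhere in the deck. The class and function names are illustrative, not taken from any library.

```python
import random


class BernoulliBandit:
    """Thompson sampling for arms with 0/1 rewards and Beta(1, 1) priors."""

    def __init__(self, n_arms, seed=None):
        self.wins = [0] * n_arms
        self.losses = [0] * n_arms
        self.rng = random.Random(seed)

    def choose(self):
        # Sample theta ~ P(theta | D): one draw per arm from its Beta posterior.
        theta = [
            self.rng.betavariate(1 + w, 1 + l)
            for w, l in zip(self.wins, self.losses)
        ]
        # Pick i = argmax_j E[r_j | theta]. For Bernoulli arms the expected
        # reward given theta is just theta_j.
        return max(range(len(theta)), key=theta.__getitem__)

    def record(self, arm, reward):
        # Record the result; this updates the posterior for the next draw.
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


def simulate(true_rates, n_trials=2000, seed=42):
    """Run the bandit against simulated arms and return pulls per arm."""
    bandit = BernoulliBandit(len(true_rates), seed=seed)
    world = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(n_trials):
        arm = bandit.choose()
        reward = world.random() < true_rates[arm]
        bandit.record(arm, reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(simulate([0.2, 0.5, 0.8]))
```

No explicit exploration parameter appears anywhere: arms with few observations have wide posteriors, so they occasionally produce the largest sample and get tried, and that happens less often as evidence accumulates.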



Fast Convergence
[Figure: regret (0 to 0.12) versus number of trials n (0 to 1100), comparing ε-greedy with ε = 0.05 against a Bayesian Bandit with Gamma-Normal]

Thompson Sampling on Ads

An Empirical Evaluation of Thompson Sampling - Chapelle and Li, 2011


Bayesian Bandits versus Result Dithering


Many useful systems are difficult to frame in fully Bayesian form



Thompson sampling cannot be applied without posterior sampling



Can still do useful exploration with dithering



But better to use Thompson sampling if possible



Lesson 2:
Exploration is pretty easy to do and pays big benefits.

Example 3:
On-line Clustering



The Problem


K-means clustering is useful for feature extraction or compression



At scale and at high dimension, the desirable number of clusters
increases



Very large number of clusters may require more passes through
the data



Super-linear scaling is generally infeasible



The Solution


Sketch-based algorithms produce a sketch of the data



Streaming k-means uses adaptive dp-means to produce this sketch
in the form of many weighted centroids which approximate the
original distribution



The size of the sketch grows very slowly with increasing data size



Many operations such as clustering are well behaved on sketches

Fast and Accurate k-means For Large Datasets. Michael Shindler, Alex Wong, Adam Meyerson.
Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Brian Kulis, Michael Jordan.


An Example

The Cluster Proximity Features


Every point can be described by the nearest cluster
– 4.3 bits per point in this case
– Significant error that can be decreased (to a point) by increasing number of clusters

Or by the proximity to the 2 nearest clusters (2 x 4.3 bits + 1 sign bit + 2 proximities)
– Error is negligible
– Unwinds the data into a simple representation

Or we can increase the number of clusters (n fold increase adds log n bits per point, decreases error by sqrt(n))
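
The bit counts above are simple arithmetic, sketched here under two assumptions that the slide does not state: that there are 20 clusters (log2(20) is about 4.3) and that proximity means Euclidean distance. The helper names are illustrative, and the sign bit mentioned on the slide is not modeled.

```python
import math


def bits_per_point(n_clusters):
    """Bits needed to name one cluster out of n_clusters."""
    return math.log2(n_clusters)


def two_nearest(point, centroids):
    """Return ((index, distance), (index, distance)) for the two closest centroids."""
    dists = sorted((math.dist(point, c), i) for i, c in enumerate(centroids))
    (d1, i1), (d2, i2) = dists[0], dists[1]
    return (i1, d1), (i2, d2)


if __name__ == "__main__":
    # Naming the nearest of 20 clusters costs about 4.3 bits.
    print(round(bits_per_point(20), 1))
    # A 4-fold increase in clusters adds log2(4) = 2 bits per point.
    print(bits_per_point(80) - bits_per_point(20))
    # Describing a point by its two nearest centroids and their distances.
    print(two_nearest((0.0, 0.0), [(1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]))
```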



Diagonalized Cluster Proximity



Lots of Clusters Are Fine



Typical k-means Failure

Selecting two seeds here cannot be fixed with Lloyd's algorithm.
The result is that these two clusters get glued together.

Streaming k-means Ideas


By using a sketch with lots (k log N) of centroids, we avoid pathological cases

We still get a very good result if the sketch is created
– in one pass
– with approximate search



In fact, adaptive dp-means works just fine



In the end, the sketch can be used for clustering or …
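
A one-pass sketch builder in the spirit of adaptive dp-means can be written compactly. This is a simplified illustration and not the algorithm from the cited papers or from any library: it uses a deterministic distance cutoff, exact nearest-centroid search instead of approximate search, and takes the sketch size as a plain `max_centroids` parameter where the slide suggests k log N. All names and default values are assumptions.

```python
import math


def _insert(sketch, point, weight, cutoff):
    """Merge point into its nearest centroid, or start a new centroid if too far."""
    if sketch:
        j = min(range(len(sketch)), key=lambda i: math.dist(sketch[i][0], point))
        centroid, w = sketch[j]
        if math.dist(centroid, point) <= cutoff:
            total = w + weight
            merged = tuple(
                (w * c + weight * x) / total for c, x in zip(centroid, point)
            )
            sketch[j] = (merged, total)
            return
    sketch.append((tuple(point), weight))


def streaming_sketch(points, max_centroids, cutoff=0.1, growth=1.5):
    """Single pass over points; returns a list of (centroid, weight) pairs.

    cutoff must be positive. When the sketch grows past max_centroids the
    cutoff is loosened and the centroids themselves (not the raw data)
    are re-inserted, which collapses nearby centroids together.
    """
    sketch = []
    for p in points:
        _insert(sketch, p, 1.0, cutoff)
        while len(sketch) > max_centroids:
            cutoff *= growth
            old, sketch = sketch, []
            for c, w in old:
                _insert(sketch, c, w, cutoff)
    return sketch


if __name__ == "__main__":
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [(cx + 0.01 * i, cy - 0.01 * i) for cx, cy in centers for i in range(100)]
    for centroid, weight in streaming_sketch(data, max_centroids=20):
        print(centroid, weight)
```

The weights record how many original points each centroid stands for, so a later weighted k-means run over the sketch sees roughly the same distribution as the full data.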



Lesson 3:
Sketches make big data small.

Example 4:
Search Abuse



Recommendations

Alice got an apple and a puppy

Bob got an apple

Charles got a bicycle

What else would Bob like?

Log Files
[Figure: log of user-item interactions, one entry per event, for Alice, Bob and Charles]

History Matrix: Users by Items

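[Figure: matrix with one row per user (Alice, Bob, Charles) and one column per item; a check mark records that the user interacted with the item]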

Co-occurrence Matrix: Items by Items
How do you tell which co-occurrences are useful?

[Figure: item-by-item matrix of co-occurrence counts]
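
Counting co-occurrences from the history is mechanical; deciding which counts are anomalous needs a statistical test. The sketch below uses the three-user example from the earlier slides for the counts. For the test it uses the log-likelihood ratio on a 2x2 contingency table, which is one common choice and an assumption here, since the slides do not name the test. Function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import combinations

# History from the example: which user got which items.
history = {
    "Alice": {"apple", "puppy"},
    "Bob": {"apple"},
    "Charles": {"bicycle"},
}


def cooccurrence(history):
    """Count, for each ordered item pair, the users who have both items."""
    counts = defaultdict(int)
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return dict(counts)


def _x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized entropy: N log N minus the sum of k log k.
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 table of counts.

    k11: users with both items, k12: only the first item,
    k21: only the second item, k22: neither item.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    print(cooccurrence(history))
    print(llr(10, 10, 10, 10))  # independent items score about zero
    print(llr(10, 0, 0, 10))    # items that always appear together score high
```

Pairs whose score clears a chosen threshold would be kept as indicators; with only three users the example counts are too small for the score to mean much, so the two `llr` calls use made-up tables purely to show the contrast.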

Co-occurrence Binary Matrix

[Figure: the co-occurrence matrix reduced to binary entries]

Indicator Matrix: Anomalous Co-Occurrence
Result: The marked row will be added to the indicator field in the item document…

Indicator Matrix
That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.

id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)

Note: data for the indicator field is added directly to the metadata for a document in the Solr index. You don’t need to create a separate index for the indicators.
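
To see why an indicator field is enough to serve recommendations, here is a toy in-memory stand-in for the search engine. It does not use Solr or any Solr API. The document for t4 follows the slide; the t1 document, including its title, is hypothetical. The query is simply the user's history matched against each document's indicators.

```python
# Item documents. Only t4 comes from the slide; t1 is a hypothetical
# document added so the example has something to match against.
docs = [
    {"id": "t1", "title": "apple", "indicators": []},
    {"id": "t4", "title": "puppy", "indicators": ["t1"]},
]


def recommend(history, docs, limit=10):
    """Return titles of unseen items whose indicators match the user's history.

    history: list of item ids the user has already interacted with.
    Scoring is a plain count of matching indicators; a real search engine
    would apply its own relevance weighting instead.
    """
    seen = set(history)
    scored = []
    for doc in docs:
        if doc["id"] in seen:
            continue  # do not recommend what the user already has
        hits = len(seen.intersection(doc["indicators"]))
        if hits:
            scored.append((hits, doc["id"], doc["title"]))
    # Most matching indicators first, then by id for a stable order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for _, _, title in scored[:limit]]


if __name__ == "__main__":
    print(recommend(["t1"], docs))
```

In a deployed system the same lookup is an ordinary search query against the indicator field, which is why no separate recommendation index is needed.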


Internals of the Recommender Engine

Looking Inside LucidWorks
Real-time recommendation query and results: Evaluation

What to recommend if new user listened to 2122: Fats Domino & 303: Beatles?
Recommendation is “1710 : Chuck Berry”

Real-life example



Lesson 4:
Recursive search abuse pays
Search can implement recs
Which can implement search



Summary

