Cmu-2011-09.pptx

MapR, Implications for Integration

CMU – September 2011

10/11/11 © MapR Confidential

Outline

•  MapR system overview
•  Map-reduce review
•  MapR architecture
•  Performance Results on MapR
•  Architectural implications
•  Search indexing / deployment
•  EM algorithm for machine learning
•  … and more …

Map-Reduce

[Diagram: map-reduce data flow, with Input, Shuffle, and Output stages]

Bottlenecks and Issues

•  Read-only files
•  Many copies in I/O path
•  Shuffle based on HTTP
•  Can’t use new technologies
•  Eats file descriptors
•  Spills go to local file space
•  Bad for skewed distribution of sizes

MapR Areas of Development

[Diagram of development areas: HBase, Map Reduce, Ecosystem, Storage, Management, Services]

MapR Improvements

•  Faster file system
•  Fewer copies
•  Multiple NICs
•  No file descriptor or page-buf competition
•  Faster map-reduce
•  Uses distributed file system
•  Direct RPC to receiver
•  Very wide merges

MapR Innovations

•  Volumes
•  Distributed management
•  Data placement
•  Read/write random access file system
•  Allows distributed meta-data
•  Improved scaling
•  Enables NFS access
•  Application-level NIC bonding
•  Transactionally correct snapshots and mirrors

MapR's Containers

Files/directories are sharded into blocks, which are placed into mini NNs (containers) on disks.

•  Each container contains
•  Directories & files
•  Data blocks
•  Replicated on servers
•  No need to manage directly

Containers are 16-32 GB segments of disk, placed on nodes.

MapR's Containers

•  Each container has a replication chain
•  Updates are transactional
•  Failures are handled by rearranging replication

Container locations and replication

[Diagram: containers labeled with the nodes that hold their replicas (N1, N2, N3), with the CLDB alongside]

Container location database (CLDB) keeps track of nodes hosting each container and replication chain order.
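
As a rough illustration of the bookkeeping described above, the sketch below models a container location database as a map from container id to an ordered replication chain, using the node names from the diagram. The class and method names (`ContainerLocationDB`, `handle_node_failure`) and the policy of appending a replacement node at the tail of the chain are assumptions made for illustration only; they are not MapR's API or its actual recovery algorithm.

```python
from typing import Dict, List


class ContainerLocationDB:
    """Toy model: container id -> ordered replication chain of node names."""

    def __init__(self) -> None:
        self.chains: Dict[int, List[str]] = {}

    def register(self, container_id: int, chain: List[str]) -> None:
        # Order matters: the first node is treated as the head of the chain.
        self.chains[container_id] = list(chain)

    def locate(self, container_id: int) -> List[str]:
        return list(self.chains[container_id])

    def handle_node_failure(self, failed: str, spares: List[str]) -> None:
        # Assumed policy: drop the failed node from every chain, then append
        # the first spare not already in the chain to restore the replica count.
        for chain in self.chains.values():
            if failed not in chain:
                continue
            chain.remove(failed)
            for node in spares:
                if node != failed and node not in chain:
                    chain.append(node)
                    break


cldb = ContainerLocationDB()
cldb.register(1, ["N1", "N2"])
cldb.register(2, ["N3", "N2"])
cldb.handle_node_failure("N2", spares=["N1", "N3"])
print(cldb.locate(1))  # ['N1', 'N3']
print(cldb.locate(2))  # ['N3', 'N1']
```

Keeping the chain as an ordered list is what lets a failure be handled by rearranging the chain rather than by rebuilding it.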

MapR Scaling

Containers represent 16 - 32GB of data
•  Each can hold up to 1 Billion files and directories
•  100M containers = ~ 2 Exabytes (a very large cluster)

250 bytes DRAM to cache a container
•  25GB to cache all containers for 2EB cluster
•  But not necessary, can page to disk
•  Typical large 10PB cluster needs 2GB

Container-reports are 100x - 1000x < HDFS block-reports
•  Serve 100x more data-nodes
•  Increase container size to 64G to serve 4EB cluster
•  Map/reduce not affected
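
The first two figures above can be checked with a few lines of arithmetic. This is only a sanity check, assuming decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes); the variable names are illustrative.

```python
GB = 10**9
EB = 10**18

containers = 100_000_000          # 100M containers (from the slide)
bytes_per_cache_entry = 250       # DRAM per cached container (from the slide)

# Total capacity at the two ends of the 16-32 GB container size range.
low_eb = containers * 16 * GB / EB
high_eb = containers * 32 * GB / EB

# DRAM needed to cache an entry for every container.
cache_gb = containers * bytes_per_cache_entry / GB

print(f"capacity: {low_eb:.1f} to {high_eb:.1f} EB")   # capacity: 1.6 to 3.2 EB
print(f"cache for all containers: {cache_gb:.0f} GB")  # cache for all containers: 25 GB
```

The "~ 2 Exabytes" on the slide sits inside the 1.6 to 3.2 EB range, and 250 bytes per container gives the quoted 25GB.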

MapR's Streaming Performance

[Bar chart: read and write throughput in MB per sec (axis 0 to 2250) for Hardware, MapR, and Hadoop, on 11 x 7200rpm SATA and on 11 x 15Krpm SAS. Higher is better.]

Tests:
i.  16 streams x 120GB
ii.  2000 streams x 1GB

Terasort on MapR

10+1 nodes: 8 core, 24GB DRAM, 11 x 1TB SATA 7200 rpm

[Bar chart: elapsed time (mins) for MapR and Hadoop at 1.0 TB (axis 0 to 60) and 3.5 TB (axis 0 to 300). Lower is better.]

HBase on MapR

YCSB Random Read with 1 billion 1K records
10+1 node cluster: 8 core, 24GB DRAM, 11 x 1TB 7200 RPM

[Bar chart: records per second (axis 0 to 25000) for MapR and Apache under Zipfian and Uniform key distributions. Higher is better.]

Small Files (Apache Hadoop, 10 nodes)

[Chart: rate (files/sec) against # of files (m), for out-of-box and tuned configurations]

Op:
-  create file
-  write 100 bytes
-  close

Notes:
-  NN not replicated
-  NN uses 20G DRAM
-  DN uses 2G DRAM

MUCH faster for some operations

Same 10 nodes …

[Chart: create rate against # of files (millions)]

What MapR is not

•  Volumes != federation
•  MapR supports > 10,000 volumes all with independent placement and defaults
•  Volumes support snapshots and mirroring
•  NFS != FUSE
•  Checksum and compress at gateway
•  IP fail-over
•  Read/write/update semantics at full speed
•  MapR != maprfs

New Capabilities

Alternative NFS mounting models

•  Export to the world
•  NFS gateway runs on selected gateway hosts
•  Local server
•  NFS gateway runs on local host
•  Enables local compression and check summing
•  Export to self
•  NFS gateway runs on all data nodes, mounted from localhost

Export to the world

[Diagram: a Client reaching several NFS Servers over NFS]

Local server

[Diagram: an Application and an NFS Server on the Client, in front of the Cluster Nodes]

Universal export to self

[Diagram: among the Cluster Nodes, a Cluster Node running a Task and an NFS Server]

Nodes are identical

[Diagram: three Cluster Nodes, each running a Task and an NFS Server]

Application architecture

•  High performance map-reduce is nice
•  But algorithmic flexibility is even nicer

Sharded text Indexing

[Diagram: Input documents, Map, Reducer with local disk, Clustered index storage, Search Engine with local disk]

•  Assign documents to shards
•  Index text to local disk and then copy index to distributed file store
•  Copy to local disk typically required before index can be loaded

Sharded text indexing

•  Mapper assigns document to shard
•  Shard is usually hash of document id
•  Reducer indexes all documents for a shard
•  Indexes created on local disk
•  On success, copy index to DFS
•  On failure, delete local files
•  Must avoid directory collisions
•  Can’t use shard id!
•  Must manage and reclaim local disk space
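
The shard assignment and the directory-collision point above can be sketched as follows. This is a minimal illustration, not the deck's implementation: `shard_for`, `attempt_dir`, the directory naming scheme, and the attempt id strings are all hypothetical. It uses `hashlib` rather than Python's built-in `hash()`, because the latter is randomized per process for strings and would send the same document to different shards in different mappers.

```python
import hashlib
import os


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard with a hash that is stable across processes."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def attempt_dir(base: str, shard: int, attempt_id: str) -> str:
    """Local working directory for one reducer attempt.

    The shard id alone is not enough: a retried or speculative attempt for
    the same shard would collide, so the attempt id is part of the name.
    """
    return os.path.join(base, f"shard-{shard:05d}-{attempt_id}")


# Group a few document ids by shard, as the shuffle would for the reducers.
docs = ["doc-1", "doc-2", "doc-3", "doc-4"]
by_shard = {}
for doc_id in docs:
    by_shard.setdefault(shard_for(doc_id, 2), []).append(doc_id)

first = attempt_dir("/tmp/index-work", 3, "attempt-0")
retry = attempt_dir("/tmp/index-work", 3, "attempt-1")
print(first != retry)  # True
```

Each attempt still has to delete its own directory on failure, which is the local disk management burden the last bullet refers to.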

Conventional data flow

[Diagram: Input documents, Map, Reducer with local disk, Clustered index storage, Search Engine with local disk]

•  Failure of a reducer causes garbage to accumulate in the local disk
•  Failure of search engine requires another download of the index from clustered storage.

Simplified NFS data flows

[Diagram: Input documents, Map, Reducer, Clustered index storage, Search Engine]

•  Index to task work directory via NFS
•  Failure of a reducer is cleaned up by map-reduce framework
•  Search engine reads mirrored index directly.

Simplified NFS data flows

[Diagram: Input documents, Map, Reducer, Mirrors, and two Search Engines]

•  Mirroring allows exact placement of index data
•  Arbitrary levels of replication also possible

How about another one?

K-means

•  Classic E-M based algorithm
•  Given cluster centroids,
•  Assign each data point to nearest centroid
•  Accumulate new centroids
•  Rinse, lather, repeat
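
The loop above can be sketched in a few lines of single-machine Python. This is a minimal illustration under stated assumptions: squared Euclidean distance, a centroid with no assigned points keeping its old position, and a stop rule of "centroids did not change" are choices made here, not details from the slides.

```python
from typing import List, Sequence

Point = Sequence[float]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def kmeans_step(points: List[Point], centroids: List[Point]) -> List[List[float]]:
    """One pass: assign every point, then recompute each centroid as a mean."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for pt in points:
        k = nearest(pt, centroids)
        counts[k] += 1
        for i, x in enumerate(pt):
            sums[k][i] += x
    # A centroid with no assigned points keeps its old position.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100):
    """Repeat the step until the centroids stop moving."""
    for _ in range(max_iter):
        updated = kmeans_step(points, centroids)
        if updated == [list(c) for c in centroids]:
            break
        centroids = updated
    return [list(c) for c in centroids]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(result)  # [[0.0, 1.0], [10.0, 11.0]]
```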

K-means, the movie

[Diagram: Input feeds "Assign to nearest centroid", then "Aggregate new centroids"; the Centroids feed back into the assignment step]

Parallel Stochastic Gradient Descent

[Diagram: Input feeds "Train sub model", then "Average models", producing the Model]
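
A minimal sketch of the train-then-average pattern in the diagram, assuming a least-squares linear model, a fixed learning rate, and synthetic data. The function names, hyperparameters, and data are illustrative and do not come from the slides.

```python
import random
from typing import List, Tuple

Example = Tuple[List[float], float]   # (features, target)


def train_sgd(data: List[Example], dim: int, epochs: int = 200,
              rate: float = 0.05, seed: int = 0) -> List[float]:
    """Plain SGD for least-squares linear regression on one shard."""
    rng = random.Random(seed)
    w = [0.0] * dim
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - rate * err * xi for wi, xi in zip(w, x)]
    return w


def average_models(models: List[List[float]]) -> List[float]:
    """Element-wise mean of the sub-model weight vectors."""
    return [sum(ws) / len(models) for ws in zip(*models)]


# Synthetic data from y = 2*x + 1, with a constant 1.0 as the bias feature.
data = [([x / 10.0, 1.0], 2 * (x / 10.0) + 1) for x in range(20)]
shards = [data[0::2], data[1::2]]          # each shard would go to one mapper
sub_models = [train_sgd(s, dim=2, seed=i) for i, s in enumerate(shards)]
model = average_models(sub_models)
print([round(w, 2) for w in model])        # should be close to [2.0, 1.0]
```

Each call to `train_sgd` is independent of the others, so the sub-models can be trained in parallel and only the small weight vectors need to be brought together for averaging.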

Variational Dirichlet Assignment

[Diagram: Input feeds "Gather sufficient statistics", then "Update model", producing the Model]

Old tricks, new dogs

•  Mapper
•  Assign point to cluster
•  Emit cluster id, (1, point)
•  Combiner and reducer
•  Sum counts, weighted sum of points
•  Emit cluster id, (n, sum/n)
•  Output to HDFS

Callouts:
•  Read from local disk from distributed cache
•  Read from HDFS to local disk by distributed cache
•  Written by map-reduce
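
The mapper and combiner/reducer above can be simulated in one process to show why the same function serves as both combiner and reducer: it consumes and emits (count, mean) pairs, so it can be applied repeatedly. This is a sketch under assumptions; the function names, the two-split simulation in `run`, and the sample points are hypothetical, and no Hadoop API is used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Vec = List[float]
Partial = Tuple[int, Vec]   # (count, mean of the points counted)


def nearest(point: Vec, centroids: List[Vec]) -> int:
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def map_point(point: Vec, centroids: List[Vec]) -> Tuple[int, Partial]:
    """Mapper: emit cluster id, (1, point)."""
    return nearest(point, centroids), (1, list(point))


def combine(partials: Iterable[Partial]) -> Partial:
    """Combiner and reducer: sum counts, take the count-weighted sum of the
    means, and emit (n, sum/n). Output has the same shape as the input."""
    n = 0
    total: Vec = []
    for count, mean in partials:
        if not total:
            total = [0.0] * len(mean)
        n += count
        total = [t + count * m for t, m in zip(total, mean)]
    return n, [t / n for t in total]


def run(points, centroids, splits=2):
    """Simulate map, per-split combine, shuffle, and reduce in one process."""
    shuffled: Dict[int, List[Partial]] = defaultdict(list)
    for s in range(splits):
        local: Dict[int, List[Partial]] = defaultdict(list)
        for pt in points[s::splits]:
            k, v = map_point(pt, centroids)
            local[k].append(v)
        for k, vs in local.items():
            shuffled[k].append(combine(vs))                   # combiner output
    return {k: combine(vs) for k, vs in shuffled.items()}     # reducer output


points = [[0.0, 0.0], [0.0, 2.0], [0.0, 4.0], [10.0, 10.0]]
new_centroids = run(points, [[0.0, 0.0], [10.0, 10.0]])
print(new_centroids)  # {0: (3, [0.0, 2.0]), 1: (1, [10.0, 10.0])}
```

Carrying the count alongside the mean is what makes the weighted sum correct when partial results cover different numbers of points.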

Old tricks, new dogs

•  Mapper
•  Assign point to cluster
•  Emit cluster id, (1, point)
•  Combiner and reducer
•  Sum counts, weighted sum of points
•  Emit cluster id, (n, sum/n)
•  Output to MapR FS (in place of HDFS)

Callouts:
•  Read from NFS
•  Written by map-reduce
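
With the centroids readable through an ordinary mount, the mapper's side-data load reduces to plain file I/O. The sketch below assumes a CSV file with one centroid per line; the file name, the CSV layout, and the write-then-rename step are illustrative choices, and a temporary directory stands in for a directory on the NFS-mounted cluster.

```python
import csv
import os
import tempfile
from typing import List


def load_centroids(path: str) -> List[List[float]]:
    """Read one centroid per line from a CSV file using ordinary file I/O."""
    with open(path, newline="") as f:
        return [[float(x) for x in row] for row in csv.reader(f) if row]


def save_centroids(path: str, centroids: List[List[float]]) -> None:
    """Write under a temporary name, then rename into place, so that a reader
    on a local POSIX file system never opens a half-written file. Whether a
    given NFS mount preserves that guarantee is not checked here."""
    tmp = path + ".tmp"
    with open(tmp, "w", newline="") as f:
        csv.writer(f).writerows(centroids)
    os.replace(tmp, path)


# A temporary directory stands in for a directory on the mounted cluster.
with tempfile.TemporaryDirectory() as shared:
    path = os.path.join(shared, "centroids.csv")
    save_centroids(path, [[0.0, 1.0], [10.0, 11.0]])
    loaded = load_centroids(path)
print(loaded)  # [[0.0, 1.0], [10.0, 11.0]]
```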

Poor man’s Pregel

•  Mapper

while not done:
    read and accumulate input models
    for each input:
        accumulate model
    write model
    synchronize
    reset input format
emit summary

•  Lines in bold can use conventional I/O via NFS
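
One way the "write model" and "synchronize" steps could be built on nothing but conventional file I/O is a directory per superstep that doubles as a barrier. This is a hedged sketch, not the mechanism used in the talk: the directory layout, the file names, and the polling barrier are assumptions, and a temporary directory stands in for a shared NFS path. It does not address how fresh directory listings are on a particular NFS mount.

```python
import json
import os
import tempfile
import time
from typing import Dict, List


def write_model(shared: str, step: int, worker: int, model: List[float]) -> None:
    """Publish this worker's model for a superstep (write, then rename)."""
    step_dir = os.path.join(shared, f"step-{step:04d}")
    os.makedirs(step_dir, exist_ok=True)
    final = os.path.join(step_dir, f"worker-{worker}.json")
    with open(final + ".tmp", "w") as f:
        json.dump(model, f)
    os.replace(final + ".tmp", final)


def synchronize(shared: str, step: int, n_workers: int,
                timeout: float = 30.0, poll: float = 0.1) -> Dict[int, List[float]]:
    """Barrier: wait until every worker has published, then read all models."""
    step_dir = os.path.join(shared, f"step-{step:04d}")
    os.makedirs(step_dir, exist_ok=True)
    deadline = time.monotonic() + timeout
    while True:
        # Files still being written end in ".tmp" and are ignored.
        names = [n for n in os.listdir(step_dir) if n.endswith(".json")]
        if len(names) >= n_workers:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"step {step}: saw {len(names)} of {n_workers} models")
        time.sleep(poll)
    models = {}
    for name in names:
        worker = int(name[len("worker-"):-len(".json")])
        with open(os.path.join(step_dir, name)) as f:
            models[worker] = json.load(f)
    return models


# Single-process demonstration with two simulated workers.
with tempfile.TemporaryDirectory() as shared:
    write_model(shared, 0, worker=0, model=[1.0, 2.0])
    write_model(shared, 0, worker=1, model=[3.0, 4.0])
    models = synchronize(shared, 0, n_workers=2)
    merged = [sum(v) / len(models) for v in zip(*models.values())]
print(merged)  # [2.0, 3.0]
```

In a real job each mapper would call `write_model` for itself and then block in `synchronize` before starting the next pass over its input.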

Click modeling architecture

[Diagram: Input feeds "Feature extraction and down sampling" (Map-reduce), then "Data join" with Side-data (now via NFS), then "Sequential SGD Learning"]

Click modeling architecture

[Diagram: Input feeds "Feature extraction and down sampling" (Map-reduce), then "Data join" with Side-data, then four "Sequential SGD Learning" boxes inside a second Map-reduce stage]

•  Map-reduce cooperates with NFS

Hybrid model flow

[Diagram: "Feature extraction and down sampling" (Map-reduce), "SVD (PageRank) (spectral)" (Map-reduce), "Down stream modeling" (??), Deployed Model]

Hybrid model flow

[Diagram: "Feature extraction and down sampling", "SVD (PageRank) (spectral)", "Down stream modeling", Deployed Model, with stages marked Sequential or Map-reduce]

Trivial visualization interface

•  Output is visible via NFS

$ R
> x <- read.csv("/mapr/my.cluster/home/ted/data/foo.out")
> plot(error ~ t, x)
> q(save='n')

•  Legacy visualization just works

Conclusions

•  We used to know all this
•  Tab completion used to work
•  5 years of work-arounds have clouded our memories
•  We just have to remember the future
