Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/pinterest-resilient-systems
Presented at QCon San Francisco
www.qconsf.com
Highly Resilient Systems at Pinterest
Yongsheng Wu
Engineering Manager of Storage & Caching, Pinterest
Email: yongsheng@pinterest.com
Pinterest: www.pinterest.com/yswu
Nov 17, 2015
Our mission is to help people discover and do what they love.
50+ billion Pins, categorized by 100+ million Pinners into more than 1 billion Boards.
Tens of thousands of instances on AWS. ~100 services. Some systems handle ~1M QPS.
Failures happen all the time.
Dynamic Service Discovery
Realtime Configuration
Caching
Persistent Storage
Async Processing
Client Retries (diagram, animated over several frames): service foo 1 issues requests to instances bar 1, bar 2, and bar 3, retrying a failed request against the pool. When a new instance bar 4 comes up, the clients have no way of knowing about it — how do they discover it?
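The retry loop in the diagram can be sketched as follows (a minimal sketch; the function names and attempt count are hypothetical, not Pinterest's code):

```python
import random

def call_with_retries(replicas, do_request, max_attempts=3):
    """Try a request against randomly chosen replicas, retrying on failure.

    `replicas` is a list of addresses (e.g. ["bar1", "bar2", "bar3"]);
    `do_request` is a callable that raises on failure. A production
    client would also add backoff between attempts.
    """
    last_error = None
    for _ in range(max_attempts):
        replica = random.choice(replicas)
        try:
            return do_request(replica)
        except Exception as err:
            last_error = err  # remember the failure and try another replica
    raise last_error
```

Note that the replica list is fixed at call time — which is exactly the limitation the next section, dynamic service discovery, addresses.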
Reactive Manifesto
Zookeeper
• Highly reliable distributed coordination
• Hierarchical namespace of ZNodes
  • Persistent node
  • Ephemeral node
  • Sequence node
• Watches
  • NodeChildrenChanged
  • NodeCreated
  • NodeDataChanged
  • NodeDeleted
Dynamic Service Discovery — Capacity Addition (diagram, animated): each bar instance registers an ephemeral node under /discovery/bar/prod in Zookeeper, and foo apps watch that path. When bar 2 starts, it registers itself and the foo apps' server set updates from [bar 1] to [bar 1, bar 2].
Dynamic Service Discovery — Capacity Reduction (diagram, animated): when bar 2 goes away, its ephemeral node under /discovery/bar/prod is removed and the foo apps' server set shrinks back to [bar 1].
Dynamic Service Discovery — Zookeeper Failure (diagram): if the Zookeeper ensemble becomes unavailable, foo apps lose their source of truth for the bar server set.
Dynamic Service Discovery — Zookeeper with Observers (diagram): observer nodes mirror the /discovery tree, scaling out reads without adding voting members to the quorum; foo apps read the server set from the observers.
Dynamic Service Discovery — Zookeeper with Observers Failure (diagram): if the observers fail as well, foo apps are again cut off from the server set.
Avoid complete reliance on any single system, even if it is a highly reliable distributed system.
Dynamic Service Discovery — Zookeeper with Observers Protected by Local Persistent ServerSet (diagram): a local daemon (ZUM) on each foo host keeps the last-known bar server set ([bar 1, bar 2]) persisted on disk, so foo apps retain a usable view even when both Zookeeper and the observers are unavailable.
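The local-persistent-serverset idea can be sketched as a last-known-good cache on disk (a minimal sketch under assumed interfaces; ZUM's actual implementation is not shown in the talk):

```python
import json
import os

class LocalPersistentServerSet:
    """Last-known-good server set, persisted to local disk.

    When Zookeeper (or its observers) is reachable, cache the fresh
    server set on disk; when it is not, serve the last persisted copy
    instead of failing.
    """

    def __init__(self, path):
        self.path = path

    def refresh(self, fetch_from_zookeeper):
        try:
            servers = fetch_from_zookeeper()
        except Exception:
            return self.load()          # Zookeeper down: fall back to disk
        tmp = self.path + ".tmp"        # write atomically via rename
        with open(tmp, "w") as f:
            json.dump(servers, f)
        os.replace(tmp, self.path)
        return servers

    def load(self):
        with open(self.path) as f:
            return json.load(f)
```

The atomic rename matters: a crash mid-write must never leave a corrupted server set, since this file is the fallback of last resort.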
Challenges
• Rapid planned capacity reduction
• Gradual loss of instances
Typical Service Setup (diagram): Client → Server, with Config, Cache, and DB behind the server.
Realtime Configuration
• Decider
• Experiment
• Rate limiting
• Failover
• …
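A decider setting like the ones above is typically an integer dial from 0 to 100. A minimal sketch of such a percentage gate (the hashing scheme here is illustrative, not Pinterest's implementation):

```python
import hashlib

def decider_allows(experiment, unit_id, settings):
    """Percentage gate in the spirit of the decider settings above.

    `settings` maps a setting name to an integer 0..100 (0 = off,
    100 = fully on). A stable hash of (experiment, unit_id) buckets
    each user/request into 0..99, so the same unit keeps getting the
    same decision as the percentage is ramped up.
    """
    pct = settings.get(experiment, 0)
    digest = hashlib.md5(f"{experiment}:{unit_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < pct
```

Because the bucket is derived from a stable hash rather than a random draw, ramping a setting from 10 to 20 only adds new units to the enabled set — it never flips already-enabled units off.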
Realtime Configuration (diagram): a Config Admin Console writes decider settings (setting_1: 53, setting_2: 0, setting_3: 100, setting_4: 78, …) to /config/decider in Zookeeper; foo apps watch the node and receive the full settings map.
Zookeeper Powered Realtime Configuration Management
Zookeeper, Observers, S3 Based Realtime Configuration Management (diagram, animated over several frames): the Config Admin Console uploads the new config body (v1: {"setting_1": 54, …}) to AWS S3, then flips the pointer at /config/decider in Zookeeper from v0 to v1. The pointer change propagates through the observers to ZUM on each host; ZUM fetches the v1 body from S3, and the foo apps swap in the new settings locally. Zookeeper carries only the small version pointer, while the (possibly large) config body lives in S3.
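The version-pointer scheme above can be sketched as follows (a minimal sketch with hypothetical interfaces; the blob store stands in for S3 and the pointer callback for a Zookeeper watch):

```python
class ConfigManager:
    """Keep a local config body in sync with a small version pointer.

    Zookeeper holds only the pointer ("decider: v1"); the full config
    body lives in a blob store (S3 in the talk). On a pointer change,
    fetch the body for that version and swap it in atomically.
    """

    def __init__(self, blob_store):
        self.blob_store = blob_store   # dict-like: version -> config body
        self.version = None
        self.config = {}

    def on_pointer_change(self, new_version):
        if new_version == self.version:
            return self.config          # nothing to do
        body = self.blob_store[new_version]   # an S3 GET in production
        self.version, self.config = new_version, body
        return body
```

Keeping only the pointer in Zookeeper sidesteps znode size limits and keeps watch notifications cheap, while S3 absorbs the bulk download traffic.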
Challenges
• Staggered rollouts
• Very large configurations
Caching (diagram, animated): each foo process (proc1 … prock) hashes keys onto a consistent hash ring of cache instances bar cache 1 … bar cache m. With n foo hosts each running k processes, every process holds connections to every cache instance — too many connections!
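The consistent hash ring in the diagram can be sketched like this (a minimal sketch; virtual-node count and hash choice are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent hash ring, as in the cache diagram above.

    Each cache node is placed at several points ("virtual nodes") on a
    ring of hash values; a key is served by the first node clockwise
    from the key's own hash. Adding or removing a node only remaps keys
    adjacent to its points, not the whole keyspace.
    """

    def __init__(self, nodes, vnodes=100):
        self.ring = []                      # sorted list of (point, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

This is the scheme whose reshuffling behavior causes the cache-inconsistency problem discussed a few slides later.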
Twemproxy (diagram): a local Nutcracker (Twemproxy) instance on each foo host multiplexes connections from that host's processes to the cache pool, collapsing the connection count.
Cache Inconsistency (diagram, two frames): with foo: 1 cached on bar cache 1, a `set foo 2` can be routed by Nutcracker to a different cache node (for example after the ring reshuffles around a temporarily failed node), leaving both foo: 1 and foo: 2 in the pool with the stale value still readable.
McRouter (diagram): replacing Nutcracker with McRouter, which keeps the pool assignment fixed — no ring reshuffle when nodes join or leave.
McRouter
Pros
• No inconsistency caused by nodes joining/leaving the pool
• No cascading failures in case of excessive load caused by hot keys
Cons
• Cache misses
Replicated Pools — Reads (diagram, two frames): McRouter fronts two replicated pools ("ac" pool: cache 1 … cache n, "de" pool: cache 1' … cache n'); a `get foo` is routed to a single replica pool.
Replicated Pools — Invalidation (diagram, animated): a `delete foo` must remove the key from both replica pools. Deletes that fail are appended to a log; Singer ships the log to Kafka, a tailer consumes it, and a PinLater job replays the invalidation until every replica is clean.
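The first hop of that invalidation flow can be sketched as a best-effort delete with a retry log (interfaces hypothetical; the Singer → Kafka → PinLater replay is represented here only by the log entries):

```python
def invalidate_everywhere(key, pools, retry_log):
    """Best-effort delete of `key` from every replicated cache pool.

    `pools` maps pool name -> a delete callable that raises on failure.
    Any pool whose delete fails gets an entry appended to `retry_log`;
    in the talk's pipeline that log is shipped to Kafka and an async
    job retries the invalidation until every replica is clean.
    """
    failed = []
    for name, delete in pools.items():
        try:
            delete(key)                 # a memcached DELETE in production
        except Exception:
            failed.append(name)
    for name in failed:
        retry_log.append({"op": "delete", "key": key, "pool": name})
    return not failed                   # True iff all replicas were invalidated
```

Recording the failure durably, rather than retrying inline, is what keeps a single unhealthy pool from stalling the write path.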
Challenges
• Build the feedback loop from persistent layer to
caching layer
• Move to multiple geographic regions
Sharding: every object gets a 64-bit ID laid out as Shard Id | Type | Local Id.
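Packing and unpacking such an ID is plain bit arithmetic. The bit widths below are illustrative assumptions, not Pinterest's published layout:

```python
# Assumed, illustrative bit widths summing to 64.
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 38

def pack_id(shard_id, type_id, local_id):
    """Pack shard id, type, and local id into one 64-bit integer."""
    assert shard_id < (1 << SHARD_BITS)
    assert type_id < (1 << TYPE_BITS)
    assert local_id < (1 << LOCAL_BITS)
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id

def unpack_id(packed):
    """Recover (shard_id, type_id, local_id) from a packed ID."""
    local_id = packed & ((1 << LOCAL_BITS) - 1)
    type_id = (packed >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = packed >> (TYPE_BITS + LOCAL_BITS)
    return shard_id, type_id, local_id
```

Embedding the shard id in the object's ID means any client can route a lookup without consulting a directory service.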
(diagram): clients call DataServices 1 … n, which route requests to MySQL Master/Slave pairs 1 … m.
Read from Slave (diagram, two frames): if a master fails its health check over a certain period of time, DataServices redirect reads to that master's slave.
Failover (diagram): ShardConfig changes propagate through ZUM to the DataServices processes — realtime configuration management powered failover between masters and slaves.
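The read-from-slave rule above can be sketched as a small health-check tracker (a minimal sketch; the threshold and names are hypothetical):

```python
class ReadFailover:
    """Route reads to the slave after the master fails health checks
    for a sustained window, as described in the slides above.
    """

    def __init__(self, unhealthy_threshold=3):
        self.unhealthy_threshold = unhealthy_threshold
        self.consecutive_failures = 0

    def record_health_check(self, master_healthy):
        if master_healthy:
            self.consecutive_failures = 0   # one good check resets the streak
        else:
            self.consecutive_failures += 1

    def read_target(self):
        if self.consecutive_failures >= self.unhealthy_threshold:
            return "slave"
        return "master"
```

Requiring a streak of failures, rather than reacting to a single one, avoids flapping between master and slave on transient blips.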
Other Persistence Stores
UMetaStore
• Key value store based on HBase
Zen
• Graph store: nodes and edges
• Flexible schema
• Custom index
• Both HBase and MySQL
Future
• RocksDB
Challenges
• Automated failover with MySQL replica set
• HBase 1.x Upgrade
• Move to multiple geographic regions
Replication and failover are the
key ingredients for building highly
resilient storage and caching
systems
Async Processing
Use Cases
• Acknowledge success immediately, with non-time-sensitive actions taken later
• Schedule and execute large numbers of jobs
Benefits
• Faster response times
• More resilient to failures of dependent systems
Pyres Limitations
• No mechanism for success acknowledgement
• No visibility into the status of individual job types
• No support for scheduling job execution at a specific time in the future
• Rate limiting and retries are hard to manage
• Redis is the only supported storage backend
PinLater — Asynchronous Processing System (diagram): clients enqueue jobs to PinLater servers, which persist them in a master/slave storage backend; infra worker pools dequeue jobs, execute them, and ACK.
PinLater Job State Transition:
• Pending → Running (dequeued)
• Running → Done (succeeded)
• Running → Failed (failed, no more retries)
• Running → Pending (failed with retries left, or claim timeout)
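The transitions above form a small state machine, which can be written down directly (a sketch; event names are mine, the states are from the diagram):

```python
# PinLater job states and transitions, as in the diagram above.
TRANSITIONS = {
    ("pending", "dequeue"): "running",
    ("running", "succeed"): "done",
    ("running", "fail_retries_left"): "pending",
    ("running", "claim_timeout"): "pending",
    ("running", "fail_no_retries"): "failed",
}

def advance(state, event):
    """Apply one event to a job's state; raise on an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} + {event}")
```

The claim timeout edge is what makes the system self-healing: a worker that dies mid-job simply lets the claim expire, and the job returns to Pending for another worker — which is why jobs must be idempotent and commutative, as the next slide requires.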
PinLater Job Requirements
Idempotency
Commutativity
PinLater Dashboard (screenshot)
PinLater — Asynchronous Processing System (diagram, animated over several frames): clients enqueue jobs to PinLater servers 1 … n, backed by master/slave storage pairs 1 … m; worker pools 1 … k (alongside infra worker pools) dequeue and execute the jobs.
Challenges
• Multi-tenancy with fair failure isolation
• Fault-tolerant async job enqueuer
Use async processing as much as possible to deliver faster response times and make request handling more robust.
Learnings
• Avoid complete reliance on any single system, even if it is a highly reliable distributed system
• Replication and failover are the key ingredients for building highly resilient storage and caching systems
• Use async processing as much as possible to deliver faster response times and make request handling more robust
Failure Testing
• Be explicit about scope
• Failure modes
• Sandbox testing
• Manual testing
• Automated simulation
• Testing in production
• AWS is doing it for us all the time
• Simian Army

Building Highly-resilient Systems at Pinterest