Tupperware:
Containerized Deployment at FB
Aravind Narayanan
aravindn@fb.com
DockerCon 2014
Scale makes everything harder
• Running single instance: easy
• Running at scale in production: messy and complicated
Provision machines
Distribute binaries
Daemonize process
Monitoring
Failover
Geo-distribution
Machine decoms
[Chart: time spent on application logic vs. time spent getting the app to run in prod]
Tupperware to the rescue
“This is my binary. Run it on X machines!”
• Engineer is hands-off
• Doesn’t need to worry about machines in prod
• Handles failover, when machines go bad
• Efficient use of infrastructure
• 300,000+ processes, spread over 15,000+ services
Agenda
1. Architecture
2. Sandboxes
3. Ecosystem
4. Lessons learnt
Facebook Datacenters
[Map: Facebook datacenter locations: PRN, SNC, FRC, ASH, LLA]
Terminology
• A DC has one or more clusters
• A cluster has multiple racks
• A rack has multiple machines
• A TW job is equivalent to a service
• A job has multiple tasks, each an instance of the service
Architecture
[Diagram: twdeploy sends config.tw to the Scheduler; the Scheduler starts tasks (QuoteService, Server, DB) on Host1–Host3; binaries are fetched from a BitTorrent-based binary package store]
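The deck doesn't show what goes inside config.tw. As a purely hypothetical sketch (every field name here is invented), the declarative "this is my binary, run it on X machines" spec might look like:

```python
# Hypothetical sketch of a Tupperware-style job config (all names invented).
# The engineer declares WHAT to run and how many replicas; the scheduler
# decides WHERE.

quote_service_job = {
    "name": "QuoteService",
    "package": "quote_service-1.0",   # fetched via the BitTorrent package store
    "command": "./quote_service --port=12345",
    "replicas": 3,                    # "Run it on X machines!"
    "limits": {"cpu": 2, "ram_mb": 4096, "disk_mb": 10240},
}

def validate(job):
    """Minimal validation: required fields present, positive replica count."""
    required = {"name", "package", "command", "replicas"}
    missing = required - job.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if job["replicas"] < 1:
        raise ValueError("replicas must be >= 1")
    return True
```

The point of the shape is that nothing in the spec names a machine: placement stays entirely in the scheduler's hands.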
[Chart: machine “failures” per hour]
Failover
[Diagram: Host1’s QuoteService stops heartbeating; the Scheduler starts a replacement QuoteService task on Host3, pulling the binary from the BitTorrent-based binary package store]
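The failover flow in the diagram can be sketched as a toy loop (not FB's code; the timeout value and class shape are assumptions): the scheduler tracks each task's last heartbeat and reschedules any task that has gone quiet for too long.

```python
# Toy sketch of heartbeat-driven failover. The scheduler marks a task dead
# after it misses heartbeats for longer than a timeout, then starts a
# replacement on another host. All names and values are illustrative.

HEARTBEAT_TIMEOUT = 30.0  # seconds (assumed value)

class Scheduler:
    def __init__(self, hosts):
        self.hosts = hosts          # available hosts (toy model)
        self.tasks = {}             # task_id -> (host, last_heartbeat_time)

    def start_task(self, task_id, host, now):
        self.tasks[task_id] = (host, now)

    def heartbeat(self, task_id, now):
        host, _ = self.tasks[task_id]
        self.tasks[task_id] = (host, now)

    def check_failures(self, now):
        """Restart every task whose heartbeat is older than the timeout."""
        replaced = []
        for task_id, (host, last) in list(self.tasks.items()):
            if now - last > HEARTBEAT_TIMEOUT:
                new_host = next(h for h in self.hosts if h != host)
                self.start_task(task_id, new_host, now)
                replaced.append((task_id, host, new_host))
        return replaced
```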
Painless Hardware maintenance
• Notify scheduler of impending operations
• Scheduler can preemptively move tasks
• Graceful migration for stateless services
• Stateful services may endure maintenance
Expressive allocation policies
[Diagrams: spread MyBigJob’s tasks (QuoteService, QuoteAggregator) across production hosts; avoid placing multiple NetworkHogJob tasks under the same top-of-rack switch; control how jobs M and N are co-located]
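One of the policies above, spreading a job's tasks across failure domains, can be sketched as follows (a toy model, not the real allocator; rack names and the function shape are invented):

```python
# Toy sketch of a "spread" allocation policy: place each replica in a
# distinct rack so one top-of-rack switch failure can't take out the whole
# job. hosts_by_rack maps rack name -> list of free hosts (invented layout).

def place_spread_by_rack(replicas, hosts_by_rack):
    """Return one host per replica, no two in the same rack."""
    placement = []
    used_racks = set()
    for rack, hosts in hosts_by_rack.items():
        if rack in used_racks or not hosts:
            continue
        placement.append(hosts[0])
        used_racks.add(rack)
        if len(placement) == replicas:
            return placement
    raise RuntimeError("not enough distinct racks to satisfy the policy")
```

A real scheduler combines many such constraints (racks, switches, co-location, anti-affinity); this shows only the single-constraint case.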
TW Agent
[Diagram: on each production host, the TW Agent process contains the Package Manager, Resource Manager, and Task Manager API and heartbeats with the Scheduler; each task (A, B, C) runs under its own Agent Helper]
Agent Helper process
• Heartbeats with the Agent
• Prevents zombies
[Diagram: the Agent Helper sits between the TW Agent process and Task A]
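The zombie-prevention half of the helper's job comes down to always reaping the task process. A minimal sketch of that idea (the heartbeating back to the agent is elided, and the function is invented for illustration):

```python
import os

# Toy sketch of the Agent-Helper idea: a small parent process forks the
# task, then blocks in waitpid() so the child is always reaped and can
# never linger as a zombie. POSIX-only (uses fork/exec).

def run_under_helper(argv):
    """Fork, exec argv in the child, reap it in the parent; return exit code."""
    pid = os.fork()
    if pid == 0:                      # child: become the task
        os.execvp(argv[0], argv)
    _, status = os.waitpid(pid, 0)    # parent: reap -> no zombie left behind
    return os.waitstatus_to_exitcode(status)
```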
Logging
• Compress on the fly
• Persistent logs
• Instantaneous rotation
[Diagram: Task A’s stdout and stderr flow through the Agent Helper to the Log Catcher, which writes compressed log files]
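A toy sketch of the two logging properties above (class and file-naming scheme invented): compressing as lines arrive, and making rotation just a close-and-reopen rather than a copy.

```python
import gzip

# Toy sketch of a compress-on-the-fly log catcher. Output is written
# straight through a gzip stream, and "instantaneous rotation" is simply
# closing one compressed file and opening the next.

class LogCatcher:
    def __init__(self, path):
        self.path = path
        self.gen = 0
        self.out = gzip.open(f"{path}.{self.gen}.gz", "wt")

    def write(self, line):
        self.out.write(line + "\n")

    def rotate(self):
        """Instantaneous rotation: no copying, just start a new file."""
        self.out.close()
        self.gen += 1
        self.out = gzip.open(f"{self.path}.{self.gen}.gz", "wt")

    def close(self):
        self.out.close()
```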
Sandboxing
Initially, used chroots to contain processes
• No isolation
• Not secure
Linux Containers (LXC)
• As tech matured, we switched
• Separate process and file
namespaces, set up by Helper
• Mount required resources
directly into container
• Secure & isolated
Service permissions
• Every container runs sshd
• SSH directly into the container
• Regulate access
[Diagram: SSH connects directly into Task A’s container via the Agent Helper, not to the production host itself]
Configuring the container
Resource limits
• CPU, RAM & disk limits
• Implemented with cgroups
• Agent handles memory limits
with cgroup notification API
• Adaptive limits
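In the cgroup v1 layout of that era, limits like these are plain files under a cgroup directory, and setting a limit means writing a value into the file. A hedged sketch (paths and values are illustrative; actually applying them needs root and a mounted cgroupfs):

```python
# Toy sketch of enforcing CPU/RAM limits via cgroup v1 control files.
# cpu.shares and memory.limit_in_bytes are real v1 file names; the helper
# functions and the cgroup directory layout used here are illustrative.

def cgroup_writes(cgroup_dir, cpu_shares, ram_bytes):
    """Return the (path, value) pairs that would set the limits."""
    return [
        (f"{cgroup_dir}/cpu/cpu.shares", str(cpu_shares)),
        (f"{cgroup_dir}/memory/memory.limit_in_bytes", str(ram_bytes)),
    ]

def apply_limits(cgroup_dir, cpu_shares, ram_bytes):
    for path, value in cgroup_writes(cgroup_dir, cpu_shares, ram_bytes):
        with open(path, "w") as f:   # requires root and a mounted cgroupfs
            f.write(value)
```

Splitting the "what to write" from the "write it" keeps the limit computation testable without privileges; the cgroup notification API mentioned above would additionally let the agent react before a hard kill.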
Resource Limits in action
[Chart: watchdog-service memory usage (tw.mem.rss_bytes)]
Migrate from Chroots to Containers
• No-op for most services
• But new namespaces posed problems for some
• Major hurdle was social, not technical
Service Discovery
[Diagram: the Scheduler updates the Service Registry as tasks start and die. Initially Host1:12345 and Host2:12345 are ALIVE for QuoteService; after Host1 fails, Host1:12345 is marked DEAD, the Scheduler starts a replacement on Host3, and the registry shows Host2:12345 and Host3:12345 ALIVE]
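The registry flow in the diagram can be sketched as a tiny in-memory model (class and method names invented): the scheduler records each endpoint's state, and clients only ever ask for the live set.

```python
# Toy sketch of a Service Registry: the scheduler writes endpoint states,
# clients read only the ALIVE endpoints for a service.

class ServiceRegistry:
    def __init__(self):
        self.entries = {}  # service -> {endpoint: state}

    def set_state(self, service, endpoint, state):
        self.entries.setdefault(service, {})[endpoint] = state

    def alive(self, service):
        """The endpoints a client should connect to right now."""
        return sorted(ep for ep, st in self.entries.get(service, {}).items()
                      if st == "ALIVE")
```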
Monitoring & Alerting
Alternatives to Tupperware
• Why not use Docker / CoreOS?
• They didn’t exist
• TW integrates with other FB systems
• Why not use VMs?
• Performance penalty
• Hypervisor makes debugging harder
Lessons learnt
Releases are scary!
• Release often
• Dry runs
• Canaries are your friends
• Manage dependencies
Sane defaults
• Users shouldn’t have to read entire manual
• Choose what makes sense for most services
What went wrong?
• Hard to understand why TW did something
• It’s not about “what went wrong”, but “what should I do next?”
Tupperware
• Automated deployment
• Less work for engineers
• Containers for security
and isolation
• Increased efficiency
Questions?