Distributed Erlang Systems In Operation
Andy Gross <andy@basho.com>, @argv0
VP, Engineering
Basho Technologies
Erlang Factory SF 2010
Architectural Goals
• Decentralized (no masters).
• Distributed (asynchronous, nodes use only
  local data).
• Homogeneous (all nodes can do anything).
• Fault tolerant (emergent goal).
• Observable
Anti-Goals
• Global state:
 • pg2/hot data in mnesia
 • globally registered names
• Distributed transactions
• Reliance on physical time
Compromise Your Goals
• Decentralized (no masters).
• Distributed (nodes use only local data).
• Homogeneous (all nodes can do anything).
• No distributed transactions/global state.
• No reliance on physical time.
Systems Design

• Cluster Membership
• Load balancing/naming/resource allocation
• Liveness checking
• Soft Global State
Cluster Membership
• Option 1: Use a configuration file:
 • Requires out-of-band sync of the
    configuration file across machines.
 • Not “elastic” enough for some use-cases.
• Option 2: Contact a seed node to join and
  use a gossip protocol to propagate state.
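
A minimal sketch of the seed-node half of Option 2, assuming the joining node is given a list of candidate seeds. The module name `seed_join` and the return values are illustrative, not from the talk; only `net_adm:ping/1` is a real OTP call.

```erlang
-module(seed_join).
-export([join/1]).

%% Try each seed in turn and stop at the first one that answers.
%% net_adm:ping/1 returns pong when a distributed Erlang connection
%% to the node could be set up, pang otherwise.
join([]) ->
    {error, no_seed_reachable};
join([Seed | Rest]) ->
    case net_adm:ping(Seed) of
        pong -> {ok, Seed};
        pang -> join(Rest)
    end.
```

Once a seed answers, the new node would announce itself to that seed and let gossip carry the membership change to everyone else.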
Load Balancing and Resource Allocation
• Static assignment
• Round-robin/Random
• Static hashing: Nodes[hash(Item) mod
  length(Nodes)]
• Dynamo/Riak/Cassandra/Voldemort:
  Consistent Hashing
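
The difference between the last two bullets can be sketched as follows. This is an illustration under assumptions, not how Riak or any of the named systems implement their rings: the module name, the use of `erlang:phash2/2` as the hash, and the `VNodes` points-per-node parameter are all choices made for the example.

```erlang
-module(hash_demo).
-export([static_pick/2, ring/2, ring_pick/2]).

%% Static hashing: Nodes[hash(Item) mod length(Nodes)].
%% erlang:phash2(Term, Range) returns an integer in 0..Range-1.
%% Changing length(Nodes) remaps most items.
static_pick(Item, Nodes) ->
    lists:nth(erlang:phash2(Item, length(Nodes)) + 1, Nodes).

%% Consistent hashing: each node claims VNodes points on a 2^32 ring.
%% A node's points depend only on its own name, not on its peers.
ring(Nodes, VNodes) ->
    lists:sort([{erlang:phash2({N, I}, 4294967296), N}
                || N <- Nodes, I <- lists:seq(1, VNodes)]).

%% Walk clockwise from the item's hash: the first point at or after it
%% owns the item, wrapping around to the first point on the ring.
%% The ring must be non-empty.
ring_pick(Item, [{_, First} | _] = Ring) ->
    H = erlang:phash2(Item, 4294967296),
    case lists:dropwhile(fun({P, _}) -> P < H end, Ring) of
        [{_, Node} | _] -> Node;
        [] -> First
    end.
```

With the ring, removing a node only reassigns the items that node owned; items owned by the surviving nodes stay where they were.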
Liveness Checking
• nodes() and net_adm:ping() operations can
  be too low-level.
• Sometimes you’d like to divert traffic from
  a node at the application level while
  keeping distributed Erlang up.
• Use net_kernel:monitor_nodes() and an
  app-level mechanism for liveness.
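
One way to combine the two could look like the sketch below: a `gen_server` that subscribes to `nodeup`/`nodedown` messages and keeps a separate, application-level "serving" flag per node. The module name, its API, and the policy of marking newly connected nodes as serving are assumptions made for the example.

```erlang
-module(liveness).
-behaviour(gen_server).
-export([start_link/0, set_serving/2, serving_nodes/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Take a node in or out of service at the application level,
%% leaving its distributed Erlang connection untouched.
set_serving(Node, Bool) when is_boolean(Bool) ->
    gen_server:call(?MODULE, {set_serving, Node, Bool}).

%% Nodes that are connected AND marked as serving.
serving_nodes() ->
    gen_server:call(?MODULE, serving_nodes).

init([]) ->
    %% Ask net_kernel to send us {nodeup, N} / {nodedown, N} messages.
    ok = net_kernel:monitor_nodes(true),
    %% State: map of Node -> serving flag, seeded with current peers.
    {ok, maps:from_list([{N, true} || N <- [node() | nodes()]])}.

handle_call({set_serving, Node, Bool}, _From, State) ->
    case maps:is_key(Node, State) of
        true  -> {reply, ok, State#{Node := Bool}};
        false -> {reply, {error, not_connected}, State}
    end;
handle_call(serving_nodes, _From, State) ->
    {reply, [N || {N, true} <- maps:to_list(State)], State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({nodeup, Node}, State) ->
    {noreply, State#{Node => true}};
handle_info({nodedown, Node}, State) ->
    {noreply, maps:remove(Node, State)};
handle_info(_Other, State) ->
    {noreply, State}.
```

Request routing would then consult `liveness:serving_nodes()` rather than `nodes()`, so an operator can drain a node with `liveness:set_serving(Node, false)` while remote shells to it keep working.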
Soft State/Gossip Protocols
• An eventually-consistent alternative to
  global state.
• Nodes make changes, gossip to another
  node.
• Nodes receive changes, merge with local
  state, gossip to another node.
• Requires up-front thought about data
  structures, dealing with slightly-stale data.
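
The "merge with local state" step is where the data-structure thought goes. A minimal sketch, assuming the state is a map of `Key => {Version, Value}` with a per-key counter (no physical time involved): the module name and functions are hypothetical, and a production system would more likely use vector clocks to detect concurrent updates rather than silently picking a winner.

```erlang
-module(gossip_state).
-export([set/3, merge/2, pick_peer/1]).

%% Local change: bump the key's version counter and store the value.
set(Key, Value, State) ->
    {V, _Old} = maps:get(Key, State, {0, undefined}),
    State#{Key => {V + 1, Value}}.

%% Merge remote state into local state. Per key, keep the entry with
%% the higher version; Erlang term order on {Version, Value} breaks
%% ties the same way on every node. Because max/2 is commutative,
%% associative and idempotent, merge order and repeats do not matter.
merge(Local, Remote) ->
    maps:fold(fun(K, R, Acc) ->
                  maps:update_with(K, fun(L) -> max(L, R) end, R, Acc)
              end, Local, Remote).

%% Choose one random peer to gossip the merged state to.
pick_peer([]) ->
    none;
pick_peer(Peers) ->
    {ok, lists:nth(rand:uniform(length(Peers)), Peers)}.
```

A node receiving `Remote` would compute `merge(Local, Remote)` and, if the result differs from `Local`, forward it to `pick_peer(Peers)`.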
Running Your System

• Shipping code
• Upgrading code
• Debugging your own systems
• Living with other people’s systems
Shipping Code

• Don’t rely on working Erlang on end-user
  machines (many Linux distros are broken
  or out of date).
• Ship code with an embedded runtime and
  libraries.
• Put version/build info in code.
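
As one possible shape for the last bullet, the sketch below bakes a version string into a module through a compile-time macro. The module name, the `VERSION` macro, and the fallback value are assumptions; the build would be expected to define the macro when compiling.

```erlang
-module(build_info).
-export([version/0, info/0]).

%% VERSION is meant to be defined by the build at compile time.
%% The fallback marks binaries that were built without it.
-ifndef(VERSION).
-define(VERSION, "0.0.0-dev").
-endif.

version() ->
    ?VERSION.

%% Everything a support ticket usually needs in one call.
info() ->
    #{version => ?VERSION,
      otp_release => erlang:system_info(otp_release),
      node => node()}.
```

Calling `build_info:info()` from a remote shell then answers "what exactly is running here?" without consulting the packaging system.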
Upgrading Code
• Hot code loading for small, emergency
  fixes.
• For new releases, reboot the node.
• Why not .appups?
 • Systems I’ve worked on have changed/
    evolved too fast.
 • A reboot is a good test of resiliency.
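
For the emergency-fix case, a hot reload can be as small as the sketch below. The module and function names are illustrative; `code:soft_purge/1` and `code:load_file/1` are standard OTP calls. Using `soft_purge` is a deliberately cautious choice: it refuses to purge while any process still runs the old version, instead of killing that process.

```erlang
-module(hot_fix).
-export([reload/1]).

%% Load the newest .beam for Mod from the code path.
%% Returns {module, Mod} on success or {error, Reason}.
reload(Mod) ->
    case code:soft_purge(Mod) of
        true ->
            %% No process is stuck in old code; safe to load.
            code:load_file(Mod);
        false ->
            {error, old_code_in_use}
    end.
```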
Debugging Running Systems
• Remote Erlang shells are awesome, except
  when distributed Erlang dies (it happens).
• run_erl (or even screen(1)) gives you a
  backdoor for when -remsh fails.
• rebar (http://hg.basho.com/rebar) makes
  this easy.
• What if you don’t have access to the box?
OPS - Other People’s Systems
• Your Erlang, Enterprise firewalls.
• Erlang shell is powerful, but scary.
• Provide a debugging module.
• Get data out via HTTP/SMTP/SNMP
• Use disk_log/report_browser.
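
A debugging module in this spirit might expose a single read-only snapshot that an operator can call without knowing Erlang internals. The sketch below is one guess at what such a module could collect; the module name and the chosen fields are assumptions, while the underlying calls are standard BIFs.

```erlang
-module(debug_snapshot).
-export([snapshot/0, top_queues/1]).

%% A read-only summary that is safe to hand to an operator.
snapshot() ->
    #{node => node(),
      connected => nodes(),
      process_count => erlang:system_info(process_count),
      memory_total => erlang:memory(total),
      longest_queues => top_queues(5)}.

%% The N processes with the longest message queues, longest first.
%% process_info/2 returns undefined for processes that have exited
%% in the meantime; the pattern in the generator skips those.
top_queues(N) ->
    Lens = [{Len, Pid}
            || Pid <- processes(),
               {message_queue_len, Len} <-
                   [process_info(Pid, message_queue_len)]],
    lists:sublist(lists:reverse(lists:sort(Lens)), N).
```

The resulting map is plain data, so it can be logged or served over whichever of HTTP/SMTP/SNMP gets through the firewall.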
Questions?
“You know you have [a distributed system]
when the crash of a computer you’ve never
heard of stops you from getting any work
done.”
-Leslie Lamport
Resources
• unsplit: http://github.com/uwiger/unsplit
• gen_leader: http://github.com/KirinDave/gen_leader_revival
• Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
• Hans Svensson: Distributed Erlang Application Pitfalls and
  Recipes: http://www.erlang.org/workshop/2007/proceedings/06svenss.ppt
• Consistent Hashing and Random Trees: Distributed Caching
  Protocols for Relieving Hot Spots on the World Wide Web:
  http://bit.ly/LewinConsistentHashing
