Erlang:
Because S**t happens
Mahesh Paolini-Subramanya (@dieswaytoofast)
V.P. Ubiquiti Networks
AGILITY
My Vacation
(Actually, the day before)
A small failure…
Erlang - Because s**t Happens by Mahesh Paolini-Subramanya
The Horror! The Horror!
Why are my calls failing?
You better call me back!
I’m still p***ed off!
And your stupid Apps
don’t work!
The Horror! The Horror!
Surely you Tested?
1000 year floods
Fault Tolerance
• Concurrency
• Fault detection
• Fault Identification
• Error Encapsulation
• Code upgrade
• Stable Storage
The Big Six
From http://www.erlang.org/download/armstrong_thesis_2003.pdf
erlang…
Concurrency Oriented
Concurrency Hell
My Blue Heaven
Deep Problems
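The speaker notes credit Erlang's concurrency to processes, messages and immutability. Here is a minimal sketch of that model. The module name `ping_pong` is hypothetical. A process is spawned and is reached only by message. The caller bounds its wait with a timeout instead of sharing state.

```erlang
%% ping_pong.erl: hypothetical module illustrating processes + messages.
-module(ping_pong).
-export([start/0, ask/2]).

%% Spawn a process that owns no shared state and only reacts to messages.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {ping, From} ->
            From ! {pong, self()},   % reply by message, never by shared memory
            loop();
        stop ->
            ok                       % returning ends the process
    end.

%% Send a ping and wait for the matching pong, giving up after Timeout ms.
ask(Pid, Timeout) ->
    Pid ! {ping, self()},
    receive
        {pong, Pid} -> pong
    after Timeout ->
        timeout
    end.
```

The two processes share nothing. The only coupling between them is the shape of the `{ping, From}` and `{pong, Pid}` messages.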
Fault Detection
Stack Traces?
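Fault detection in Erlang does not depend on the failing code reporting its own error. Here is a minimal sketch, assuming a hypothetical `detect` module. The caller monitors a worker with `spawn_monitor/1` and learns about a crash as an ordinary `'DOWN'` message. For an uncaught error, the exit reason is `{Reason, Stacktrace}`, so the stack trace arrives together with the notification.

```erlang
%% detect.erl: hypothetical module; the caller survives the worker's crash.
-module(detect).
-export([run/1]).

%% Run Fun in its own monitored process and report how it ended.
run(Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            ok;                          % finished normally
        {'DOWN', Ref, process, Pid, Reason} ->
            {crashed, Reason}            % errors arrive as {Reason, Stacktrace}
    end.
```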
Immutable Variables
• X = 1.
• X = 2.
• X = X + 1.
Huh?
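The slide's point is that `=` is a pattern match, not an assignment. In the shell, `X = 2.` after `X = 1.` fails with `** exception error: no match of right hand side value 2`, and `X = X + 1.` fails the same way. The small sketch below uses a hypothetical module, `single_assign`. It reproduces both failures inside a program and shows the usual fix, which is to bind a new name.

```erlang
%% single_assign.erl: hypothetical module mirroring the slide.
-module(single_assign).
-export([demo/0]).

demo() ->
    X = 1,
    R1 = try_match(X, 2),       % like "X = 2."      -> badmatch
    R2 = try_match(X, X + 1),   % like "X = X + 1."  -> badmatch
    X1 = X + 1,                 % idiomatic fix: bind a new variable
    {R1, R2, X1}.

%% Attempt the match Bound = Value at run time and report the outcome.
try_match(Bound, Value) ->
    try
        Bound = Value,
        matched
    catch
        error:{badmatch, V} -> {badmatch, V}
    end.
```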
Fault Identification
Let It Crash
BEAM!
• Faster to create
JVM is not necessarily your friend!
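"Let it crash" means the worker does not defend against bad input. A supervisor restarts it instead. Here is a minimal OTP sketch, assuming the hypothetical names `crashy_sup` and `crashy_worker`. The worker is a plain linked process that crashes on division by zero. A `one_for_one` supervisor brings it back. The limit of 5 restarts in 10 seconds is an arbitrary choice for this sketch.

```erlang
%% crashy_sup.erl: hypothetical supervisor + worker showing "let it crash".
-module(crashy_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% one_for_one: only the crashed child is restarted.
%% intensity/period: give up if more than 5 restarts happen within 10 s.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Worker = #{id => worker,
               start => {?MODULE, start_worker, []},
               restart => permanent},
    {ok, {SupFlags, [Worker]}}.

%% Called by the supervisor; spawn_link ties the worker to it.
start_worker() ->
    Pid = spawn_link(?MODULE, worker_loop, []),
    true = register(crashy_worker, Pid),
    {ok, Pid}.

%% No defensive checks: a zero divisor raises badarith and kills the process.
worker_loop() ->
    receive
        {divide, From, A, B} ->
            From ! {result, A div B},
            worker_loop()
    end.
```

Callers keep using the registered name. After a crash, they simply talk to the replacement process.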
Code Upgrade
• Live!
Hot Swapping
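Hot code swapping relies on one language rule. A fully qualified call (`Module:Function(...)`) always goes to the newest loaded version of a module. A local call stays in the version the process is already running. Here is a sketch of the usual server-loop pattern. The module `hot_counter` is hypothetical.

```erlang
%% hot_counter.erl: hypothetical module showing the hot-swap loop pattern.
-module(hot_counter).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

%% loop/1 must be exported: ?MODULE:loop(...) is an external call, and
%% external calls always enter the newest loaded version of the module.
loop(Count) ->
    receive
        {incr, From} ->
            From ! {count, Count + 1},
            ?MODULE:loop(Count + 1);
        {get, From} ->
            From ! {count, Count},
            ?MODULE:loop(Count)
    end.
```

You can load a new version with, for example, `l(hot_counter).` in the shell. The running process then picks up the new code on its next message, with its state intact. The runtime keeps at most two versions of a module. Processes still executing the old one are killed when that version is purged.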
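For Stable Storage, the speaker notes point at Mnesia, ETS and gen_servers. ETS tables live in memory and die with their owner, so the smallest disk-backed illustration in OTP is `dets`. The sketch below assumes a hypothetical `durable_kv` module and table name. It writes a key and closes the table, which flushes it to disk. It then reads the key back after reopening.

```erlang
%% durable_kv.erl: hypothetical module; persists key/value pairs with dets.
-module(durable_kv).
-export([store/3, fetch/2]).

-define(TAB, durable_kv_tab).   % hypothetical table name

open(File) ->
    {ok, Tab} = dets:open_file(?TAB, [{file, File}, {type, set}]),
    Tab.

%% Write one pair; closing the table flushes it to the file on disk.
store(File, Key, Value) ->
    Tab = open(File),
    ok = dets:insert(Tab, {Key, Value}),
    dets:close(Tab).

%% Reopen the file and look the key up again.
fetch(File, Key) ->
    Tab = open(File),
    Result = case dets:lookup(Tab, Key) of
                 [{Key, Value}] -> {ok, Value};
                 []             -> not_found
             end,
    ok = dets:close(Tab),
    Result.
```

Mnesia adds transactions and replication across nodes. This sketch covers only single-node persistence.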
The Intangibles
Code Size
• Faster to create
• Easier to reason about
• Fewer bugs
• Speedy refactoring
4x – 10x less code
The Shell is our friend
Live Debugging
Predictability
Performance
Fault Tolerance - Systems
Romney 2012
• Concurrency
• Error encapsulation
• Fault detection
• Fault identification
• Code upgrade
• Stable Storage
The Big Six - Systems
LOOSE COUPLING
Loose Coupling?
• Breeds Trust
• Devote more brainpower to specific areas
• No. of bugs/line is constant
Performance
• 60 – 90% of all SW projects fail
• 10 – 25% of all SW projects get abandoned
Fault Tolerance
MONITORING
Monitoring?
• Dashboards
• Out of band systems
• Polyglot safety
Supervision
POLYGLOT
PERSISTENCE
EVERYWHERE!!!
No battle plan survives
contact with the enemy
Fault Tolerance
• Not just about Systems
• People
• Vendors
• Fraud
The Business
Beware the Black Swan
Is It Safe?
erlang…
mahesh@dieswaytoofast.com
@dieswaytoofast
Questions
Coda
Active Queue Management
Queues
• Can you recover quickly?
• Buffer-bloat doesn’t matter, right?
• Once up, can you deal with the backlog?
• Back-pressure isn’t an issue, right?
NOPE
Programmable
Behavioral
Self Managed
Something’s gotta give
Tail Drop
God
(category – TCP/IP)
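Tail drop, as the speaker notes describe it, simply discards arrivals once the queue is full. Here is a minimal sketch using OTP's `queue` module. The `tail_drop` module and its `{Capacity, Queue}` state are hypothetical.

```erlang
%% tail_drop.erl: hypothetical bounded FIFO with tail-drop semantics.
-module(tail_drop).
-export([new/1, enqueue/2, dequeue/1]).

new(Capacity) when is_integer(Capacity), Capacity > 0 ->
    {Capacity, queue:new()}.

%% Full queue: the arriving item is dropped and the state is unchanged.
enqueue(Item, {Cap, Q} = State) ->
    case queue:len(Q) >= Cap of
        true  -> {dropped, State};
        false -> {ok, {Cap, queue:in(Item, Q)}}
    end.

%% Oldest item leaves first.
dequeue({Cap, Q} = State) ->
    case queue:out(Q) of
        {{value, Item}, Q1} -> {ok, Item, {Cap, Q1}};
        {empty, _}          -> {empty, State}
    end.
```

During an overload, everything that arrives is rejected at once. That is how many senders end up backing off and retrying in lock-step, the global synchronization mentioned in the notes.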
RED
Newark Airport
FRED
RED-PD
WRED
RED – Many, many more
• SRED
• RRED
• ARED (and Blue!)
• CHOKe
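All the RED variants above start from the same idea. RED drops early and at random, with a probability that grows with the *average* queue length. The sketch below shows the basic RED formula, assuming a hypothetical `red_aqm` module. It uses only the linear probability between the two thresholds and leaves out RED's count-based spacing of drops.

```erlang
%% red_aqm.erl: hypothetical module sketching basic RED drop decisions.
-module(red_aqm).
-export([update_avg/3, drop_probability/4, should_drop/5]).

%% Exponentially weighted moving average of the instantaneous queue
%% length Q, with a small weight Wq (e.g. 0.002) so bursts are smoothed.
update_avg(Avg, Q, Wq) ->
    (1 - Wq) * Avg + Wq * Q.

%% Below MinTh: never drop. At or above MaxTh: always drop.
%% In between: probability rises linearly from 0 to MaxP.
drop_probability(Avg, MinTh, _MaxTh, _MaxP) when Avg < MinTh ->
    0.0;
drop_probability(Avg, _MinTh, MaxTh, _MaxP) when Avg >= MaxTh ->
    1.0;
drop_probability(Avg, MinTh, MaxTh, MaxP) ->
    MaxP * (Avg - MinTh) / (MaxTh - MinTh).

%% RandFun returns a float in [0.0, 1.0), e.g. fun rand:uniform/0.
should_drop(Avg, MinTh, MaxTh, MaxP, RandFun) ->
    RandFun() < drop_probability(Avg, MinTh, MaxTh, MaxP).
```

Roughly, the variants differ in what they plug in. WRED uses different thresholds per class. FRED and RED-PD keep per-flow state. ARED adapts `MaxP` to the observed queue.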
Special Mention
• RED in a different Light
SERIOUSLY!
• CoDel and fq_codel
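CoDel (RFC 8289) drops based on how long packets have waited (their sojourn time), not on queue length. Once the delay has stayed above a target (5 ms by default) for a full interval (100 ms), it drops a packet. It then schedules the next drop `interval / sqrt(count)` later, so drops come faster while the queue stays bad. The sketch below is simplified and assumes a hypothetical `codel_sketch` module with times in milliseconds. It omits the RFC's count hysteresis and its minimum-queue-size check.

```erlang
%% codel_sketch.erl: hypothetical, simplified CoDel dequeue decision.
-module(codel_sketch).
-export([new/0, on_dequeue/3, control_law/2]).

-define(TARGET, 5).      % ms of standing queue delay we tolerate
-define(INTERVAL, 100).  % ms, on the order of a worst-case RTT

new() ->
    #{first_above => undefined, dropping => false, drop_next => 0, count => 0}.

%% Each further drop comes INTERVAL / sqrt(Count) after the previous one.
control_law(T, Count) ->
    T + ?INTERVAL / math:sqrt(Count).

%% Sojourn: how long the departing packet waited (ms). Now: current time (ms).
on_dequeue(Sojourn, _Now, S) when Sojourn < ?TARGET ->
    %% Delay is acceptable: clear the timer and leave the dropping state.
    {keep, S#{first_above := undefined, dropping := false}};
on_dequeue(_Sojourn, Now, #{first_above := undefined} = S) ->
    %% First packet above target: start the INTERVAL clock.
    {keep, S#{first_above := Now + ?INTERVAL}};
on_dequeue(_Sojourn, Now, #{first_above := FA, dropping := false} = S)
  when Now >= FA ->
    %% Above target for a whole interval: drop and enter the dropping state.
    {drop, S#{dropping := true, count := 1, drop_next := control_law(Now, 1)}};
on_dequeue(_Sojourn, Now, #{dropping := true, drop_next := DN, count := C} = S)
  when Now >= DN ->
    %% Still above target and the next scheduled drop is due.
    {drop, S#{count := C + 1, drop_next := control_law(DN, C + 1)}};
on_dequeue(_Sojourn, _Now, S) ->
    {keep, S}.
```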

Editor's Notes

  • #2: An overall approach to Preparedness
  • #3: This is a story about unexpectedness. The only constant is change
  • #4: Our story starts on a happy Saturday in February
  • #5: It’s still Friday
  • #6: Just part of one cluster failed, but a threshold had been passed
  • #7: No worries, we’ll just bounce that one cluster, it’ll all be good
  • #8: Total System Meltdown
  • #9: All the calls keep retrying, causing memory utilization to go through the roof
  • #10: Voicemail conversion was going on independent of everything else, causing CPU utilization to spike
  • #11: Eventually, the cache timed out, and tried to reload stuff from the disk.
  • #12: And then everyone tries the Apps, and the Twitters and the Facebooks and the everythings.
  • #13: Total System Meltdown
  • #14: What about testing? Didn’t you check loads? Specs? Capabilities?
  • #15: There is only so much planning you can do. At some point, the 1000 year flood hits
  • #16: The point being, Shit will happen. The question is, when Shit happens, can you clean up?
  • #17: There is a formal definition of Fault Tolerance
  • #18: The Six Essential Characteristics of a Fault Tolerant System
  • #19: The Six Essential Characteristics of a Fault Tolerant System
  • #20: The Six Essential Characteristics of a Fault Tolerant System
  • #21: The Six Essential Characteristics of a Fault Tolerant System
  • #22: The Six Essential Characteristics of a Fault Tolerant System
  • #23: The Six Essential Characteristics of a Fault Tolerant System
  • #24: The Six Essential Characteristics of a Fault Tolerant System
  • #25: The Six Essential Characteristics of a Fault Tolerant System
  • #26: ‘Distributed’ problems mean you spend a huge chunk of your time dealing with the administrivia of distribution. With Erlang you get that for free! Processes, Messages, Immutability, “Writing Concurrent Programs in Java”
  • #27: Ok, not really true. You still have to deal with ‘deep problems’ (hard-core parallelization issues, etc.). But you’d have to deal with that anyhow!
  • #28: The Six Essential Characteristics of a Fault Tolerant System
  • #29: The Six Essential Characteristics of a Fault Tolerant System
  • #30: The Six Essential Characteristics of a Fault Tolerant System
  • #31: Testing is infinitely easier. Trivial to simulate (it’s all messages!) Thank you, immutability!
  • #34: Garbage Collection, Referential Integrity, Testing!!!
  • #35: The Six Essential Characteristics of a Fault Tolerant System
  • #36: The Six Essential Characteristics of a Fault Tolerant System
  • #37: Let it Crash
  • #38: BEAM: insanely reliable. It will last till the heat death of the universe if you leave it alone
  • #39: JVM is not necessarily your friend. Running on the JVM is not necessarily good: do you trust all the other Java code? I don’t. Trust _me_, I’ve been there
  • #40: The Six Essential Characteristics of a Fault Tolerant System
  • #41: Let it Crash
  • #43: Mnesia, ETS, gen_servers, etc.
  • #44: Testing is infinitely easier. Trivial to simulate (it’s all messages!) Thank you, immutability!
  • #45: Testing is infinitely easier. Trivial to simulate (it’s all messages!) Thank you, immutability!
  • #46: The bigger they are, the harder they fall
  • #51: Just connect to a remote node and trace to figure out what is going on
  • #52: Why wait? Just log on to a node
  • #53: Soft real-time. Brief discussion of instrumentation and ‘reductions’
  • #54: I/O (and message passing; basically the same thing) is _wicked_ fast. Not just IPC, but network, web (Cowboy), websockets, etc.
  • #55: The Buddha nature of Erlang
  • #56: This is pretty much what we’re talking about, right? Systems: Development/Production and Internal/External
  • #57: It’s not just us
  • #58: It’s not just us
  • #59: Let’s talk about systems
  • #60: The Six Essential Characteristics of a Fault Tolerant System
  • #61: Loose Coupling, of course, gives us all these benefits
  • #62: Loose Coupling, of course, gives us all these benefits. Loosely coupled systems can operate concurrently. Well, D-UH. Errors can be contained/constrained
  • #63: Keep components/modules/systems ‘loosely coupled’. Connect via specs/APIs/buses. Do this by default, even when you don’t need to!
  • #65: Builds trust. Trust in the stupidity of people, trust that things will fail, trust that you will be affected
  • #67: The amount of brainpower we have is limited. Reduce complexity by being able to focus on specific / limited areas
  • #68: There are many studies (some not so controversial) that show the number of bugs/line is constant. Focusing on smaller areas gives you fewer things to tackle
  • #69: Isn’t Performance an issue w/ Loose Coupling?
  • #70: Remember the bit about failure? Well, why optimize if you’re going to fail anyhow? Yeah, yeah, you might fail because you don’t perform, but that is rarely the problem
  • #71: Yes, that Minecraft plugin you built might get a million signups. It won’t. Seriously, it doesn’t register statistically
  • #74: Dashboards. Otherwise, how do you know what’s going on?
  • #75: Out of band access. Don’t rely on the system to always tell you what’s happening
  • #76: Corresponds to how we think, and helps deal with edge-cases much *much* better!
  • #77: Be Polyglot. Everything fails, even Erlang. (noooo)
  • #79: Why Polyglot? Because you want to limit your failure modes (increasing diversity can actually reduce systemic risk)
  • #80: Macro Effects Matter! Systems span divisions: Finance, Customer Support, Sales, HR, etc.
  • #81: Helmuth von Moltke
  • #83: People fall ill
  • #84: Vendors Fail (Amazon)
  • #85: Fraud: You wonder why your CFO is in Brazil…
  • #86: Tail Risk (Things that can never happen). This deserves its own section (financial crisis)
  • #87: Ask yourself this. Over and over again…
  • #88: The Six Essential Characteristics of a Fault Tolerant System
  • #89: Yeah, yeah. Understandable lies. But the bottlenecks are pretty far down the road (and much further than you would have gotten before!)
  • #90: Tail Risk. This deserves its own section (financial crisis)
  • #91: How fast are you?How quickly can you come back up? Can you store enough state to survive?
  • #92: Is BufferBloat a problem?
  • #93: Once you are up, can you draw down the queue fast enough?Or at all, for that matter?
  • #94: Is backpressure going to be a problem?
  • #95: If the answer is “Yes”, then the talk is over, because it just works.
  • #96: What if the answer is “No”? (Now we have a story)
  • #97: Programmable. If you’re lucky, your infrastructure will automagically support ramping
  • #98: Fake it. People respond subconsciously to these, and actually wait. You can even get away with dropping the request (this assumes that you can recover in time)
  • #99: This happens inside the airport too! Passengers self-select the best gates to enter (intelligent routing)
  • #100: The question is, what do you do when you can’t come up in time? 3 gallon bucket, 5 gallons of water…
  • #101: Just start dropping when the queue fills up. This is pretty bad: global synchronization becomes a problem. Planes don’t take off till they get clearance from the other end
  • #102: Slow Start, AQM, RED, CoDel, … Why don’t we learn from networks? They certainly don’t learn from us, so why do we ignore them?
  • #103: RED / SRED (RED in a different light – toilet bowl)
  • #104: RED / SRED (RED in a different light – toilet bowl)
  • #105: The 3rd priority airport always gets the shaft
  • #106: F(low) RED. RED on a per-flow basis (the entire route map). Kinda the default. (Discard second request)
  • #107: RED – P(referential) D(rop). Does RED only for high-BW flows (high traffic routes). (Throttle spammy clients. Or features.)
  • #108: W(eighted) RED. Different discard probabilities for different flows (transatlantic routes). (Major clients vs small ones)
  • #109: S(tabilized) RED – estimate flows and probabilities. R(obust) RED – protect against low-rate DoS (with filters) (even unintentional DoS). A(daptive) RED – modify prob based on queue. CHO(ose and) K(eep) or CHO(ose and) K(ill) – open for < min; drop tail for > max; else, compare packet to random packet. If same flow, drop it w/ prob.
  • #110: Fixed two bugs in RED. Made it feedback based (self-tuning). Toilet diagram caused problems
  • #111: Van Jacobson strikes back. Queue length is a misleading metric (bursts can fill up the queue), so use queue delay (sojourn time) instead. Drops follow a schedule that tightens while the delay stays above target
  • #112: Yeah, yeah. Understandable lies. But the bottlenecks are pretty far down the road (and much further than you would have gotten before!)