Aysylu Greenberg
June 14, 2016
Distributed Systems in Practice, in Theory
How I got into reading papers as a practitioner in industry
Computer Science
Research
In
Distributed Systems
Industry
Operating systems research
Concurrency
Concurrency primitives: mutex & semaphore
Processes execute at different speeds
Time in distributed systems
https://www.flickr.com/photos/national_archives_of_norway/6263353228
Pipelining
Paxos
Internet
Distributed consensus
1980
Reconsider large systems
Shared infrastructure
...
CS Research is Timeless
Inform decisions
Mitigate technical risk
Aysylu Greenberg
@aysylu22
Papers We Love NYC
Papers We Love SF
Today
● Staged Event-Driven Architecture
● Leases
● Inaccurate Computations
Staged Event-Driven Architecture & Deep Pipelines
2001
Hardware to Data Pipelines
https://en.wikipedia.org/wiki/Graphics_pipeline
QCon NYC: Distributed systems in practice, in theory
Staged Event-Driven Architecture
Single-machine pipeline
generalizes to distributed pipelines
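A minimal single-machine sketch of the staged idea, assuming the usual reading of SEDA: each stage owns a bounded event queue and a small pool of worker threads, and a full queue pushes back on the producer. The `Stage` class, the `run_pipeline` helper, and the tokenize/index stages are hypothetical names chosen for illustration; the dynamic resource controllers described in the SEDA paper are left out.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage's workers to shut down


class Stage:
    """One stage: a bounded event queue drained by its own worker threads."""

    def __init__(self, name, handler, downstream=None, workers=2, capacity=8):
        self.name = name
        self.handler = handler
        self.downstream = downstream
        self.events = queue.Queue(maxsize=capacity)
        self.threads = [threading.Thread(target=self._run) for _ in range(workers)]

    def start(self):
        for t in self.threads:
            t.start()

    def submit(self, event):
        # Blocks while the queue is full, so a slow stage pushes back on its producer.
        self.events.put(event)

    def _run(self):
        while True:
            event = self.events.get()
            if event is _STOP:
                return
            result = self.handler(event)
            if self.downstream is not None:
                self.downstream.submit(result)

    def stop(self):
        # One sentinel per worker, queued behind all pending events.
        for _ in self.threads:
            self.events.put(_STOP)
        for t in self.threads:
            t.join()


def run_pipeline(documents):
    """Push documents through a two-stage tokenize -> index pipeline."""
    indexed = []
    lock = threading.Lock()

    def store(tokens):
        with lock:
            indexed.append(tokens)
        return tokens

    index = Stage("index", store)
    tokenize = Stage("tokenize", lambda text: text.lower().split(), downstream=index)
    for stage in (index, tokenize):
        stage.start()
    for doc in documents:
        tokenize.submit(doc)
    # Stop upstream first so every event drains before downstream shuts down.
    tokenize.stop()
    index.stop()
    return indexed
```

Because each stage has more than one worker, results can arrive out of order. Replacing the in-process queues with a network message queue is one way to read "generalizes to distributed pipelines".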
Search Indexing Pipelines
Leases as Heartbeat in Distributed Systems
1989
Leases
● Distributed locking
● Lease term tradeoffs
○ short vs long
● Use of leases in modern applications
○ Leader election TTL (in etcd)
○ Liveness detection
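The bullets above can be sketched as a small in-memory lease table: a lease is a lock that expires on its own unless the holder keeps renewing it, so a lapsed lease doubles as a liveness signal. This is an illustrative single-process sketch, not etcd's implementation; `LeaseTable` and its method names are hypothetical, and clock skew between machines is ignored.

```python
import time


class LeaseTable:
    """Each key has at most one holder until its TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}  # key -> (holder, expiry time)

    def acquire(self, key, holder, ttl):
        """Grant the lease if it is free, expired, or already held by `holder`."""
        now = self._clock()
        current = self._leases.get(key)
        if current is not None:
            owner, expires_at = current
            if owner != holder and now < expires_at:
                return False
        self._leases[key] = (holder, now + ttl)
        return True

    def renew(self, key, holder, ttl):
        """Extend a live lease; fails once it has lapsed or belongs to someone else."""
        now = self._clock()
        current = self._leases.get(key)
        if current is None or current[0] != holder or now >= current[1]:
            return False
        self._leases[key] = (holder, now + ttl)
        return True

    def holder(self, key):
        """Return the live holder, or None if the lease expired (holder presumed dead)."""
        current = self._leases.get(key)
        if current is None or self._clock() >= current[1]:
            return None
        return current[0]
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping.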
Leases in Build System: Success Scenario
Build my project
Build System
OK
Waiting for the results
Build is in progress
Waiting for the results
Build is finished
Leases in Build System: Failure Scenario
Using etcd leases for heartbeat
$ curl http://server.com/v2/keys/foo -XPUT -d value=bar -d ttl=300
{
  "action": "set",
  "node": {
    "createdIndex": 2,
    "expiration": "2016-06-14T16:15:00",
    "key": "/foo",
    "modifiedIndex": 2,
    "ttl": 300,
    "value": "bar"
  }
}
... 3 minutes later ...
$ curl 'http://server.com/v2/keys/foo?prevValue=bar' -XPUT -d ttl=300 -d refresh=true -d prevExist=true
{
  "action": "update",
  "node": {
    "createdIndex": 2,
    "expiration": "2016-06-14T16:18:00",
    "key": "/foo",
    "modifiedIndex": 3,
    "ttl": 300,
    "value": "bar"
  },
  "prevNode": {
    "createdIndex": 2,
    "expiration": "2016-06-14T16:15:00",
    "key": "/foo",
    "modifiedIndex": 2,
    "ttl": 120,
    "value": "bar"
  }
}
Leases for heartbeat:
How long should the lease term be?
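One way to reason about the question is with back-of-the-envelope arithmetic: a shorter term means more renewal traffic but faster detection of a dead holder, and a longer term means the opposite. The sketch below assumes the holder renews after a fixed fraction of the term has elapsed; `lease_term_tradeoff` and `renew_fraction` are hypothetical names, not etcd parameters.

```python
def lease_term_tradeoff(ttl_seconds, renew_fraction=0.5):
    """Return (renewals per hour, worst-case seconds before a dead holder is noticed)."""
    if ttl_seconds <= 0 or not 0 < renew_fraction < 1:
        raise ValueError("ttl_seconds must be positive and renew_fraction in (0, 1)")
    renew_interval = ttl_seconds * renew_fraction
    renewals_per_hour = 3600 / renew_interval
    # A holder that dies right after renewing is only noticed when the lease runs out.
    worst_case_detection = ttl_seconds
    return renewals_per_hour, worst_case_detection


if __name__ == "__main__":
    for ttl in (10, 300):
        renewals, detection = lease_term_tradeoff(ttl)
        print(f"ttl={ttl}s: {renewals:.0f} renewals/hour, detect failure within {detection}s")
```

With the 300-second term from the etcd example and renewal at the halfway point, this gives 24 renewals per hour and up to 300 seconds before a failed holder is noticed. A 10-second term gives 720 renewals per hour and detection within 10 seconds.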
Inaccurate Computations & Serving Search Results
From Accurate to "Good Enough"
[Trade off] Inaccuracy for Performance
[Trade off] Inaccuracy for Resilience
[Diagram: three Input -> Map tasks feeding one Reduce]
Inaccuracy for Resilience
1. Task decomposition
2. Baseline for correctness
3. Criticality Testing
4. Distortion and timing models
Distortion Model
Timing Model
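A rough illustration of the steps above, assuming a map/reduce-style mean in which failed map tasks are simply dropped: the fault-free run serves as the correctness baseline, and the distortion is the relative error introduced by discarding tasks. This is a simplified sketch with hypothetical names (`approximate_mean`, `distortion`), not the method from the Rinard paper, which derives probabilistic bounds from many such measurements.

```python
def approximate_mean(partitions, failed=frozenset()):
    """Mean over the partitions whose map task completed; failed tasks are discarded.

    Returns (estimate, fraction of map tasks that completed).
    """
    partials = []  # (sum, count) emitted by each surviving map task
    for task_id, values in enumerate(partitions):
        if task_id in failed:
            continue  # discard the task instead of retrying or aborting the job
        partials.append((sum(values), len(values)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise RuntimeError("no surviving data; no estimate available")
    return total / count, len(partials) / len(partitions)


def distortion(exact, estimate):
    """Relative error of the approximate output against the fault-free baseline."""
    return abs(estimate - exact) / abs(exact)


if __name__ == "__main__":
    partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    baseline, _ = approximate_mean(partitions)
    estimate, completed = approximate_mean(partitions, failed={2})
    print(baseline, estimate, completed, distortion(baseline, estimate))
```

Repeating the measurement while failing each task in turn is one crude way to see which tasks the output is most sensitive to, in the spirit of criticality testing.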
[In production] Inaccuracy for Performance & Resilience
Jeff Dean "Building Software Systems at Google and Lessons Learned", Stanford, 2010
[Designing with] Inaccuracy for Performance & Resilience
fuzz testing
generative testing
simplified implementation
fault injection testing
focus on observability
applicable to some problem domains
References
● T. Wurthinger, C. Wimmer et al. "One VM to Rule Them All"
● M. Rinard "Probabilistic Accuracy Bounds for Fault-Tolerant Computations that Discard Tasks"
● F. Corbato, M. Daggett, R. Daley "An Experimental Time-Sharing System"
● E. Dijkstra "Cooperating Sequential Processes"
● L. Lamport "Time, Clocks, and the Ordering of Events in a Distributed System"
● http://blinkdb.org/
● B. Oki, B. Liskov "Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems"
● L. Lamport "The Part-Time Parliament"
● M. Welsh, D. Culler, E. Brewer "SEDA: An Architecture for Well-Conditioned, Scalable Internet Services"
● C. Gray, D. Cheriton "Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency"
● S. Agarwal, B. Mozafari et al. "BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data"
Gratitude
Ines Sombra
David Greenberg
Karan Parikh
Matt Welsh
Erran Berger
Robust & scalable pipelines
Leases for sharing & heartbeat
Inaccuracy for resilience & performance
CS research is timeless: use it to mitigate risk
Aysylu Greenberg
June 14, 2016
Distributed Systems in Practice, in Theory
@aysylu22
