Distributed Systems in One Lesson
@tlberglund
What are Distributed Systems?
Any system too large to fit on one computer.
• Storage
• Transactions
• Computation
• Coordination
Distributed Storage
• Read replication
• Sharding
• Consistent hashing
• Distributed filesystems
Distributed Storage
• MongoDB
• Cassandra
• HDFS
When Life is Good
[Diagram sequence, single database. Labels: "Tim:Americano" (write), then "Tim?" (read) answered with "Americano".]
When Life is Busier
[Diagram sequence, replicated database. Labels: "Tim:Triple Skinny" shown on one copy, then on additional copies, then a "Tim?" read.]
Replication Problems
• Complexity
• Consistency
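The consistency problem can be sketched in a few lines of Python. This is a toy model, not any real database: `Replica`, `ReplicatedStore`, and the explicit `replicate()` step are hypothetical names, and replication is assumed to be asynchronous, so a follower can lag behind the leader.

```python
class Replica:
    """One copy of the key-value data."""
    def __init__(self):
        self.data = {}


class ReplicatedStore:
    """Hypothetical leader/follower store with asynchronous replication."""
    def __init__(self, follower_count):
        self.leader = Replica()
        self.followers = [Replica() for _ in range(follower_count)]
        self.pending = []  # writes not yet copied to the followers

    def write(self, key, value):
        # Writes go to the leader only; followers catch up later.
        self.leader.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Copy every pending write to every follower, in order.
        for key, value in self.pending:
            for follower in self.followers:
                follower.data[key] = value
        self.pending.clear()

    def read(self, key, follower_index):
        # Reads are served by a follower to spread the load.
        return self.followers[follower_index].data.get(key)


store = ReplicatedStore(follower_count=2)
store.write("Tim", "Americano")
store.replicate()
store.write("Tim", "Triple Skinny")
stale = store.read("Tim", 0)  # the follower has not seen the new order yet
store.replicate()
fresh = store.read("Tim", 0)
print(stale, "->", fresh)
```

Between the second write and the second `replicate()` call, a reader sees the old order even though the write has already succeeded on the leader.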
When Life is REALLY busy
[Diagram sequence, sharded database. Shards are added one at a time, labeled by name range: Aaron-Faye, Faye-Nancy, Nancy-Zed.]
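A minimal sketch of range-based sharding, assuming the three name ranges on the slide. The boundary rule is an assumption (a name equal to a boundary goes to the shard that starts with it), and `shard_for`, `put`, and `get` are hypothetical helpers.

```python
from bisect import bisect_right

# Lower bound of each shard's name range: Aaron-Faye, Faye-Nancy, Nancy-Zed.
SHARD_STARTS = ["Aaron", "Faye", "Nancy"]
shards = [{} for _ in SHARD_STARTS]


def shard_for(name):
    """Return the index of the shard whose range contains name."""
    index = bisect_right(SHARD_STARTS, name) - 1
    return max(index, 0)  # names sorting before "Aaron" fall into shard 0


def put(name, order):
    shards[shard_for(name)][name] = order


def get(name):
    # A lookup by name touches exactly one shard.
    return shards[shard_for(name)].get(name)


put("Tim", "Americano")
put("Bob", "Latte")
print(shard_for("Tim"), shard_for("Bob"), get("Tim"))
```

Lookups by name stay cheap, but a question such as "who ordered an Americano?" has to visit every shard, which is one way the access patterns become limited.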
Sharding Problems
• More complexity
• Broken data model
• Limited data access patterns
Consistent Hashing
[Diagram sequence: a ring of eight nodes with tokens 0000, 2000, 4000, 6000, 8000, A000, C000, E000. Labels: the write "Tim:Americano" becomes "9F72:Americano" and is placed on the ring; the read "Tim?" becomes "9F72?" and returns "9F72:Americano".]
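The ring lookup can be sketched as follows. The eight tokens come from the slide; everything else is an assumption: keys are hashed with MD5 truncated to 16 bits (so "Tim" will not hash to 9F72 here), and a hash is owned by the first node whose token is greater than or equal to it, wrapping around the ring.

```python
import hashlib
from bisect import bisect_left

# Node tokens from the slide, in ring order.
TOKENS = [0x0000, 0x2000, 0x4000, 0x6000, 0x8000, 0xA000, 0xC000, 0xE000]


def key_hash(key):
    """Hash a key to a 16-bit position on the ring (assumed hash function)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def owner(hash_value, tokens=TOKENS):
    """Return the token of the node that owns hash_value."""
    index = bisect_left(tokens, hash_value)
    return tokens[index % len(tokens)]  # wrap past the last token to the first


# The slide's example hash lands on node A000 under this ownership rule.
print(format(owner(0x9F72), "04X"))

# Adding a node at 9000 only moves the keys between 8000 and 9000.
bigger_ring = sorted(TOKENS + [0x9000])
print(format(owner(0x8F00), "04X"), "->", format(owner(0x8F00, bigger_ring), "04X"))
```

A write and a later read of the same key both call `owner(key_hash(key))`, so they reach the same node without any central directory.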
CAP Theorem
[Diagram: the letters C, A, and P]
• Consistency
• Availability
• Partition Tolerance
CAP Theorem
• Shared writing project
• Coffee shop closes
• Synchronizing over the phone
• Battery dies
• Status report!
Distributed Transactions
ACID Transactions
• Atomic
• Consistent
• Isolated
• Durable
• One barista
• Multiple baristas
Ordering Coffee
• Receive order
• Process payment
• Enqueue order
• Make coffee
• Deliver drink
[Brace annotation on the list: Different Actors]
Ordering Coffee
• Why split up order processing?
• What can fail?
• What are the consequences of failure?
• How do we repair failure?
Ordering Coffee
• How can we design a coffee shop with atomic transactions?
• How does that limit the business?
• Why give up atomicity?
Distributed Computation
• Scatter/Gather
• MapReduce
• Hadoop
• Spark
MapReduce
• All computation in two functions: Map and Reduce
• Keep data (mostly) where it is
• Move compute to data
MapReduce
Mapper: (k, v) → [(k, v), (k, v), (k, v), …]
Shuffle: (k, [v, v, v, …])
Reducer: (k, [v, v, v, …]) → [(k, v), (k, v), (k, v), …]
MapReduce
poems/raven.txt (opening lines):
Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore,
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
"'Tis some visitor," I muttered, "tapping at my chamber door-
Only this, and nothing more."
Ah, distinctly I remember it was in the bleak December,
And each separate dying ember wrought its ghost upon the floor.
Eagerly I wished the morrow;- vainly I had sought to borrow
poems/raven.txt (closing lines):
It shall clasp a sainted maiden whom the angels name Lenore-
Clasp a rare and radiant maiden whom the angels name Lenore."
Quoth the Raven, "Nevermore."
"Be that word our sign in parting, bird or fiend," I shrieked, upstarting-
"Get thee back into the tempest and the Night's Plutonian shore!
Leave no black plume as a token of that lie thy soul hath spoken!
Leave my loneliness unbroken!- quit the bust above my door!
Take thy beak from out my heart, and take thy form from off my door!"
Quoth the Raven, "Nevermore."
And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamp-light o'er him streaming throws his shadow on the floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted- nevermore!
MapReduce: Map
Input words: Once, upon, a, midnight, dreary, while, pondered, suddenly, there, came, a, tapping, gently, rapping, rapping, tapping, at, my, chamber, door
Emitted pairs: Once:1, upon:1, a:1, midnight:1, dreary:1, while:1, pondered:1, suddenly:1, there:1, came:1, a:1, tapping:1, gently:1, rapping:1, rapping:1, tapping:1, at:1, my:1, chamber:1, door:1
MapReduce: Shuffle
[Diagram: the same pairs rearranged so that pairs with the same key sit together, for example a:1 next to a:1 and rapping:1 next to rapping:1]
MapReduce: Reduce
Grouped input: Once:[1], upon:[1], midnight:[1], dreary:[1], while:[1], pondered:[1], suddenly:[1], there:[1], came:[1], a:[1,1], gently:[1], rapping:[1,1], tapping:[1,1], at:[1], my:[1], chamber:[1], door:[1]
Output: Once:1, upon:1, midnight:1, dreary:1, while:1, pondered:1, suddenly:1, there:1, came:1, a:2, gently:1, rapping:2, tapping:2, at:1, my:1, chamber:1, door:1
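The three phases can be sketched on a single machine in plain Python. The function names are hypothetical, and the input is the slide's sample words arranged into made-up lines; a real framework would run the map and reduce calls on many nodes.

```python
from collections import defaultdict

lines = [
    "Once upon a midnight dreary while pondered",
    "suddenly there came a tapping",
    "gently rapping rapping",
    "tapping at my chamber door",
]


def map_words(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce: collapse the list of values for one key into a total."""
    return key, sum(values)


mapped = [pair for line in lines for pair in map_words(line)]
grouped = shuffle(mapped)
counts = dict(reduce_counts(key, values) for key, values in grouped.items())
print(counts["a"], counts["rapping"], counts["tapping"], counts["door"])
```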
Real-World MapReduce
• Hadoop
• Cloudera, Hortonworks, MapR
• Nobody writes map and reduce functions
• Hive (SQL-like interface)
• Integration with BI front-ends
Hadoop
• MapReduce API
• Job Tracker, Task Tracker
• Distributed Filesystem (HDFS)
• Enormous ecosystem
Spark
• Scatter/gather paradigm (similar to MapReduce)
• More general data model (RDDs)
• More general programming model (transform/action)
• Storage agnostic
Spark Architecture
[Diagram sequence. Labels: a ring of nodes with tokens 0000, 2000, 4000, 6000, 8000, A000, C000, E000; Spark Client; Driver; Spark Context; Job; Cluster Manager; four Tasks.]
Spark Worker Node
[Diagram sequence. Labels: Cluster Manager; three Executors, each with two Tasks and a Cache; six RDD boxes.]
What's an RDD?
• Bigger than a computer
• Read from an input source
• Output of a pure function
• Immutable
• Typed
• Ordered
• Lazily evaluated
• Partitioned
• Collection of things
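A few of these properties (immutable, partitioned, lazily evaluated) can be illustrated with a toy class. `LazyDataset` is a hypothetical stand-in written for this sketch and is not Spark's API; it only mimics the transform/action split on one machine.

```python
class LazyDataset:
    """Toy stand-in for an RDD: immutable, partitioned, lazily evaluated."""

    def __init__(self, partitions, steps=()):
        self._partitions = tuple(tuple(p) for p in partitions)
        self._steps = tuple(steps)  # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: returns a new dataset and computes nothing.
        return LazyDataset(self._partitions, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also only recorded.
        return LazyDataset(self._partitions, self._steps + (("filter", predicate),))

    def _compute(self, partition):
        items = list(partition)
        for kind, fn in self._steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def collect(self):
        # Action: the recorded steps run now, one partition at a time.
        result = []
        for partition in self._partitions:
            result.extend(self._compute(partition))
        return result


calls = []


def square(x):
    calls.append(x)  # record each call so laziness is observable
    return x * x


numbers = LazyDataset([[1, 2, 3], [4, 5, 6]])
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
ran_before_action = len(calls)
result = evens_squared.collect()
print(ran_before_action, result)
```

Because each transformation returns a new object, `numbers` is unchanged after the pipeline runs, and each partition could in principle be computed on a different node.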
Distributed, how?
5444 5676 0686 8389:
List(xn-1, xn-2, xn-3)
4532 4569 7030 1191:
List(xn-4, xn-5, xn-6)
5444 5607 6517 6027:
List(xn-7, xn-8, xn-9)
4532 4577 0122 2189:
[Callout: Hash these]
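One common way to spread keyed data over nodes is to hash each key and take the result modulo the number of partitions. The sketch below assumes that scheme; the slide does not say which hash or partition count is used, so `partition_for`, CRC32, and the count of 3 are illustrative choices.

```python
import zlib

keys = [
    "5444 5676 0686 8389",
    "4532 4569 7030 1191",
    "5444 5607 6517 6027",
    "4532 4577 0122 2189",
]


def partition_for(key, partition_count):
    """Pick a partition from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


# Group the keys by the partition they hash to.
placement = {}
for key in keys:
    placement.setdefault(partition_for(key, 3), []).append(key)

for partition, members in sorted(placement.items()):
    print(partition, members)
```

A stable hash matters here: every node must compute the same partition for the same key, which rules out hashes that vary between processes.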
Distributed Coordination
• NTP
• Epidemic Protocols
• Paxos
NTP
• Distributed time synchronization
• A network of tiered clocks
• Synchronization algorithm
• Often accurate to ±10 ms
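The core of the synchronization algorithm is a calculation over four timestamps from one request/response exchange. The sketch below uses the standard NTP offset and delay formulas; the function name and the example numbers are made up for illustration, and it assumes the network delay is the same in both directions.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client send time    (client clock)
    t1: server receive time (server clock)
    t2: server send time    (server clock)
    t3: client receive time (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the client clock is behind
    delay = (t3 - t0) - (t2 - t1)         # time spent on the network
    return offset, delay


# Hypothetical exchange: the client clock is 50 ms behind the server,
# each network hop takes 10 ms, and the server takes 2 ms to reply.
offset, delay = ntp_offset_and_delay(100.000, 100.060, 100.062, 100.022)
print(round(offset, 3), round(delay, 3))
```

If the two directions have different delays, the estimate is off by half the difference, which is one reason accuracy is bounded by network conditions.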
Gossip
[Diagram: a ring of eight nodes with tokens 0000, 2000, 4000, 6000, 8000, A000, C000, E000]
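A gossip (epidemic) protocol can be simulated in a few lines. This is a simplified model under stated assumptions: rounds are synchronous, every informed node tells `fanout` randomly chosen nodes per round (possibly itself or an already informed node), and no messages are lost. The function name and parameters are hypothetical.

```python
import random


def gossip_rounds(node_count, fanout=1, seed=0):
    """Count the rounds until every node has heard the rumor."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    informed = {0}             # node 0 starts the rumor
    rounds = 0
    while len(informed) < node_count:
        newly_informed = set()
        for _ in informed:
            # Each informed node picks random peers to tell.
            newly_informed.update(rng.sample(range(node_count), fanout))
        informed |= newly_informed
        rounds += 1
    return rounds


# Eight nodes, as on the ring in the diagram.
print(gossip_rounds(8))
```

With a fanout of 1 the informed set can at most double each round, so eight nodes need at least three rounds; random collisions usually make it a few more.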
Paxos
• Distributed log protocol
• Assumes distributed, unreliable nodes
• Performs leader election
Review
• Storage
• Transactions
• Computation
• Coordination
Thank You!
@tlberglund
