Resiliency in Distributed Systems
Dinesh Kumar
Rajeev N B
Agenda
• Basics of Resilience and Distributed Systems
• Why do we need resilience?
• How do we achieve resilience?
• Faults vs Failures
• Measures in designing Resilient systems
Distributed Systems
Networked Components communicate and
coordinate their actions by passing messages
Same codebase, different modules (monolith)
Different codebases, same database
Different codebases, each with its own database
Different codebases (with duplication or redundancy), each with its own database
Resiliency
The capacity to recover from difficulties
Why do we care about resilience?
How do we build a resilient system?
Fault vs Failure
Fault
An incorrect internal state in your system
Faults
• Goroutine leaks
• File descriptor leaks (not closing response bodies, files, ...)
• Memory Leaks
• Thread Pool Bottleneck
• Network issues
• Contract Issues
• Throughput increase / attacks
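One of the faults above, leaking file descriptors by not closing a response body, can be sketched in Go as follows. This is a minimal illustration, not code from the talk: `fetch` and `fetchFromTestServer` are hypothetical names, and the local test server stands in for a real dependency.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch reads the whole body at url and always closes it, so the
// underlying connection and its file descriptor can be released or reused.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	// Omitting this Close is the fault: each call may then hold on to a descriptor.
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// fetchFromTestServer starts a throwaway local server (an assumption made
// for this sketch) and fetches from it with a client-side timeout.
func fetchFromTestServer() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "pong")
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	return fetch(client, srv.URL)
}

func main() {
	body, err := fetchFromTestServer()
	fmt.Println(body, err)
}
```

The fault itself is silent: the code works without the `Close` call until the process runs out of descriptors, and only then does it surface as a failure.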
Failure
Inability of the system to do its intended job
Fault Tolerance
Patterns for Resiliency
Production Readiness Checklist
● Logging
● System Monitoring
● Application Monitoring
● Custom metrics (Business specific)
● Circuit Breakers
● Rate-limit critical flows
● Load Testing
● Performance Testing
● Runbooks
● Timeouts/Retries
● Code quality metrics
● Recovery handler (in Go)
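The last checklist item, a recovery handler, might look like the sketch below. It assumes the service is built on `net/http`; `recoverHandler`, `statusFor`, `faulty`, and `healthy` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// recoverHandler wraps next so that a panic in one request is logged and
// turned into a 500 response instead of propagating to the caller.
func recoverHandler(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				log.Printf("recovered from panic: %v", rec)
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// statusFor is a helper for this sketch: it runs h against an in-memory
// request and returns the response status code.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	h.ServeHTTP(rec, req)
	return rec.Code
}

// faulty and healthy are hypothetical handlers used only for the demo.
var faulty = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	panic("boom")
})

var healthy = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "ok")
})

func main() {
	fmt.Println(statusFor(recoverHandler(faulty)))  // 500
	fmt.Println(statusFor(recoverHandler(healthy))) // 200
}
```

The handler contains the fault to a single request; the log line is what lets monitoring notice it.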
Problem Demo
Aggressive Timeouts and Retries
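A rough sketch of the idea in Go, assuming each attempt gets its own short deadline and the wait between attempts doubles. The names `retry`, `flakyOp`, and `runDemo`, and all durations, are illustrative choices rather than values from the talk.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retry calls op up to attempts times. Each attempt gets its own deadline
// (perAttempt), and the wait between attempts doubles, starting at backoff.
// op is expected to honor the context it is given.
func retry(ctx context.Context, attempts int, perAttempt, backoff time.Duration,
	op func(context.Context) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		attemptCtx, cancel := context.WithTimeout(ctx, perAttempt)
		err = op(attemptCtx)
		cancel()
		if err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(backoff):
			backoff *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flakyOp returns a hypothetical operation that fails the first `failures`
// times and then succeeds, plus a pointer to its call counter.
func flakyOp(failures int) (func(context.Context) error, *int) {
	calls := 0
	return func(ctx context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("temporary failure")
		}
		return nil
	}, &calls
}

// runDemo reports how many calls were made and the final error.
func runDemo(failures, attempts int) (int, error) {
	op, calls := flakyOp(failures)
	err := retry(context.Background(), attempts, 50*time.Millisecond, time.Millisecond, op)
	return *calls, err
}

func main() {
	calls, err := runDemo(2, 3)
	fmt.Println(calls, err) // 3 <nil>
}
```

Bounding both the number of attempts and the time per attempt keeps a slow dependency from tying up callers indefinitely; retries are only safe when the operation is idempotent.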
Circuit Breakers
http://www.ebaytechblog.com/2015/09/08/application-resiliency-using-netflix-hystrix/
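The pattern can be sketched in a few lines of Go. This is a deliberately simplified, hand-rolled breaker and not the Hystrix implementation referenced above; `Breaker`, `NewBreaker`, `ErrOpen`, and `breakerDemo` are hypothetical names, and the thresholds are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned when the breaker rejects a call without trying it.
var ErrOpen = errors.New("circuit open")

// Breaker opens after maxFailures consecutive failures and rejects calls
// until cooldown has elapsed. After that, calls are let through again as a
// trial: a success closes the breaker, a failure re-opens it.
type Breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs op unless the breaker is open.
func (b *Breaker) Call(op func() error) error {
	b.mu.Lock()
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen // fail fast, sparing the struggling dependency
	}
	b.mu.Unlock()

	err := op()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	return nil
}

// breakerDemo sends five calls to a dependency that always fails and
// reports whether the breaker tripped and how many calls reached it.
func breakerDemo() (tripped bool, downstreamCalls int) {
	b := NewBreaker(2, time.Minute)
	failing := func() error {
		downstreamCalls++
		return errors.New("dependency down")
	}
	for i := 0; i < 5; i++ {
		if err := b.Call(failing); errors.Is(err, ErrOpen) {
			tripped = true
		}
	}
	return tripped, downstreamCalls
}

func main() {
	fmt.Println(breakerDemo()) // true 2
}
```

Only the first two calls reach the failing dependency; the remaining three are rejected immediately, which is the point of the pattern.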
Code Demo
Fallbacks
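A fallback returns a degraded but usable answer when the primary path fails. The sketch below assumes a hypothetical pricing lookup with a local cache as the fallback; `price`, `priceFromService`, and `cache` are names invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// priceFromService stands in for a remote call that is currently failing.
func priceFromService(item string) (string, error) {
	return "", errors.New("pricing service unavailable")
}

// cache holds possibly stale values that are still good enough to serve.
var cache = map[string]string{"coffee": "3.50"}

// price tries the primary source first, then the cache, then a static
// default. degraded tells the caller the answer did not come from primary.
func price(item string, primary func(string) (string, error)) (value string, degraded bool) {
	if v, err := primary(item); err == nil {
		return v, false
	}
	if v, ok := cache[item]; ok {
		return v, true
	}
	return "unavailable", true
}

func main() {
	fmt.Println(price("coffee", priceFromService)) // 3.50 true
	fmt.Println(price("tea", priceFromService))    // unavailable true
}
```

Reporting the degraded flag lets the caller decide how to present stale data and gives monitoring something to count.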
Redundancies
Resiliency Testing - Simian Army
Clusters of our network duplicated
Simian Army
• Chaos Monkey
• Latency Monkey
• Chaos Gorilla
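The Simian Army tools act on infrastructure: Chaos Monkey terminates instances at random, Latency Monkey injects artificial delays, and Chaos Gorilla simulates the loss of an entire availability zone. The same idea can be tried at a much smaller scale inside a process. The sketch below is not one of those tools; `chaos`, `ErrInjected`, and `injectedFailures` are hypothetical names for an in-process wrapper that adds latency and random failures to a call.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// ErrInjected marks a failure produced by the wrapper, not by the real call.
var ErrInjected = errors.New("injected failure")

// chaos wraps op so that every call is delayed by delay and, with
// probability failRate, fails without reaching op at all.
func chaos(failRate float64, delay time.Duration, op func() error) func() error {
	return func() error {
		time.Sleep(delay)
		if rand.Float64() < failRate {
			return ErrInjected
		}
		return op()
	}
}

// injectedFailures calls a wrapped no-op n times and counts how many
// calls were failed by the wrapper.
func injectedFailures(failRate float64, n int) int {
	call := chaos(failRate, time.Millisecond, func() error { return nil })
	count := 0
	for i := 0; i < n; i++ {
		if errors.Is(call(), ErrInjected) {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(injectedFailures(1.0, 10)) // 10
	fmt.Println(injectedFailures(0.0, 10)) // 0
}
```

Wrapping a dependency this way in a test environment shows whether timeouts, retries, circuit breakers, and fallbacks actually engage when the dependency misbehaves.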
Metrics and Monitoring
Monitoring
• Application
• System
• Customer Business Metrics
• Request Monitoring
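Request monitoring can start as a thin wrapper around each handler. The sketch below assumes a `net/http` service and keeps counters in memory; `metrics`, `statusRecorder`, `instrument`, and `monitoringDemo` are hypothetical names, and a real setup would export these numbers to a monitoring system rather than hold them in a struct.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// metrics holds per-process request counters.
type metrics struct {
	mu           sync.Mutex
	requests     int
	serverErrors int
	totalLatency time.Duration
}

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument counts requests, 5xx responses, and cumulative latency.
func (m *metrics) instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)

		m.mu.Lock()
		defer m.mu.Unlock()
		m.requests++
		if rec.status >= 500 {
			m.serverErrors++
		}
		m.totalLatency += time.Since(start)
	})
}

// monitoringDemo serves two healthy requests and one failing request
// in memory and returns the resulting counters.
func monitoringDemo() (requests, serverErrors int) {
	m := &metrics{}
	ok := m.instrument(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	bad := m.instrument(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "down", http.StatusServiceUnavailable)
	}))
	for _, h := range []http.Handler{ok, ok, bad} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	}
	return m.requests, m.serverErrors
}

func main() {
	fmt.Println(monitoringDemo()) // 3 1
}
```

Request count, error count, and latency per endpoint are usually enough to tell when a fault is turning into a failure.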
Thanks
Questions?
Feedback / Reach out
GitHub: https://github.com/dineshkumar-cse
Mail: dineshkumar_cse@hotmail.com
GitHub: https://github.com/rshetty
Mail: rajeev.bharshetty@go-jek.com
