SRE Demystified
SRE Practices
ganesh@ganeshniyer.com
ganesh.vigneswara@gmail.com
http://ganeshniyer.com
Dr Ganesh Neelakanta Iyer
SRE
https://image.slidesharecdn.com/devopssreatgooglescale-190121123035/95/devops-sre-at-google-scale-30-638.jpg?cb=1548074257
Practices
https://landing.google.com/sre/sre-book/chapters/part3/
Monitoring
• Without monitoring, you have no way to tell whether the service is even working; absent a thoughtfully designed monitoring infrastructure, you’re flying blind
• Maybe everyone who tries to use the website gets an error, maybe not, but you want to be aware of problems before your users notice them
https://blog.zabbix.com/zabbix-4-2-out-now/6791/
https://www.youtube.com/watch?v=BPu_0hqHgqA
https://www.zabbix.com/
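One way to notice that users are getting errors before they report it is to watch the ratio of failed requests. The following is a minimal sketch, not tied to any particular monitoring product: the `Request` type, the 5xx rule, and the 1% threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int  # HTTP status code returned to the user (hypothetical record type)


def error_ratio(requests):
    """Fraction of requests that ended in a server-side (5xx) error."""
    if not requests:
        return 0.0
    failures = sum(1 for r in requests if r.status >= 500)
    return failures / len(requests)


def should_alert(requests, threshold=0.01):
    """True when the observed error ratio exceeds the assumed threshold."""
    return error_ratio(requests) > threshold


# A window of 100 recent requests, 3 of which failed.
window = [Request(200)] * 97 + [Request(503)] * 3
print(error_ratio(window))   # 0.03
print(should_alert(window))  # True
```

In practice the window would come from request logs or a metrics system, and the threshold would be derived from the service's objectives rather than hard-coded.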
Incident Response
• Being on-call
• Effective troubleshooting
• Emergency response
• Managing incidents
Postmortem and Root-Cause Analysis
• The primary goals of writing a postmortem are to ensure
  • that the incident is documented,
  • that all contributing root cause(s) are well understood, and
  • that effective preventive actions are put in place to reduce the likelihood and/or impact of recurrence
• Blameless
https://landing.google.com/sre/sre-book/chapters/postmortem-culture/
Testing
• Testing is the mechanism you use to demonstrate specific areas of equivalence when changes occur
• Each test that passes both before and after a change reduces the uncertainty for which the analysis needs to allow
• Thorough testing helps us predict the future reliability of a given site with enough detail to be practically useful
https://landing.google.com/sre/sre-book/chapters/testing-reliability/
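The idea of "specific areas of equivalence" can be sketched by running the same cases against the code before and after a change. Everything here is hypothetical: `slugify_old` and `slugify_new` stand in for the two versions of some function, and `CASES` is an assumed test set.

```python
def slugify_old(title):
    """Behaviour before the change: split on any whitespace."""
    return "-".join(title.lower().split())


def slugify_new(title):
    """Behaviour after the change: split on spaces only, drop empty parts."""
    words = [w for w in title.lower().split(" ") if w]
    return "-".join(words)


# Inputs on which the two versions are expected to agree.
CASES = ["SRE Practices", "  Capacity   Planning ", "monitoring"]


def equivalent_on(cases):
    """True if old and new produce the same result for every case."""
    return all(slugify_old(c) == slugify_new(c) for c in cases)


print(equivalent_on(CASES))      # True
print(equivalent_on(["a\tb"]))   # False: tabs are outside the tested area
```

The passing cases show equivalence only for the inputs they cover; the tab example is where the two versions diverge, which is the uncertainty a further test would remove.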
Capacity Planning
• Intent-Based Capacity Planning
  • Intent is the rationale for how a service owner wants to run their service
  • Moving from concrete resource demands to motivating reasons, in order to arrive at the true capacity planning intent, often requires several layers of abstraction
• Example: the same need stated first as a concrete resource demand, then as intent
  • "I want 50 cores in clusters X, Y, and Z for service Foo."
  • "I want to run service Foo at 5 nines of reliability."
https://landing.google.com/sre/sre-book/chapters/software-engineering-in-sre/
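To make "5 nines" in the intent-level request concrete, the sketch below converts a number of nines into an allowed downtime per year. It assumes a 365-day year and treats "n nines" as an availability of 1 - 10^-n; the function name is illustrative.

```python
def downtime_budget_minutes(nines: int, period_days: float = 365.0) -> float:
    """Minutes of downtime allowed in the period at the given number of nines."""
    unavailability = 10.0 ** (-nines)          # e.g. 5 nines -> 0.00001
    minutes_in_period = period_days * 24 * 60  # 525,600 minutes in 365 days
    return unavailability * minutes_in_period


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year, which is why the intent-level request says far more about the required redundancy than a raw core count does.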
Development
• Distributed Reliability
• Data processing pipelines
  • One-shot MapReduce jobs running periodically
  • Systems that operate in near real-time
• Data Integrity
  • What you read is what you write
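"What you read is what you write" can be checked mechanically by storing a checksum with each record and verifying it on read. This is a minimal sketch: the in-memory `store` dictionary and the helper names are assumptions, and SHA-256 is just one reasonable choice of digest.

```python
import hashlib


def write_record(store, key, payload: bytes):
    """Save the payload together with its SHA-256 digest."""
    store[key] = (payload, hashlib.sha256(payload).hexdigest())


def read_record(store, key) -> bytes:
    """Return the payload only if it still matches the digest saved at write time."""
    payload, digest = store[key]
    if hashlib.sha256(payload).hexdigest() != digest:
        raise ValueError(f"integrity check failed for {key!r}")
    return payload


store = {}
write_record(store, "user:42", b"hello")
print(read_record(store, "user:42"))  # b'hello'
```

A checksum detects corruption but does not repair it; recovery still depends on backups or replicas.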
Product
• Finally, having made our way up the reliability pyramid, we find ourselves at the point of having a workable product

SRE Demystified - 14 - SRE Practices overview