October 24, 2017
Operations as a Service:
Because Failure Still Happens
Damon Edwards
@damonedwards
Damon
Edwards
Ops Improvement
DevOps Consulting
Ops Tools
Community
Let’s talk about Operations in the enterprise….
Operations is getting squeezed
“The Operations Squeeze”
From the Dev side: Go faster! Be flexible! (Improved Quality, Shorter Time-to-Market, Fast Feedback From Users)
From the other side: Lock it down! (Availability, Auditing, Security, Compliance)
Ops, in the middle: More errors, More delays, Less capacity, Less flexibility
Ops is Unplanned Work and Planned Work… by design!
+
Lots of Agile and
DevOps techniques
focused here
Not so much here
Let’s look at a company that is winning the
battle against the “Operations Squeeze”
Mark Maun
Jody Mulkey
Justin Dean
90% Reduction in MTTR
50% Reduction in escalations
55% Reduction of overall support costs
Better, Faster,
and Cheaper!
How did they do that?
But first…
Let’s look at the principles behind the improvement …
Two prevailing models of operations support
“You build it. They run it.”: Development Team → Operations Team → Running Service
“You build it. You run it.”: Integrated Delivery Team (Dev + Ops, a “two-pizza team”) → Running Service
“You build it. They run it.” (aka… the way it always was)
It’s 2am ….
It’s 2pm ….
It’s the NOC…
Talk them through: health checks,
reviewing log files, and process of
diagnosing and recovering the system.
Same as you did for dev teams 2
months ago, QA teams last month,
Ops during deploy last week, etc.
It’s Ops…
“Will your applications be affected if
we take down EU-West?”
“Is it ok if we change these firewall
rules?”
“We are getting customer complaints
about performance. Are you sure you
didn’t change something?”
“You build it. They run it.” (aka… the way it always was)
Development Team → Operations Team → Running Service
“You build it. You run it.”
Integrated Delivery Team (Dev + Ops) → Running Service
The same team accumulates more and more Running Services, each arriving with “Add this to your responsibilities!”
Competing for the team’s attention: Incident!! Incident!! What would happen if…? New feature!! New feature!! New API!!
“two-pizza teams”?
Just change how
business is structured,
funded, and operated.
Ideally we can find a way to…
Have the labor scaling benefits of “you build it, they run it”
without
the frequent escalations
the bad handoffs
Ideally we can find a way to…
Have the responsiveness/control of “you build it, you run it”
without
the scaling limitations
What gets in the way?
Silos tend to ruin everything
Silo A (“I need X”) → Requests for X → Silo B (“I do X”)
Each silo has its own Backlog, Context, Priorities, and Tools
Ticket-Driven Request Queues Are Often a Sign of Silos
Team A (Dev) → Ticket System (??) → Team B (Ops)
Silo Builder
Snowflake Maker
Silos + Rapid Tool Evolution = Islands of Automation
Puppet, Chef, Ansible, Shell Scripts, PowerShell Scripts, SQL Tools, Data ETL, Network Management, Monitoring, Container Management, Legacy Datacenter Automation, New Tools, New Tools
Working in a complex system²: a Complex System interacting with a Complex System
Service A, Service B, Service B v2, Service C, Service D, Service E, connected through Networks, Firewalls, APIs, Data stores, and an ESB
Silos are everywhere
Islands of automation
It’s a complex system²
Again: What gets in the way?
So how do we respond quicker, yet stay under control?
Empower those closest to the issue
or escalate: 1° → 2° → 3° → 4°
Push the ability to take action this direction: toward 1°, closest to the issue
Improve flow by implementing Operations as a Service
Team A (Dev): Define Procedures, Execute On Demand
Team B (Ops): Vet Procedures, Define Policies, Execute On Demand
Ticket System: Actual Exceptions
Change how you think about automated procedures…
Automated procedures consist of three parts
Define: Definition of the automated procedure
Execute: Execution of the automated procedure
Govern: Governance of the automated procedure (security, oversight, compliance, etc.)
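One way to picture the three parts is as three separate pieces that different teams could own. The following is a minimal sketch under that assumption; `Procedure`, `Policy`, and `run` are hypothetical names chosen for illustration and are not taken from Rundeck or any other tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Procedure:
    """Definition: what the automated procedure does (hypothetical model)."""
    name: str
    action: Callable[[], str]


@dataclass
class Policy:
    """Governance: which roles may execute which procedures."""
    allowed: Dict[str, Set[str]] = field(default_factory=dict)

    def permits(self, role: str, procedure_name: str) -> bool:
        return procedure_name in self.allowed.get(role, set())


def run(procedure: Procedure, role: str, policy: Policy, audit: List[str]) -> str:
    """Execution: anyone may ask, the policy decides, every attempt is logged."""
    if not policy.permits(role, procedure.name):
        audit.append(f"DENY {role} {procedure.name}")
        raise PermissionError(f"{role} may not run {procedure.name}")
    audit.append(f"ALLOW {role} {procedure.name}")
    return procedure.action()


# Illustrative split: Dev defines the procedure, Ops defines the policy,
# and the NOC executes it on demand.
status = Procedure(name="status", action=lambda: "service is healthy")
policy = Policy(allowed={"noc": {"status"}})
audit_log: List[str] = []
result = run(status, role="noc", policy=policy, audit=audit_log)
```

Keeping the three pieces apart is what allows each one to move to a different team without changing the other two.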
Traditional Ops Silo
Define
Execute
Govern
“Consumers of Ops”
(Dev, QA, Release, NOC, Security, etc.)
Ops
Rigid Self-Service
Define
Execute
Govern
“Consumers of Ops”
(Dev, QA, Release, NOC, Security, etc.)
Ops
Define
Execute
Govern
Execute
“Consumers of Ops”
(Dev, QA, Release, NOC, Security, etc.)
Ops
Rigid Self-Service (limited)
High-Velocity Handoffs
Define
Govern
Execute
“Consumers of Ops”
(Dev, QA, Release, NOC, Security, etc.)
Ops
Self-Service Operations
Define
Govern
Execute
Govern
“Consumers of Ops”
(Dev, QA, Release, NOC, Security, etc.)
Ops
Operations as a Service
Split definition, execution, and governance, and move each to where it is the most effective use of labor
Team A (Dev): Define Procedures, Execute On Demand
Team B (Ops): Vet Procedures, Define Policies, Execute On Demand
Again: How do we respond quicker, yet stay under control?
Empower those closest to the issue
Improve flow by implementing
Operations as a Service
Rundeck: Open Source Platform For Operations as a Service
Create workflows ● Define ACL policies ● Execute workflows
Interfaces: Web GUI, API, CLI
Orchestration & Scheduling of Workflows; Collect and Process Output
Runs against: Scripts, APIs, Tools, Cloud, VMs, Containers
Infrastructure details and state from multiple sources: Configuration Management, CMDB, Monitoring, Metrics, Cloud
Corp Directory: Authentication and roles
ITSM: Tickets, work status, approvals
Common implementation pattern
for Operations as a Service…
Step 1: Establish a Secure Ops Hub
Operations as a Service hub: Secrets, Identity, Audit Logs, and Ops Procedures (“Status”, “Firewall Change”, “Restart”), each gated by allow/deny policy
Engineers get visibility and controlled self-service
Ops Support use it to execute remediation procedures
Inventory and Health (+ Monitoring Tools): Infrastructure view, Service health, System metrics
Security and Ops manage access, configuration, and compliance
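The allow/deny gating on the hub's procedures can be sketched as a small rule table. This is an illustration only: the group names and the rule assignments below are assumptions, and the evaluation order (explicit deny wins, no match means deny) is one common convention rather than a description of how any particular product behaves.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    group: str       # who is asking (hypothetical group names)
    procedure: str   # which Ops procedure
    effect: str      # "allow" or "deny"


RULES: List[Rule] = [
    Rule("engineers", "Status", "allow"),
    Rule("engineers", "Restart", "allow"),
    Rule("engineers", "Firewall Change", "deny"),
    Rule("ops-support", "Status", "allow"),
    Rule("ops-support", "Restart", "allow"),
    Rule("ops-support", "Firewall Change", "allow"),
]


def decide(group: str, procedure: str, rules: List[Rule] = RULES) -> str:
    """Return "allow" or "deny"; an explicit deny wins, no match means deny."""
    effects = {r.effect for r in rules
               if r.group == group and r.procedure == procedure}
    if "deny" in effects:
        return "deny"
    return "allow" if "allow" in effects else "deny"
```

Defaulting to deny means a newly added procedure is invisible to everyone until Security or Ops writes a rule for it.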
Step 2: Establish a SDLC for Ops Procedures
Product Engineers produce automated procedures and health checks, kept in a Source Code Repo
Every Change goes through Code review before it reaches the Ops hub; a RISKY change is sent back for a FIX
Example of a RISKY change:
if (($state==wait))
then
kill -9 $PID
fi
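Part of the code review step could be automated with a simple pre-review check that flags commands a human reviewer should look at. The sketch below is an assumption about how such a check might look; the pattern list and the function name `review_findings` are hypothetical, and a real team would choose its own rules.

```python
import re
from typing import List, Tuple

# Hypothetical patterns a team might choose to flag for human review.
RISKY_PATTERNS: List[Tuple[str, str]] = [
    (r"\bkill\s+-9\b", "force-kill gives the process no chance to shut down cleanly"),
    (r"\brm\s+-rf\b", "recursive delete"),
]


def review_findings(script: str) -> List[str]:
    """Return one finding per risky match; an empty list means nothing was flagged."""
    findings = []
    for number, line in enumerate(script.splitlines(), start=1):
        for pattern, reason in RISKY_PATTERNS:
            if re.search(pattern, line):
                findings.append(f"line {number}: {reason}")
    return findings


# The risky fragment from the slide, submitted as a proposed procedure.
submitted = """if (($state==wait))
then
kill -9 $PID
fi"""
findings = review_findings(submitted)
```

A check like this only narrows what reviewers need to read; it does not replace the review itself.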
Step 3: Connect with Enterprise Management Systems
Service Desk (Customers, Service Ticket): Ops Support get visibility and an audit trail, updated by support tools
Software Supply Chain: Ops integrate with artifact flow
Step 4: Make Compliance Really Happy
Who created the procedure?
Who created the policy?
Who reviewed it? Who ran it? When? Where? Approval trail?
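The compliance questions map naturally onto the fields of an execution record. The sketch below assumes a hypothetical record schema (`AuditRecord`) and made-up user and host names; it only illustrates that each question can be answered by a lookup once every run is recorded.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class AuditRecord:
    """One execution of an Ops procedure (hypothetical schema)."""
    procedure: str
    procedure_author: str
    policy_author: str
    reviewer: str
    executed_by: str
    executed_at: datetime
    target: str
    approval_ref: Optional[str] = None


def compliance_answers(record: AuditRecord) -> Dict[str, str]:
    """Map each audit question to the recorded fact that answers it."""
    return {
        "Who created the procedure?": record.procedure_author,
        "Who created the policy?": record.policy_author,
        "Who reviewed it?": record.reviewer,
        "Who ran it?": record.executed_by,
        "When?": record.executed_at.isoformat(),
        "Where?": record.target,
        "Approval trail?": record.approval_ref or "none recorded",
    }


# All names below are invented for the example.
record = AuditRecord(
    procedure="Restart",
    procedure_author="dev-alice",
    policy_author="ops-bob",
    reviewer="ops-carol",
    executed_by="noc-dave",
    executed_at=datetime(2017, 10, 24, 2, 0, tzinfo=timezone.utc),
    target="web-01",
)
answers = compliance_answers(record)
```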
Everybody wins….
Improve incident response time and reduce escalations
Old Model: the NOC interrupts the Delivery Team (L2, L3) with questions from current production; the team answers “Too busy” or “We’re late!” and struggles to get from Start Deliverables to Finish Deliverables
New Model: the NOC resolves most issues itself using previously delivered Rundeck Jobs; only an interrupt that “looks important” reaches the Delivery Team (L2, L3)
Tightens feedback loops
Reduce delays that otherwise hurt the business
Chart: Revenue per Week vs. Time. The COST OF DELAY is the revenue lost between “Opportunity Ready” and “Actual Revenue”.
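As a rough illustration, the cost of delay can be approximated as the revenue not earned between the date an opportunity was ready and the date revenue actually started. The sketch assumes a constant weekly revenue, which is a simplification, and the function name and figures are hypothetical.

```python
def cost_of_delay(revenue_per_week: float, weeks_delayed: float) -> float:
    """Revenue forgone while a ready opportunity waits (constant-rate simplification)."""
    if revenue_per_week < 0 or weeks_delayed < 0:
        raise ValueError("inputs must be non-negative")
    return revenue_per_week * weeks_delayed


# Hypothetical figures: a feature worth 10,000 per week waits 3 weeks in a queue.
lost = cost_of_delay(revenue_per_week=10_000, weeks_delayed=3)
```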
Enables Ops managers to focus on creating value
Old mindset: Protect capacity, Say “no”
New mindset: Scaling OaaS, Get more users
Calculating the ROI for Operations as a Service
ROI inside Ops
Decrease in time to respond to incidents
Decrease in errors and rework
Increase in operational support tasks delegated
Increase in team capacity
ROI outside Ops
Decrease in number of escalations
Decrease in time spent waiting and rework loops
Decrease in issues due to problematic handoffs
ROI to Business
Decrease in total cost of operations and support
Decrease in time-to-market, cycle-time, and schedule slippage
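Most of these ROI measures reduce to comparing a metric before and after the change. The following is a minimal sketch of that arithmetic; the metric names and every number are hypothetical placeholders, not measurements from any organization.

```python
from typing import Dict, Tuple


def percent_reduction(before: float, after: float) -> float:
    """Percentage decrease from `before` to `after` (negative if the value rose)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) * 100 / before


# Hypothetical (before, after) measurements for illustration only.
metrics: Dict[str, Tuple[float, float]] = {
    "minutes to respond to an incident": (60, 6),
    "escalations per month": (40, 20),
}
report = {name: percent_reduction(b, a) for name, (b, a) in metrics.items()}
```

Tracking the same few metrics over time, inside and outside Ops, is what turns the list above into an ROI figure.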
Back to our story…
Ticketmaster’s “Support at the Edge” model
Mark Maun, Jody Mulkey, Justin Dean
• Automated Ops procedures written/vetted by the delivery teams
• Ops remained in full control of what can run and security policy
• Empowered support teams with self-service ops tasks
• Empowered the NOC team to be “operators” again
• Empowered developers with limited self-service operations
Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ
http://rundeck.org/stories/mark_maun.html
Better for the business and a better way to work
90% Reduction in MTTR
50% Reduction in escalations
55% Reduction of overall support costs
Recap
Understand the pressures on Ops
Operations is a lot more than deployment
Beware of silos
Use the Operations as a Service design pattern
Move definition, execution, and governance to where each is the best use of labor
Make explicit investment in process and tooling
Operations as a Service: Reshaping IT Operations to Solve Today’s Challenges
Introduction
DevOps and Digital Transformations are driving an unprecedented increase in the pace and volume of daily change. Who generally finds this to be welcome news? Development and Product teams. Who has reasons to be alarmed at the problems and challenges this might bring? Operations.
Operations organizations in today’s enterprises are finding themselves squeezed between two unrelenting forces. On one side there are the business-driven demands of DevOps and Digital Transformation (“Go faster! Open things up!”). On the other side there are the demands to maximize security and stability (“Don’t be the next hack! Don’t be the next outage! Lock things down!”). And there, in the middle, is an already over-burdened Operations organization doing its best to avoid being squeezed beyond the breaking point.
Operations has reached an inflection point. To deliver what the business demands, Operations must find a way to provide increasing levels of organizational responsiveness and throughput, all while “locking things down” to sufficiently meet today’s risk profiles.
A lot is riding on how Operations responds to this challenge. A failure here is not just a localized IT failure. A failure will undermine a business’s ability to operate. Failing to solve this will turn into a competitive disadvantage for the business.
On the flip side, this challenge also presents a great opportunity. Operations can take this business mandate and use it to reimagine how both planned and unplanned work is handled. This is a chance to improve how Operations both serves the broader business and improves the day-to-day lives of Operations professionals.
Let’s talk…
@damonedwards
damon@rundeck.com
October	24,	2017
Session	Title
Your	Name	
Your	Title	
Your	Company	
Your	@TwitterHandle
October	24,	2017
Session	Title
Your	Name	
Your	Title	
Your	Company	
Your	@TwitterHandle

More Related Content

PDF
The "Ops" Side of DevSecOps
Rundeck
 
PDF
Keeping Your DevOps Transformation From Crushing Your Ops Capacity
Rundeck
 
PDF
Self-Service Operations: Because Failure Still Happens (Developer Edition)
Rundeck
 
PDF
Failure Happens: Improving Incident Response In Enterprises
Rundeck
 
PDF
Self-Service Operations: Because Ops Still Happens
Rundeck
 
PDF
Modern Operations: Solving DevOps’ Last Mile Problem
Rundeck
 
PDF
Ops Happens: Improving Incident Response Using DevOps and SRE Practices
Rundeck
 
PDF
Helping Ops Help You: Development’s Role in Enabling Self-Service Operations
Rundeck
 
The "Ops" Side of DevSecOps
Rundeck
 
Keeping Your DevOps Transformation From Crushing Your Ops Capacity
Rundeck
 
Self-Service Operations: Because Failure Still Happens (Developer Edition)
Rundeck
 
Failure Happens: Improving Incident Response In Enterprises
Rundeck
 
Self-Service Operations: Because Ops Still Happens
Rundeck
 
Modern Operations: Solving DevOps’ Last Mile Problem
Rundeck
 
Ops Happens: Improving Incident Response Using DevOps and SRE Practices
Rundeck
 
Helping Ops Help You: Development’s Role in Enabling Self-Service Operations
Rundeck
 

What's hot (20)

PDF
Incident Management in the Age of DevOps and SRE
Rundeck
 
PDF
SRE for Everyone: Making Tomorrow Better Than Today
Rundeck
 
PDF
SysAdmin to SRE: Solving the Last Mile Problem
Rundeck
 
PDF
Incident Management in the Age of DevOps and SRE
Rundeck
 
PDF
Clearing the Way For SRE In the Enterprise
Rundeck
 
PDF
Operations: The Last Mile
Rundeck
 
PDF
The Last Mile Continued: Incident Management
Rundeck
 
PDF
Tickets Make Operations Work Unnecessarily Miserable
Rundeck
 
PDF
Incident Management in the Age of DevOps and SRE
Rundeck
 
PDF
Making Tomorrow Better than Today - Unlocking the Full Potential of Operations
Rundeck
 
PDF
Operations: The Last Mile
Rundeck
 
PDF
SRE Lessons for the Enterprise
Rundeck
 
PDF
SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today
Rundeck
 
PDF
SRE From Scratch
Grier Johnson
 
PDF
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Rundeck
 
PDF
Innovation and Architecture
Adrian Cockcroft
 
PDF
DOES16 London - Better Faster Cheaper .. How?
John Willis
 
PDF
Operations: The Last Mile Problem For DevOps
Rundeck
 
PDF
All daydevops 2016 - Turning Human Capital into High Performance Organizati...
John Willis
 
PDF
8 Things That Make Continuous Delivery Go Nuts
Eduards Sizovs
 
Incident Management in the Age of DevOps and SRE
Rundeck
 
SRE for Everyone: Making Tomorrow Better Than Today
Rundeck
 
SysAdmin to SRE: Solving the Last Mile Problem
Rundeck
 
Incident Management in the Age of DevOps and SRE
Rundeck
 
Clearing the Way For SRE In the Enterprise
Rundeck
 
Operations: The Last Mile
Rundeck
 
The Last Mile Continued: Incident Management
Rundeck
 
Tickets Make Operations Work Unnecessarily Miserable
Rundeck
 
Incident Management in the Age of DevOps and SRE
Rundeck
 
Making Tomorrow Better than Today - Unlocking the Full Potential of Operations
Rundeck
 
Operations: The Last Mile
Rundeck
 
SRE Lessons for the Enterprise
Rundeck
 
SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today
Rundeck
 
SRE From Scratch
Grier Johnson
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Rundeck
 
Innovation and Architecture
Adrian Cockcroft
 
DOES16 London - Better Faster Cheaper .. How?
John Willis
 
Operations: The Last Mile Problem For DevOps
Rundeck
 
All daydevops 2016 - Turning Human Capital into High Performance Organizati...
John Willis
 
8 Things That Make Continuous Delivery Go Nuts
Eduards Sizovs
 
Ad

Similar to Operations as a Service: Because Failure Still Happens (20)

PDF
You Build It, But How Are You Going to Run It?
Rundeck
 
PDF
Ops Happens: DevOps Beyond Deployment - Damon Edwards
SeniorStoryteller
 
PPT
Continuous Deployment
Brian Henerey
 
ODP
Dev ops
Eslam El Husseiny
 
PPTX
Kanban Development And The Paradigm Of Flow
Alisson Vale
 
PDF
DevOps in the Amazon Warehouse - Shawn Gandhi
TriNimbus
 
PDF
From Monoliths to Microservices at Realestate.com.au
evanbottcher
 
PDF
Demystifying DevOps
Dr. Tathagat Varma
 
PDF
How Cerner Corporation Delivers End-to-End Workflow Visibility to Increase Cr...
AppDynamics
 
PDF
Rails Operations - Lessons Learned
Josh Nichols
 
PDF
DEVNET-2015 DevOps In Depth - Damon Edwards on DevOps Kaizen: Building an Ent...
Cisco DevNet
 
PDF
Why DevOps Needs to Embrace Distributed Tracing
DevOps.com
 
PDF
Cloud-Native Workshop - Santa Monica
VMware Tanzu
 
PPTX
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
KenAtIndeed
 
PPTX
DevOps Kaizen: Practical Steps to Start & Sustain a Transformation
dev2ops
 
PPTX
DOES15 - Damon Edwards - DevOps Kaizen Practical Steps to Start & Sustain a T...
Gene Kim
 
PDF
Making Continuous Security a Reality with OWASP’s AppSec Pipeline - Matt Tesa...
Matt Tesauro
 
PDF
DevOps Kaizen: Find and Fix What is Really Behind Your Problems
dev2ops
 
PDF
Ops Happen: Improve Security Without Getting in the Way
SeniorStoryteller
 
PDF
DevOps - Applying Lean & Agile Principles to Operations & More
Chris Edwards
 
You Build It, But How Are You Going to Run It?
Rundeck
 
Ops Happens: DevOps Beyond Deployment - Damon Edwards
SeniorStoryteller
 
Continuous Deployment
Brian Henerey
 
Kanban Development And The Paradigm Of Flow
Alisson Vale
 
DevOps in the Amazon Warehouse - Shawn Gandhi
TriNimbus
 
From Monoliths to Microservices at Realestate.com.au
evanbottcher
 
Demystifying DevOps
Dr. Tathagat Varma
 
How Cerner Corporation Delivers End-to-End Workflow Visibility to Increase Cr...
AppDynamics
 
Rails Operations - Lessons Learned
Josh Nichols
 
DEVNET-2015 DevOps In Depth - Damon Edwards on DevOps Kaizen: Building an Ent...
Cisco DevNet
 
Why DevOps Needs to Embrace Distributed Tracing
DevOps.com
 
Cloud-Native Workshop - Santa Monica
VMware Tanzu
 
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
KenAtIndeed
 
DevOps Kaizen: Practical Steps to Start & Sustain a Transformation
dev2ops
 
DOES15 - Damon Edwards - DevOps Kaizen Practical Steps to Start & Sustain a T...
Gene Kim
 
Making Continuous Security a Reality with OWASP’s AppSec Pipeline - Matt Tesa...
Matt Tesauro
 
DevOps Kaizen: Find and Fix What is Really Behind Your Problems
dev2ops
 
Ops Happen: Improve Security Without Getting in the Way
SeniorStoryteller
 
DevOps - Applying Lean & Agile Principles to Operations & More
Chris Edwards
 
Ad

More from Rundeck (20)

PDF
Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck
 
PPTX
Introducing PagerDuty Process Automation
Rundeck
 
PDF
How to Build a Custom Plugin in Rundeck
Rundeck
 
PDF
Lunch and learn: Getting started with Rundeck & Ansible
Rundeck
 
PDF
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Rundeck
 
PDF
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck
 
PPTX
Mastering Secrets Management in Rundeck
Rundeck
 
PDF
What's New in Rundeck 3.4
Rundeck
 
PDF
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Rundeck
 
PDF
Super-Charge Your Site Reliability Practices with Runbook Automation
Rundeck
 
PPTX
Introduction to Rundeck
Rundeck
 
PPTX
Automated Remediation with Rundeck + Sensu
Rundeck
 
PDF
Modernizing Incident Response
Rundeck
 
PDF
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Rundeck
 
PDF
Datadog + Rundeck at DASH 2020
Rundeck
 
PDF
Rundeck Overview
Rundeck
 
PDF
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Rundeck
 
PPTX
Advanced Cluster Settings
Rundeck
 
PDF
Maximizing Your Rundeck Migration
Rundeck
 
PDF
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Rundeck
 
Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck
 
Introducing PagerDuty Process Automation
Rundeck
 
How to Build a Custom Plugin in Rundeck
Rundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Rundeck
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Rundeck
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck
 
Mastering Secrets Management in Rundeck
Rundeck
 
What's New in Rundeck 3.4
Rundeck
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Rundeck
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Rundeck
 
Introduction to Rundeck
Rundeck
 
Automated Remediation with Rundeck + Sensu
Rundeck
 
Modernizing Incident Response
Rundeck
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Rundeck
 
Datadog + Rundeck at DASH 2020
Rundeck
 
Rundeck Overview
Rundeck
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Rundeck
 
Advanced Cluster Settings
Rundeck
 
Maximizing Your Rundeck Migration
Rundeck
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Rundeck
 


Operations as a Service: Because Failure Still Happens

  • 1. October 24, 2017 Operations as a Service: Because Failure Still Happens Damon Edwards @damonedwards
  • 3. Let’s talk about Operations in the enterprise…. Ops
  • 4. Operations is getting squeezed The Operations Squeeze “The Operations Squeeze” Go faster! Be flexible! Lock it down! Improved Quality Shorter Time-to-Market Fast Feedback From Users Availability Auditing Security Compliance Dev Ops Ops
  • 5. Operations is getting squeezed The Operations Squeeze “The Operations Squeeze” Go faster! Be flexible! Lock it down! Improved Quality Shorter Time-to-Market Fast Feedback From Users Availability Auditing Security Compliance Dev Ops Ops More errors More delays Less capacity Less flexibility
  • 6. Ops is Unplanned Work and Planned Work… by design! +
  • 7. Ops is Unplanned Work and Planned Work… by design! + Lots of Agile and DevOps techniques focused here
  • 8. Ops is Unplanned Work and Planned Work… by design! + Lots of Agile and DevOps techniques focused here Not so much here
  • 10. Let’s look at a company that is winning the battle against the “Operations Squeeze”
  • 14. 90% Reduction in MTTR 50% Reduction in escalations 55% Reduction of overall support costs
  • 15. 90% Reduction in MTTR 50% Reduction in escalations 55% Reduction of overall support costs Better, Faster, and Cheaper!
  • 16. How did they do that?
  • 18. Let’s look at the principles behind the improvement …
  • 19. Two prevailing models of operations support Running Service “You build it. They run it.” “You build it. You run it.” Development Team Operations Team Dev Ops Integrated Delivery Team Running Service
  • 21. Two prevailing models of operations support Running Service “You build it. They run it.” “You build it. You run it.” Development Team Operations Team Dev Ops Integrated Delivery Team Running Service “two-pizza team”
  • 22. “You build it. They run it.” (aka… the way it always was) It’s 2am …. It’s 2pm …. It’s the NOC… Talk them through: health checks, reviewing log files, and process of diagnosing and recovering the system. Same as you did for dev teams 2 months ago, QA teams last month, Ops during deploy last week, etc.
  • 23. “You build it. They run it.” (aka… the way it always was) It’s 2am …. It’s 2pm ….
  • 24. “You build it. They run it.” (aka… the way it always was) It’s 2am …. It’s 2pm …. It’s Ops… “Will your applications be affected if we take down EU-West?” “Is it ok if we change these firewall rules?” “We are getting customer complaints about performance. Are you sure you didn’t change something?”
  • 25. “You build it. They run it.” (aka… the way it always was) Running Service Development Team Operations Team
  • 27. “You build it. You run it.” Dev Ops Integrated Delivery Team
  • 28. “You build it. You run it.” Dev Ops Integrated Delivery Team Running Service Running Service Running Service Running Service Running Service Running Service ? Incident!! Incident!! What would happen if… New feature!! New feature!! New API!!
  • 29. “You build it. You run it.” Dev Ops Integrated Delivery Team Running Service Running Service Running Service Running Service Running Service Running Service ? Incident!! Incident!! What would happen if… New feature!! New feature!! New API!! Running Service Add this to your responsibilities!
  • 30. “You build it. You run it.” Dev Ops Integrated Delivery Team Running Service Running Service Running Service Running Service Running Service Running Service ? Incident!! Incident!! What would happen if… New feature!! New feature!! New API!! Running Service Add this to your responsibilities! Running Service Add this to your responsibilities!
  • 31. “You build it. You run it.” Dev Ops Integrated Delivery Team Running Service Running Service Running Service Running Service Running Service Running Service ? Incident!! Incident!! What would happen if… New feature!! New feature!! New API!! Running Service Add this to your responsibilities! Running Service Add this to your responsibilities! Running Service Add this to your responsibilities!
  • 32. “You build it. You run it.” Dev Ops Integrated Delivery Team Running Service Running Service Running Service Running Service Running Service Running Service ? Incident!! Incident!! What would happen if… New feature!! New feature!! New API!! Running Service Add this to your responsibilities! Running Service Add this to your responsibilities! Running Service Add this to your responsibilities! Running Service Add this to your responsibilities!
  • 34. “You build it. You run it.” Dev Ops Integrated Delivery Team Running Service Running Service Running Service Running Service Running Service Running Service ? Incident!! Incident!! What would happen if… New feature!! New feature!! New API!! Running Service Add this to your responsibilities! Running Service Add this to your responsibilities! Running Service Add this to your responsibilities! Running Service Add this to your responsibilities! “two-pizza teams”? Just change how business is structured, funded, and operated.
  • 35. Ideally we can find a way to…
  • 36. Ideally we can find a way to… Have the labor scaling benefits of “you build it, they run it” without the frequent escalations and the bad handoffs
  • 37. Ideally we can find a way to… Have the labor scaling benefits of “you build it, they run it” without the frequent escalations and the bad handoffs. Have the responsiveness/control of “you build it, you run it” without the scaling limitations
  • 38. What gets in the way?
  • 39. Silos tend to ruin everything Backlog Context I need X Backlog I do X Requests for X Silo A Priorities Context Priorities Silo B Tools Tools
  • 40. Ticket-Driven Request Queues Are Often a Sign of Silos Team A (Dev) Team B (Ops) Ticket System ??
  • 41. Ticket-Driven Request Queues Are Often a Sign of Silos Team A (Dev) Team B (Ops) Ticket System ?? Silo Builder
  • 42. Ticket-Driven Request Queues Are Often a Sign of Silos Team A (Dev) Team B (Ops) Ticket System ?? Silo Builder Snowflake Maker
  • 43. Silos + Rapid Tool Evolution = Islands of Automation Puppet Chef Shell Scripts Data ETL PowerShell Scripts Network Management Monitoring Ansible Legacy Datacenter Automation Container Management SQL Tools New Tools New Tools
  • 44. Working in a complex system² Complex System Service A Service B Service B v2 Service C Service D Service E Network Network Firewall API API API Data Data ESB API Firewall Firewall
  • 45. Working in a complex system² Complex System interacting with a Complex System Service A Service B Service B v2 Service C Service D Service E Network Network Firewall API API API Data Data ESB API Firewall Firewall
  • 46. Again: What gets in the way? Silos are everywhere Islands of automation It’s a complex system²
  • 47. So how do we respond quicker, yet stay under control?
  • 48. Empower those closest to the issue or escalate escalate 1° 2° 3° escalate 4°
  • 49. Empower those closest to the issue or escalate escalate 1° 2° 3° escalate 4° Push the ability to take action this direction
  • 50. Improve flow by implementing Operations as a Service Team A (Dev) Team B (Ops) Ticket System Operations as a Service Execute On Demand Define Procedures Vet Procedures Define Policies Actual Exceptions Execute On Demand
  • 51. Change how you think about automated procedures…
  • 52. Automated procedures are comprised of three parts Definition of the automated procedure Execution of the automated procedure Governance of the automated procedure Define Execute Govern
  • 53. Automated procedures are comprised of three parts Definition of the automated procedure Execution of the automated procedure Governance of the automated procedure Define Execute Govern (security, oversight, compliance, etc.)
  • 54. Traditional Ops Silo Define Execute Govern “Consumers of Ops” (Dev, QA, Release, NOC, Security, etc.) Ops
  • 55. Rigid Self-Service Define Execute Govern “Consumers of Ops” (Dev, QA, Release, NOC, Security, etc.) Ops
  • 56. Define Execute Govern Execute “Consumers of Ops” (Dev, QA, Release, NOC, Security, etc.) Ops Rigid Self-Service (limited)
  • 57. High-Velocity Handoffs Define Govern Execute “Consumers of Ops” (Dev, QA, Release, NOC, Security, etc.) Ops
  • 58. Self-Service Operations Define Govern Execute “Consumers of Ops” (Dev, QA, Release, NOC, Security, etc.) Ops
  • 59. Self-Service Operations Define Govern Execute Govern “Consumers of Ops” (Dev, QA, Release, NOC, Security, etc.) Ops
  • 60. Operations as a Service Operations as a Service ED G Team B (Ops) Vet Procedures Define Policies Execute On Demand Team A (Dev) Define Procedures Execute On Demand
  • 61. Operations as a Service Split definition, execution, and governance and move to where most effective use of labor Operations as a Service ED G Team B (Ops) Vet Procedures Define Policies Execute On Demand Team A (Dev) Define Procedures Execute On Demand
  • 62. Again: How do we respond quicker, yet stay under control? Empower those closest to the issue Improve flow by implementing Operations as a Service
  • 63. Rundeck: Open Source Platform For Operations as a Service Scripts APIs Tools Cloud VMs Containers Orchestration & Scheduling of Workflows Collect and Process Output Infrastructure details and state from multiple sources Config. Man. CMDB Monitor. Metrics Cloud Corp Directory Authentication and roles ITSM Tickets, work status, approvals Create workflows ● Define ACL policies ● Execute workflows Web GUI API CLI
  • 64. Common implementation pattern for Operations as a Service…
  • 65. Step 1: Establish a Secure Ops Hub Operations as a Service Engineers get visibility and controlled self-service Secrets Ops Procedures “Status” “Firewall Change” “Restart” deny allow Identity Audit Logs Infrastructure view Service health System metrics Ops Support use for remediation procedures Inventory and Health Execute + Monitoring Tools Security and Ops manage access, configuration, and compliance
  • 66. Step 2: Establish an SDLC for Ops Procedures Operations as a Service Engineers get visibility and controlled self-service Secrets Ops Procedures “Status” “Firewall Change” “Restart” deny allow Identity Audit Logs Infrastructure view Service health System metrics Ops Support use for remediation procedures Inventory and Health Execute Source Code Repo if (($state==wait)) then kill -9 $PID fi Change Product Engineers produce automated procedures and health checks. RISKY Automated Procedures and Health Checks FIX Code review + Monitoring Tools Security and Ops manage access, configuration, and compliance
  • 67. Step 3: Connect with Enterprise Management Systems Service Desk Customers Ops Support get visibility and audit trail updated by support tools Service Ticket Execute Software Supply Chain Ops integrate with artifact flow Operations as a Service Engineers get visibility and controlled self-service Secrets Ops Procedures “Status” “Firewall Change” “Restart” deny allow Identity Audit Logs Infrastructure view Service health System metrics Ops Support use for remediation procedures Inventory and Health Source Code Repo if (($state==wait)) then kill -9 $PID fi Change Product Engineers produce automated procedures and health checks. RISKY Automated Procedures and Health Checks FIX Code review + Monitoring Tools Security and Ops manage access, configuration, and compliance
  • 68. Step 4: Make Compliance Really Happy Service Desk Customers Ops Support get visibility and audit trail updated by support tools Service Ticket Execute Software Supply Chain Ops integrate with artifact flow Who reviewed it? Who ran it? When? Where? Approval trail? Who created the procedure? Who created the policy? Operations as a Service Engineers get visibility and controlled self-service Secrets Ops Procedures “Status” “Firewall Change” “Restart” deny allow Identity Audit Logs Infrastructure view Service health System metrics Ops Support use for remediation procedures Inventory and Health Source Code Repo if (($state==wait)) then kill -9 $PID fi Change Product Engineers produce automated procedures and health checks. RISKY Automated Procedures and Health Checks FIX Code review + Monitoring Tools Security and Ops manage access, configuration, and compliance
  • 70. Improve incident response time and reduce escalations Finish Deliverables Interrupt Interrupt ? ? ? ? Interrupt X "Too busy" "We're late!" Start Deliverables From current production Finish Deliverables Interrupt ? ? ? ? Start Deliverables From current production "This looks important" Interrupt ✔ Delivery Team (L2, L3) Delivery Team (L2, L3) NOC NOC NOC NOC NOC NOC NOC NOC Previously delivered Rundeck Jobs Old Model New Model
  • 72. Team A (Dev) Team B (Ops) Operations as a Service Execute On Demand Define Procedures Vet Procedures Define Policies Execute On Demand Tightens feedback loops
  • 73. Reduce delays that otherwise hurt the business Revenue per Week Time COST OF DELAY Actual Revenue Opportunity Ready
  • 74. Enables Ops managers to focus on creating value Old mindset: Protect capacity Say “no” Manager
  • 75. Enables Ops managers to focus on creating value Old mindset: Protect capacity Say “no” Manager New mindset: Scaling OaaS Get more users Team A (Dev) Team B (Ops) Operations as a Service Execute On Demand Define Procedures Vet Procedures Define Policies Execute On Demand
  • 76. Calculating the ROI for Operations as a Service Team A (Dev) Team B (Ops) Operations as a Service Execute On Demand Define Procedures Vet Procedures Define Policies Execute On Demand
  • 77. Calculating the ROI for Operations as a Service ROI inside Ops Decrease in time to respond to incidents Decrease in errors and rework Increase in operational support tasks delegated Increase in team capacity Team A (Dev) Team B (Ops) Operations as a Service Execute On Demand Define Procedures Vet Procedures Define Policies Execute On Demand
  • 78. Calculating the ROI for Operations as a Service ROI inside Ops Decrease in time to respond to incidents Decrease in errors and rework Increase in operational support tasks delegated Increase in team capacity ROI outside Ops Decrease in number of escalations Decrease in time spent waiting and rework loops Decrease in issues due to problematic handoffs Team A (Dev) Team B (Ops) Operations as a Service Execute On Demand Define Procedures Vet Procedures Define Policies Execute On Demand
  • 79. Calculating the ROI for Operations as a Service ROI inside Ops Decrease in time to respond to incidents Decrease in errors and rework Increase in operational support tasks delegated Increase in team capacity ROI outside Ops Decrease in number of escalations Decrease in time spent waiting and rework loops Decrease in issues due to problematic handoffs ROI to Business Decrease in total cost of operations and support Decrease in time-to-market, cycle-time, and schedule slippage Team A (Dev) Team B (Ops) Operations as a Service Execute On Demand Define Procedures Vet Procedures Define Policies Execute On Demand
  • 80. Back to our story… Mark Maun Jody Mulkey Justin Dean Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ http://rundeck.org/stories/mark_maun.html Ticketmaster’s “Support at the Edge” model
  • 81. Back to our story… Mark Maun Jody Mulkey Justin Dean Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ http://rundeck.org/stories/mark_maun.html Ticketmaster’s “Support at the Edge” model • Automated Ops procedures written/vetted by the delivery teams
  • 82. Back to our story… Mark Maun Jody Mulkey Justin Dean Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ http://rundeck.org/stories/mark_maun.html Ticketmaster’s “Support at the Edge” model • Automated Ops procedures written/vetted by the delivery teams • Ops remained in full control of what can run and security policy
  • 83. Back to our story… Mark Maun Jody Mulkey Justin Dean Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ http://rundeck.org/stories/mark_maun.html Ticketmaster’s “Support at the Edge” model • Automated Ops procedures written/vetted by the delivery teams • Ops remained in full control of what can run and security policy • Empowered support teams with self-service ops tasks
  • 84. Back to our story… Mark Maun Jody Mulkey Justin Dean Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ http://rundeck.org/stories/mark_maun.html Ticketmaster’s “Support at the Edge” model • Automated Ops procedures written/vetted by the delivery teams • Ops remained in full control of what can run and security policy • Empowered support teams with self-service ops tasks • Empowered the NOC team to be “operators” again
  • 85. Back to our story… Mark Maun Jody Mulkey Justin Dean Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ http://rundeck.org/stories/mark_maun.html Ticketmaster’s “Support at the Edge” model • Automated Ops procedures written/vetted by the delivery teams • Ops remained in full control of what can run and security policy • Empowered support teams with self-service ops tasks • Empowered the NOC team to be “operators” again • Empowered developers with limited self-service operations
  • 87. Better for the business and a better way to work 90% Reduction in MTTR 50% Reduction in escalations 55% Reduction of overall support costs
  • 88. Recap: Move definition, execution, and governance to where it makes the best use of labor. Understand the pressures on Ops. Make explicit investment in process and tooling. Operations is a lot more than deployment. Beware of silos. Use the Operations as a Service design pattern.
    Operations as a Service: Reshaping IT Operations to Solve Today’s Challenges. Introduction.
    DevOps and Digital Transformations are driving an unprecedented increase in the pace and volume of daily change. Who generally finds this to be welcome news? Development and Product teams. Who has reasons to be alarmed at the problems and challenges this might bring? Operations.
    Operations organizations in today’s enterprises are finding themselves squeezed between two unrelenting forces. On one side there are the business-driven demands of DevOps and Digital Transformation (“Go faster! Open things up!”). On the other side there are the demands to maximize security and stability (“Don’t be the next hack! Don’t be the next outage! Lock things down!”). And there, in the middle, is an already over-burdened Operations organization doing their best to avoid being squeezed beyond the breaking point.
    Operations has reached an inflection point. To deliver what the business demands, Operations must find a way to provide increasing levels of organizational responsiveness and throughput, all while “locking things down” to sufficiently meet today’s risk profiles. A lot is riding on how Operations responds to this challenge. A failure here is not just a localized IT failure. A failure will undermine a business’s ability to operate. Failing to solve this will turn into a competitive disadvantage for the business.
    On the flip side, this challenge also presents a great opportunity. Operations can take this business mandate and use it to reimagine how both planned and unplanned work is handled. This is a chance to improve how Operations both serves the broader business and improves the day-to-day lives of Operations professionals.
    The Operations Squeeze: “The Operations Squeeze” Go faster! Be flexible! Lock it down! Improved Quality Shorter Time-to-Market Fast Feedback From Users Availability Auditing Security Compliance Dev Ops Ops
    Team A (Dev) Team B (Ops) Ticket System ??
    Service Desk Customers Ops Support get visibility and audit trail updated by support tools Service Ticket Execute Software Supply Chain Ops integrate with artifact flow Operations as a Service Engineers get visibility and controlled self-service Secrets Ops Procedures “Status” “Firewall Change” “Restart” deny allow Identity Audit Logs Infrastructure view Service health System metrics Ops Support use for remediation procedures Inventory and Health Source Code Repo if (($state==wait)) then kill -9 $PID fi Change Product Engineers produce automated procedures and health checks. RISKY Automated Procedures and Health Checks FIX Code review + Monitoring Tools Security and Ops manage access, configuration, and compliance
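
The define / execute / govern split that the deck describes (slides 52 to 61, and the allow/deny and audit-log boxes in the "Secure Ops Hub" diagram) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not Rundeck's API: the `OpsHub` class, its `define`, `allow`, and `execute` methods, the role names, and the procedure names are all hypothetical and chosen to mirror the labels on the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class OpsHub:
    """Toy model of the define / execute / govern split (all names hypothetical)."""
    procedures: Dict[str, Callable[[], str]] = field(default_factory=dict)
    allowed: Set[Tuple[str, str]] = field(default_factory=set)  # (role, procedure)
    audit_log: List[dict] = field(default_factory=list)

    def define(self, name: str, action: Callable[[], str]) -> None:
        # Define: a delivery team contributes an automated procedure.
        self.procedures[name] = action

    def allow(self, role: str, name: str) -> None:
        # Govern: Ops/Security decide which role may run which procedure.
        self.allowed.add((role, name))

    def execute(self, user: str, role: str, name: str) -> str:
        # Execute: anyone may ask; policy decides, and every attempt is logged.
        permitted = (role, name) in self.allowed and name in self.procedures
        result = self.procedures[name]() if permitted else "denied"
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "who": user,
            "role": role,
            "procedure": name,
            "outcome": "allow" if permitted else "deny",
        })
        return result


hub = OpsHub()
hub.define("status", lambda: "service healthy")
hub.define("restart", lambda: "service restarted")
hub.define("firewall-change", lambda: "rule applied")
hub.allow("noc", "status")
hub.allow("noc", "restart")
hub.allow("ops", "firewall-change")

print(hub.execute("alice", "noc", "restart"))          # prints "service restarted"
print(hub.execute("alice", "noc", "firewall-change"))  # prints "denied"
```

In this sketch the NOC can restart a service without escalating, a firewall change still requires the Ops role, and the audit log records who ran what, when, and with which outcome, which are the questions slide 68 asks on behalf of compliance.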