Monitoring and Automation
From developer to devops
Quick questions
● Who here is doing operations?
● Who develops applications?
● Who develops infrastructure software?
● Who is doing all of the above?
● Who manages more than 10 servers?
● Who manages more than 100?
● 1000?
Where do we look?
Why do we measure?
● Optimize hardware usage
● Locate performance bottlenecks
● Identify anomalous behaviour
● Understand our own operational characteristics
○ How long do we normally take to add a new type of server?
○ What should we automate?
● Find out the costs of a given operation
○ If your operation takes 3 minutes of a server time, how much did we pay for it?
● Metering and billing
● Better understand the user
● Plan for future versions
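The "3 minutes of server time" question is plain arithmetic once you know what a server-hour costs. A minimal sketch, assuming a made-up price of 0.40 per server-hour (the function name and the rate are illustrative, not from the talk):

```python
def operation_cost(duration_seconds: float, hourly_rate: float) -> float:
    """Cost of keeping one server busy for duration_seconds,
    given what that server costs per hour."""
    return duration_seconds / 3600.0 * hourly_rate


# Hypothetical rate: 0.40 currency units per server-hour.
cost = operation_cost(duration_seconds=3 * 60, hourly_rate=0.40)
print(f"3 minutes of server time cost about {cost:.4f}")
```

The same number feeds metering and billing: multiply it by how often the operation runs.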
Where do we look?
(Diagram built up over four slides; the labels that survive are listed below)
● CPU counters, Revenue, Profit, Strategies
● Host OS, Guest OS, Platform, Application, Internal processes
● Operationsland, Developerland, DevOpsland (according to a very lax definition)
● Billing, HW usage
● Performance
● What to automate?
How do we look?
● Time resolution (and retention)
● GDPR (don't be evil, and don't keep what you don't need)
● Push vs. Pull
○ Register clients or let them register themselves
● Is UDP your friend?
○ Fast, cheap and unreliable
○ If you need security, you need to build it
○ You should probably use it in combination with TCP (a subsampled guaranteed channel)
● Standards
○ Prometheus is your friend
● Visualization
○ You'll need to figure it out
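One way to read "UDP in combination with TCP (a subsampled guaranteed channel)": push every sample over the fast, lossy path, and copy a fraction of them to a reliable one. A rough sketch under that reading. The `name value` payload, `make_reporter` and `sample_rate` are hypothetical, and the reliable sender is left as a plain callable you would back with a TCP connection:

```python
import random
import socket
from typing import Callable

Sender = Callable[[bytes], None]


def make_udp_sender(host: str, port: int) -> Sender:
    """Fire-and-forget sender: no delivery guarantee and no security."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(payload: bytes) -> None:
        sock.sendto(payload, (host, port))

    return send


def make_reporter(send_fast: Sender, send_reliable: Sender,
                  sample_rate: float,
                  rng: Callable[[], float] = random.random):
    """Send every sample on the fast channel and roughly a
    sample_rate fraction of them on the reliable channel too."""

    def report(name: str, value: float) -> None:
        payload = f"{name} {value}".encode("utf-8")  # made-up wire format
        send_fast(payload)
        if rng() < sample_rate:
            send_reliable(payload)

    return report


# Usage with in-memory lists standing in for the two channels.
fast, reliable = [], []
report = make_reporter(fast.append, reliable.append, sample_rate=0.1)
for i in range(1000):
    report("requests_total", i)
print(len(fast), "sent fast;", len(reliable), "also sent reliably")
```

In a real setup `send_fast` would be `make_udp_sender(host, port)`. Comparing what arrived on each channel gives an estimate of how much the lossy path drops.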
On Automation
How about measuring ourselves?
● Learn what to automate
○ And what not to
● For stateless clusters, deployment is more important
○ Don't fix stuff you can replace
● For stateful ones, you'll want to manage the lifecycle
○ You'll hate it when your database server with a disk space problem gets a brand new, bigger volume that also doesn't have your database
● Terraform/Chef/Puppet/Ansible/Salt/Whatever-works-for-you
○ All software sucks in one way or another
● Infrastructure as code
○ Versioned
○ Testable, tested routinely (deployment and life-cycle scenarios)
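"Measuring ourselves" can start as a simple log of how long manual tasks take, ranked by total time spent. A minimal sketch; the task names and durations are invented for illustration, and `TaskRun` and `automation_candidates` are hypothetical names:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRun:
    task: str
    minutes: float


def automation_candidates(runs):
    """Total the time spent per task and return (task, minutes)
    pairs, most expensive first."""
    totals = defaultdict(float)
    for run in runs:
        totals[run.task] += run.minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented sample data.
log = [
    TaskRun("add new server type", 240),
    TaskRun("rotate credentials", 15),
    TaskRun("rotate credentials", 15),
    TaskRun("restore database volume", 90),
]

for task, minutes in automation_candidates(log):
    print(f"{minutes:6.0f} min  {task}")
```

The ranking only says where the time goes. Whether a task is worth automating (and what not to automate) is still a judgment call.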
Why?
● Repeatability
○ High-fidelity replicas of the production environment
● Consistency
○ Unless you change something, the result should always be the same
■ Either that, or you have a more serious bug
● Less work == fewer mistakes
● Isn't it handy that the script you use to automate is under version control?
● One side can build upon the tools of the other - and vice-versa
○ And collaborate
■ And, ultimately, become a single team, even if you have different roles
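"Unless you change something, the result should always be the same" is what the tools listed earlier aim for: describe the desired state, apply it, and a second run changes nothing. A toy sketch of the idea using plain dictionaries. This is not how any particular tool works, and `plan`, `apply` and the settings shown are hypothetical:

```python
def plan(current: dict, desired: dict) -> dict:
    """Return only the settings that differ from the desired state."""
    return {
        key: value
        for key, value in desired.items()
        if key not in current or current[key] != value
    }


def apply(current: dict, desired: dict):
    """Return the new state and the changes that were made."""
    changes = plan(current, desired)
    return {**current, **changes}, changes


# Hypothetical settings.
desired = {"nginx_version": "1.26", "workers": 4}
state, changes = apply({"nginx_version": "1.24"}, desired)
print("first run changed:", changes)

state, changes = apply(state, desired)
print("second run changed:", changes)  # nothing left to change
```

If the second run does report changes, that is the "more serious bug" from the slide.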
So?
It all boils down to attitude
It's about cooperation, about tearing down walls
Offering and accepting insights from different people with different priorities
Reconciling different priorities around a single mission
A drive to understand your tooling, your systems
(Random) failure is not an option
Questions?
Thank you
@rbanffy - twitter, github,
