Test driven infrastructure development
Tomas Doran
bobtfish@bobtfish.net
@bobtfish
Puppetconf 2013
•High availability!
•Automated testing of all infrastructure changes
•Entirely repeatable application environments
•High confidence in changes
•Continuous integration and deployment for infrastructure
Today, I’m going to talk about the promised land!
And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any
environment I want, whenever I want - so _all_ the configuration of all the instances has to be
dynamic!
So who the hell am I?
Dev
Infrastructure automation nut!
Ex-backend web developer, Ex-security, currently fixing puppet at Yelp!
Dev / Ops
•Developer viewpoint
•Grass IS greener
•Think of your infra as an agile software project...
•What workflow do I want?
The state of repeatability and testing in infrastructures is generally shocking.
This leads to systems/operations teams being averse to change and conservative, which slows the business down!
Why isn’t your infrastructure an agile software project?
The state of the art
I’m going to talk about how I think the generally accepted way of doing some things is fundamentally broken!
But let’s start with a simple description of the issues I’m worrying about.
CM = state machine
Each change puppet makes (or attempts to make) is a state transition. Each circle represents the configuration state of the server (files on disc, services running, etc.).
Non deterministic
This is the key observation here - you don’t know which way puppet’s gonna jump :)
In this case - it doesn’t matter, as the two operations are orthogonal.
Convergent!
Convergence is when each run of puppet takes you nearer to 0 changes, but the next run still makes additional changes.
The classic way to screw this up is to miss a dependency in your code.
Convergent!
Of course, this doesn’t happen - the first step goes BANG, then mysql gets installed, which creates /etc/mysql.
The second puppet run _then_ sets the config up.
Purple text of rage!
Aaand in your puppet logs, you get:
err: /Stage[main]//File[/etc/mysql/my.cnf]/ensure: change from absent to file failed: Could not set 'file on ensure: No such file or directory - /etc/mysql/my.cnf.puppettmp_3706 at /home/tdoran/test.pp:4
THE PURPLE TEXT OF RAGE
Convergent!
(Shamelessly stolen from https://www.usenix.org/legacy/publications/library/proceedings/lisa02/tech/full_papers/traugott/traugott.pdf)
Aaand your machine is convergent - i.e. it gets towards the desired state in a number of steps.
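To make "a number of steps" concrete, here is a minimal Ruby sketch of a toy configuration run. It is not Puppet: the Resource struct, run_once and runs_to_converge are hypothetical names, and the package/file pair only loosely mirrors the mysql example. It assumes resources are applied in list order and that a resource whose prerequisite is missing simply fails for that run.

```ruby
# Toy model of a config-management run. NOT Puppet, just an illustration.
# A resource can only be applied once everything it needs is already present.
Resource = Struct.new(:name, :needs, :provides)

# Hypothetical manifest with the ordering bug: the config file is listed
# before the package that creates its directory.
MISSING_DEPENDENCY = [
  Resource.new('File[/etc/mysql/my.cnf]', ['/etc/mysql'], '/etc/mysql/my.cnf'),
  Resource.new('Package[mysql]', [], '/etc/mysql')
].freeze

# Apply every resource that is not yet in place, in list order.
# Returns the number of changes made during this run.
def run_once(resources, state)
  changes = 0
  resources.each do |res|
    next if state.include?(res.provides)                        # already done
    next unless res.needs.all? { |need| state.include?(need) }  # fails this run
    state << res.provides
    changes += 1
  end
  changes
end

# Count how many runs make changes before a run reports zero changes.
def runs_to_converge(resources)
  state = []
  runs = 0
  runs += 1 while run_once(resources, state) > 0
  runs
end

puts runs_to_converge(MISSING_DEPENDENCY)
puts runs_to_converge(MISSING_DEPENDENCY.reverse)
```

With the file listed before the package, the toy machine needs two changing runs; with the order reversed, one run is enough.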
Fixable!
•before
•require
•subscribe
•notify
As I noted, this all happens because you missed a dependency. This is the easy case, where puppet can detect that and tell you! It’s also entirely possible for a missing dependency to fail totally silently.
It is, though, totally possible to write your puppet code well enough to need EXACTLY 1 puppet run to fully provision a server!
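The reason explicit dependencies get you to a single run is that they let the tool pick a safe order. A minimal sketch of that idea in Ruby, using the standard library's TSort, follows. The REQUIRES graph and apply_order are hypothetical names for illustration; this is not how Puppet stores or walks its catalog, and it treats every relationship as a plain "must come first" edge.

```ruby
require 'tsort'

# Hypothetical resource graph: each key requires the resources in its list.
REQUIRES = {
  'File[/etc/mysql/my.cnf]' => ['Package[mysql]'],
  'Service[mysql]'          => ['Package[mysql]', 'File[/etc/mysql/my.cnf]'],
  'Package[mysql]'          => []
}.freeze

# Return the resources in an order where every requirement comes first.
# Raises TSort::Cyclic if the declared dependencies form a loop.
def apply_order(requires)
  each_node  = ->(&blk) { requires.each_key(&blk) }
  each_child = ->(node, &blk) { requires.fetch(node).each(&blk) }
  TSort.tsort(each_node, each_child)
end

puts apply_order(REQUIRES)
```

Once the package is declared as a requirement of the file, the order no longer depends on where each resource happens to appear in the manifest.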
What about an entire infrastructure?
The $64,000 question is...
A whole stack
Let’s start simple, but semi-realistic.
Gonna ignore databases.
Gonna ignore monitoring.
Gonna ignore the n[eo]twork.
Exported resources
• Inter machine dependencies
• Unidirectional!
• Known graph - webs, proxies, lbs
• Puppetroll (github.com/youdevise/puppetroll)
Each layer of systems can publish data to the systems which depend on it. (I.e. webs register,
proxies find the webs + register themselves, lbs then find the proxy).
Given you know the dependencies - you can get consistent runs by ordering them.
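A rough way to see why the ordering matters is to simulate it. The Ruby sketch below is a toy stand-in for exported resources: ExportStore and run_node are hypothetical names, the store is just an in-memory hash, and the three roles follow the webs, proxies, lbs chain from the slide. It is an illustration of the data flow, not of Puppet's implementation.

```ruby
# Toy stand-in for the shared store that exported resources end up in.
class ExportStore
  def initialize
    @exports = Hash.new { |hash, key| hash[key] = [] }
  end

  # Publish a value under a tag (idempotent, like re-running puppet).
  def export(tag, value)
    @exports[tag] << value unless @exports[tag].include?(value)
  end

  # Everything published under a tag so far.
  def collect(tag)
    @exports[tag].dup
  end
end

# One "puppet run" for a node: collect what the layer below has exported,
# then export yourself for the layer above. Returns the backends configured.
def run_node(store, role, name)
  case role
  when :web
    store.export(:web, name)
    []
  when :proxy
    backends = store.collect(:web)
    store.export(:proxy, name)
    backends
  when :lb
    store.collect(:proxy)
  end
end

store = ExportStore.new
run_node(store, :web, 'web1')
run_node(store, :web, 'web2')
p run_node(store, :proxy, 'proxy1')
p run_node(store, :lb, 'lb1')
```

Run in dependency order, each layer sees the layer below it on the first pass. Run the lb first against an empty store and it collects nothing, so it needs another run later.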
Exported resources
(Shameless ripoff of http://xkcd.com/1171/ )
Ordering dependent. Hard to test (in isolation). Slooow (have to run in order)
Co-dependence
And if we really are talking about entire infrastructures...
Then maybe we need some of these.
Co-dependence
:(
You _know_ that if everything is dynamically configured that you’re gonna have to do
multiple puppet runs per server...
Do we _really_ want to keep running puppet till it stops changing things?
The solution - an external model
• Represent system as a set of Ruby classes
• DSL for describing environments
• Dependencies
• Domain knowledge
Use your software model to generate a set of machines for an environment, and generate config for puppet to apply to each of those systems.
Add super secret special sauce (lots and lots of mcollective!)
This is a simplified / minimal example Jenkins environment - just 4 machines (2 web apps, 2 load balancers).
ENC data!
Our external node classifier generates this for each of the 4 machines, which translates to puppet code run on the server.
Note how every server gets all of its dependencies.
There’s a companion data structure sent to the agent which actually provisions the virtual
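The slide images are not part of this transcript, so here is a hedged sketch of the general shape in Ruby: a tiny model plus DSL that emits ENC-style data per machine. Machine, Environment, service, enc_for, the role:: class names and the depends_on_hosts parameter are all assumptions made for illustration and are not the stackbuilder API. The only thing taken from Puppet itself is that an ENC returns a YAML document with `classes` and `parameters` keys.

```ruby
require 'yaml'

# Hypothetical model classes for illustration only.
Machine = Struct.new(:hostname, :role, :depends_on)

class Environment
  attr_reader :name, :machines

  def initialize(name)
    @name = name
    @machines = []
  end

  # Tiny DSL: declare `count` machines of one role, optionally depending
  # on every machine of another role.
  def service(role, count:, depends_on: nil)
    count.times do |i|
      hostname = format('%s-%s-%03d', name, role, i + 1)
      @machines << Machine.new(hostname, role, depends_on)
    end
  end

  # ENC data for one host: its role class, parameterised with the hostnames
  # of everything it depends on, so nothing is discovered at run time.
  def enc_for(hostname)
    machine = machines.find { |m| m.hostname == hostname }
    raise KeyError, "unknown host #{hostname}" unless machine

    deps = machines.select { |m| m.role == machine.depends_on }.map(&:hostname)
    {
      'classes'    => { "role::#{machine.role}" => { 'depends_on_hosts' => deps } },
      'parameters' => { 'environment_name' => name }
    }
  end
end

CI_ENV = Environment.new('ci')
CI_ENV.service 'web', count: 2
CI_ENV.service 'lb',  count: 2, depends_on: 'web'

puts CI_ENV.enc_for('ci-lb-001').to_yaml
```

Because the model knows the whole environment up front, each load balancer gets its full list of web hosts in its very first catalog.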
Call tree looks something like this: model all the nodes, allocate all their IPs, make calls to KVM servers to provision machines. VMs start, boot, run puppet, send cert to puppetmaster, --waitforcert.
Central provisioning asks ‘do we have a cert’, waits - signs it. Looks up DNS and ENC to
Automate all the things
Suddenly, I have massive power.
I can write a small script to bring up a whole production-like environment, run tests against it, and tear it down. I can do this against the latest puppet changes, and only promote them to run on production servers when the tests pass!
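The control flow of such a script is small enough to sketch. In the Ruby below every callable (provision, run_tests, teardown, promote) is a hypothetical stand-in for real tooling; only the shape is the point: always tear down, and promote only when the tests pass.

```ruby
# Bring up an environment for a revision, test it, always tear it down,
# and promote the revision only if the tests passed.
# All four callables are hypothetical stand-ins supplied by the caller.
def validate_change(revision, provision:, run_tests:, teardown:, promote:)
  env = provision.call(revision)
  passed = begin
    run_tests.call(env)
  ensure
    teardown.call(env) # runs even if the tests raise
  end
  promote.call(revision) if passed
  passed
end

log = []
validate_change(
  'abc123',
  provision: ->(rev) { log << "up #{rev}"; "env-#{rev}" },
  run_tests: ->(env) { log << "test #{env}"; true },
  teardown:  ->(env) { log << "down #{env}" },
  promote:   ->(rev) { log << "promote #{rev}" }
)
puts log
```

Passing the steps in as callables keeps the sketch independent of any particular provisioning or CI tool.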
BDD infrastructure
Behavior driven development - given I have a high level model of the systems comprising an infrastructure, I can then write equally high level tests to assert the behavior of that infrastructure.
For example...
BDD infrastructure
• Given – the Service has finished being provisioned
• And – all monitoring related to the service is passing
• When – we destroy a single member of the service
• Then – we expect all monitoring at the service level to be passing
• And – we expect all monitoring at the single machine level to be failing
Yes, I am suggesting regression testing your load balancer setup...
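The scenario above can be sketched as plain Ruby against an in-memory fake. FakeService and scenario_survives_single_failure are hypothetical names; a real test would drive the provisioning and monitoring systems rather than a hash, and the fake assumes the service stays healthy while at least one member is alive.

```ruby
# In-memory stand-in for a load-balanced service and its monitoring.
class FakeService
  def initialize(members)
    @alive = members.to_h { |member| [member, true] }
  end

  def members
    @alive.keys
  end

  def destroy(member)
    @alive[member] = false
  end

  # Machine-level check: is this one member up?
  def machine_check(member)
    @alive.fetch(member)
  end

  # Service-level check: passes while at least one member can serve.
  def service_check
    @alive.values.any?
  end
end

# Returns true if the service survives the loss of `victim`.
def scenario_survives_single_failure(service, victim)
  # Given the service is provisioned, and all monitoring is passing
  all_ok = service.service_check && service.members.all? { |m| service.machine_check(m) }
  raise 'precondition failed: monitoring not passing' unless all_ok

  # When we destroy a single member of the service
  service.destroy(victim)

  # Then service-level monitoring passes, and the victim's own check fails
  service.service_check && !service.machine_check(victim)
end

puts scenario_survives_single_failure(FakeService.new(%w[web1 web2]), 'web1')
```

A service with a single member fails the same scenario, which is exactly the kind of regression this test is meant to catch.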
Is this for real?
•Yes!
• We actually built this, the core parts are on GitHub
• Deployed real applications to production at TIM Group
•High availability!
•Automated testing of all infrastructure changes
•Entirely repeatable application environments
•High confidence in changes
•Continuous integration and deployment for infrastructure
This is my promised land!
Questions?
• https://devblog.timgroup.com/2013/06/14/exported-resources-considered-harmful/
• https://devblog.timgroup.com/2013/06/26/scenario-testing-infrastructures/
• https://github.com/youdevise/provisioning-tools
• https://github.com/youdevise/stackbuilder

Test driven infrastructure development (2 - puppetconf 2013 edition)

  • 2. Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  • 3. •High availability! Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  • 4. •High availability! •Automated testing of all infrastructure changes Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  • 5. •High availability! •Automated testing of all infrastructure changes •Entirely repeatable application environments Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  • 6. •High availability! •Automated testing of all infrastructure changes •Entirely repeatable application environments •High confidence in changes Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  • 7. •High availability! •Automated testing of all infrastructure changes •Entirely repeatable application environments •High confidence in changes •Continuous integration and deployment for infrastructure Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  • 8. So who the hell am I?
  • 9. Dev Infrastructure automation nut! Ex-backend web developer, Ex-security, currently fixing puppet at Yelp!
  • 10. Dev / Ops State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being adverse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  • 11. Dev / Ops •Developer viewpoint State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being adverse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  • 12. Dev / Ops •Developer viewpoint •Grass IS greener State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being adverse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  • 13. Dev / Ops •Developer viewpoint •Grass IS greener State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being adverse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  • 14. Dev / Ops •Developer viewpoint •Grass IS greener •Think of your infra as an agile software project... State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being adverse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  • 15. Dev / Ops •Developer viewpoint •Grass IS greener •Think of your infra as an agile software project... •What workflow do I want? State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being adverse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  • 16. The state of the art Going to talk about how I think the generally accepted way of doing some things is fundamentally broken! But lets start with a simple description of the issues I’m worrying about.
  • 17. CM = state machine Each change puppet makes (or attempts to make) is a state transition. Each circle represents the configuration state of the server on disc + services running etc..
  • 18. Non deterministic This is the key observation here - you don’t know which way puppet’s gonna jump :) In this case - it doesn’t matter, as the two operations are orthogonal.
  • 19. Convergent! Convergence is when each run of puppet takes you nearer to 0 changes, but the next run makes additional changes.. The classic way to screw this up is to miss a dependency in your code.
  • 20. Convergent! Of course, this doesn’t happen - the first step goes BANG, then mysql gets installed, creates /etc/mysql. The second puppet run _then_ sets the config up..
  • 21. err: /Stage[main]//File[/etc/mysql/my.cnf]/ ensure: change from absent to file failed: Could not set 'file on ensure: No such file or directory - /etc/mysql/ my.cnf.puppettmp_3706 at /home/tdoran/ test.pp:4 Aaand in your puppet logs, you get.
  • 22. Purple text of rage! err: /Stage[main]//File[/etc/mysql/my.cnf]/ ensure: change from absent to file failed: Could not set 'file on ensure: No such file or directory - /etc/mysql/ my.cnf.puppettmp_3706 at /home/tdoran/ test.pp:4 THE PURPLE TEXT OF RAGE
  • 23. Convergent! (Shamelessly stolen from https://www.usenix.org/legacy/publications/library/proceedings/lisa02/tech/full_papers/traugott/traugott.pdf) Aaand your machine is convergent - i.e. it gets towards the desired state in a number of steps..
  • 24. •before •require •subscribe •notify As I noted, this all happens because you missed a dependency. This is the easy case, where puppet can detect that and tell you! It’s also entirely possible for a missing dependency to fail totally silently. It is, though, totally possible to write your puppet code well enough to need EXACTLY 1 puppet run to fully provision a server!
  • 25. Fixable! •before •require •subscribe •notify As I noted, this all happens because you missed a dependency. This is the easy case, where puppet can detect that and tell you! It’s also entirely possible for a missing dependency to fail totally silently. It is, though, totally possible to write your puppet code well enough to need EXACTLY 1 puppet run to fully provision a server!
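
To make the "declare the dependency and you need exactly 1 run" point concrete, here is a minimal Ruby sketch of what an explicit `require` edge buys you. It is a toy model, not Puppet's implementation: the `Catalog` class is invented for illustration, and the package name `mysql-server` is an assumption (the slides only say that mysql gets installed). It uses Ruby's standard `tsort` library to order resources so that everything a resource requires is applied first.

```ruby
require 'tsort'

# Toy model of a catalog: resource name => names of the resources it requires.
# This is NOT how Puppet is implemented; it only illustrates the idea.
class Catalog
  include TSort

  def initialize(resources)
    @resources = resources
  end

  # TSort hook: enumerate every resource in the catalog.
  def tsort_each_node(&block)
    @resources.each_key(&block)
  end

  # TSort hook: enumerate the resources that `name` requires.
  def tsort_each_child(name, &block)
    @resources.fetch(name).each(&block)
  end

  # Order in which resources can be applied so requirements come first.
  def apply_order
    tsort
  end
end

catalog = Catalog.new(
  # The config file declares that it requires the package (assumed name).
  'File[/etc/mysql/my.cnf]' => ['Package[mysql-server]'],
  'Package[mysql-server]'   => []
)

puts catalog.apply_order.inspect
```

With the edge declared, the package always sorts before the file, so /etc/mysql exists by the time the config is written. Without it, the order is whatever the tool happens to pick.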
  • 27. A whole stack Let’s start simple, but semi realistic. Gonna ignore databases. Gonna ignore monitoring. Gonna ignore the n[eo]twork.
  • 28. Exported resources Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  • 29. Exported resources • Inter machine dependencies Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  • 30. Exported resources • Inter machine dependencies • Unidirectional! Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  • 31. Exported resources • Inter machine dependencies • Unidirectional! • Known graph - webs, proxies, lbs Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  • 32. Exported resources • Inter machine dependencies • Unidirectional! • Known graph - webs, proxies, lbs • Puppetroll (github.com/youdevise/puppetroll) Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  • 33. Exported resources (Shameless ripoff of http://xkcd.com/1171/ ) Ordering dependent. Hard to test (in isolation). Slooow (have to run in order).
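
The ordering problem can be sketched in a few lines of Ruby. This is a deliberately crude simulation, not real exported resources: it assumes a tier's config is only complete once the tier it depends on is complete, and the names `DEPENDS_ON` and `passes_until_stable` are made up for this example.

```ruby
# Each tier consumes data exported by the tier it depends on
# (webs <- proxies <- lbs, as in the slides).
DEPENDS_ON = { 'web' => nil, 'proxy' => 'web', 'lb' => 'proxy' }.freeze

# Run puppet on every tier in the given order, repeating whole passes until
# a pass changes nothing. Returns the number of passes that made changes.
def passes_until_stable(order)
  configured = {} # tier => true once its config is complete
  passes = 0
  loop do
    changed = false
    order.each do |tier|
      upstream = DEPENDS_ON.fetch(tier)
      ready = upstream.nil? || configured[upstream]
      if ready && !configured[tier]
        configured[tier] = true
        changed = true
      end
    end
    break unless changed
    passes += 1
  end
  passes
end

puts passes_until_stable(%w[web proxy lb]) # dependency order
puts passes_until_stable(%w[lb proxy web]) # reverse order
```

Under these assumptions, running in dependency order settles in one pass, while the reverse order needs three. That is the "ordering dependent" and "slooow" complaint in miniature.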
  • 34. Co-dependence And if we really are talking about entire infrastructures... Then maybe we need some of these.
  • 35. Co-dependence :( You _know_ that if everything is dynamically configured that you’re gonna have to do multiple puppet runs per server... Do we _really_ want to keep running puppet till it stops changing things?
  • 36. The solution - an external model Use your software model to generate a set of machines for an environment, and generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  • 37. The solution - an external model • Represent system as a set of Ruby classes Use your software model to generate a set of machines for an environment, and generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  • 38. The solution - an external model • Represent system as a set of Ruby classes • DSL for describing environments Use your software model to generate a set of machines for an environment, and generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  • 39. The solution - an external model • Represent system as a set of Ruby classes • DSL for describing environments • Dependencies Use your software model to generate a set of machines for an environment, and generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  • 40. The solution - an external model • Represent system as a set of Ruby classes • DSL for describing environments • Dependencies • Domain knowledge Use your software model to generate a set of machines for an environment, and generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  • 41. This is a simplified / minimal example Jenkins environment - just 4 machines (2 web apps, 2 load balancers).
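
The slide's own code is not reproduced in this transcript. As a stand-in, here is a hypothetical sketch of what a small Ruby DSL for describing such an environment could look like. Every name in it (`Environment`, `service`, `dependencies_of`, the hostname scheme) is invented for illustration and should not be read as the stackbuilder API.

```ruby
# Hypothetical mini-DSL; all names are invented for illustration and are
# not the stackbuilder API.
Machine = Struct.new(:hostname, :role, :depends_on)

class Environment
  attr_reader :name, :machines

  def initialize(name, &block)
    @name = name
    @machines = []
    instance_eval(&block) if block
  end

  # Declare `instances` machines sharing a role; `depends_on` names another role.
  def service(role, instances:, depends_on: nil)
    (1..instances).each do |i|
      hostname = format('%s-%s-%03d', @name, role, i)
      @machines << Machine.new(hostname, role, depends_on)
    end
  end

  # Hostnames of every machine that the given machine depends on.
  def dependencies_of(machine)
    @machines.select { |m| m.role == machine.depends_on }.map(&:hostname)
  end
end

# 2 web apps and 2 load balancers, matching the size of the slide's example.
env = Environment.new('ci') do
  service :app, instances: 2
  service :lb,  instances: 2, depends_on: :app
end

env.machines.each do |m|
  puts "#{m.hostname} depends on #{env.dependencies_of(m).inspect}"
end
```

The point of modelling this outside puppet is that the whole dependency graph is known before any machine exists, so each machine can be handed its complete configuration up front.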
  • 42. ENC data! Our external node classifier generates this for each of the 4 machines, which translates to puppet code run on the server. Note how every server gets all of its dependencies. There’s a companion data structure sent to the agent which actually provisions the virtual
  • 43. Call tree looks something like this: Model all the nodes, allocate all their IPs. Make calls to KVM servers to provision machines. VMs start, boot, run puppet, send cert to puppetmaster, --waitforcert. Central provisioning asks ‘do we have a cert’, waits - signs it. Looks up DNS and ENC to
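
For readers who have not met ENCs: a puppet external node classifier is a program that is given a node name and prints YAML, typically with a `classes` key (and optionally `parameters` and `environment`). The data shown on the slide is not in this transcript, so the sketch below is only a guess at the shape. The class name `role::lb` and the parameter `upstream_hosts` are hypothetical, as is the helper `enc_data_for`.

```ruby
require 'yaml'

# Build the hash an ENC could print for one node. The class and parameter
# names are hypothetical; only the top-level 'classes' key is ENC convention.
def enc_data_for(role, upstream_hosts)
  {
    'classes' => {
      "role::#{role}" => { 'upstream_hosts' => upstream_hosts }
    }
  }
end

# A load balancer is told about its upstream machines directly, with no
# need to collect anything from other nodes' earlier puppet runs.
puts enc_data_for('lb', ['ci-app-001', 'ci-app-002']).to_yaml
```

Because the dependencies arrive as plain data, the node can be fully configured in its first puppet run.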
  • 44. Automate all the things Suddenly, I have massive power. I can write a small script to bring up a whole production-like environment, run tests against it, tear it down. I can do this against the latest puppet changes, and only promote them to run on production servers when the tests pass!
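
The "small script" could be shaped roughly like this. It is a sketch under assumptions: `FakeProvisioner` is a stand-in that only records calls, and `verify_change` is a made-up name. A real version would call out to whatever provisioning tooling you have.

```ruby
# Stand-in for a real provisioning client; it only records what was asked.
class FakeProvisioner
  attr_reader :log

  def initialize
    @log = []
  end

  def launch(env_name)
    @log << "launch #{env_name}"
  end

  def destroy(env_name)
    @log << "destroy #{env_name}"
  end
end

# Bring an environment up, run the caller's tests (the block), and always
# tear the environment down again. Returns :promote only if the tests passed.
def verify_change(provisioner, env_name)
  provisioner.launch(env_name)
  passed = yield
  passed ? :promote : :reject
ensure
  provisioner.destroy(env_name)
end

provisioner = FakeProvisioner.new
decision = verify_change(provisioner, 'ci') { true } # block = the test suite
puts decision
puts provisioner.log.inspect
```

The `ensure` clause is the important bit: the environment is destroyed whether the tests pass, fail, or blow up.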
  • 45. BDD infrastructure Behavior driven development - given I have a high level model of the systems comprising an infrastructure, I can then write equally high level tests to assert the behavior of that infrastructure
  • 47. BDD infrastructure • Given – the Service has finished being provisioned
  • 48. BDD infrastructure • Given – the Service has finished being provisioned • And
  • 49. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing
  • 50. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When
  • 51. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service
  • 52. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service • Then
  • 53. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service • Then – we expect all monitoring at the service level to be passing
  • 54. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service • Then – we expect all monitoring at the service level to be passing • And
  • 55. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service • Then – we expect all monitoring at the service level to be passing • And – we expect all monitoring at the single machine level to be failing Yes, I am suggesting regression testing your load balancer setup...
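
The scenario above maps onto code fairly directly. Here is a minimal sketch against an in-memory fake, with the Given/When/Then steps as comments. `FakeService` and its methods are hypothetical, and the service-level check is assumed to pass as long as at least one member is still up. A real test would drive actual provisioning and monitoring.

```ruby
# In-memory stand-in for a load-balanced service and its monitoring.
class FakeService
  def initialize(members)
    @up = members.map { |m| [m, true] }.to_h
  end

  def destroy(member)
    @up[member] = false
  end

  # Machine-level monitoring: is this one member healthy?
  def machine_check(member)
    @up.fetch(member)
  end

  # Service-level monitoring: assumed healthy while any member is up.
  def service_check
    @up.values.any?
  end
end

# Given the service has finished being provisioned
service = FakeService.new(%w[ci-app-001 ci-app-002])
# And all monitoring related to the service is passing
raise 'precondition failed' unless service.service_check

# When we destroy a single member of the service
service.destroy('ci-app-001')

# Then we expect all monitoring at the service level to be passing
raise 'service should survive losing one member' unless service.service_check
# And we expect monitoring at the single machine level to be failing
raise 'destroyed machine should be failing' if service.machine_check('ci-app-001')

puts 'scenario passed'
```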
  • 56. Is this for real?
  • 57. Is this for real? •Yes!
  • 58. Is this for real? •Yes! • We actually built this; the core parts are on GitHub
  • 59. Is this for real? •Yes! • We actually built this; the core parts are on GitHub • Deployed real applications to production at TIM Group
  • 60. •High availability! •Automated testing of all infrastructure changes •Entirely repeatable application environments •High confidence in changes •Continuous integration and deployment for infrastructure This is my promised land!
  • 61. Questions? • https://devblog.timgroup.com/2013/06/14/exported-resources-considered-harmful/ • https://devblog.timgroup.com/2013/06/26/scenario-testing-infrastructures/ • https://github.com/youdevise/provisioning-tools • https://github.com/youdevise/stackbuilder