Test driven infrastructure development (2 - puppetconf 2013 edition)

Test driven
Infrastructure
development
Tomas Doran
bobtfish@bobtfish.net
@bobtfish
Puppetconf 2013

Today, I’m going to talk about the promised land!
And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any
environment I want, whenever I want - so _all_ the configuration of all the instances has to be
dynamic!

•High availability!
•Automated testing of all infrastructure changes
•Entirely repeatable application environments
•High confidence in changes
•Continuous integration and deployment for infrastructure

Dev
Infrastructure automation nut!
Ex-backend web developer, Ex-security, currently fixing puppet at Yelp!

Dev / Ops
State of repeatability and testing in infrastructures is generally shocking.
Leads to systems/operations teams being averse to change and conservative - slows the
business down!
Why isn’t your infrastructure an agile software project?

Dev / Ops
•Developer viewpoint
•Grass IS greener
•Think of your infra as an agile software project...
•What workflow do I want?

The state of the art
Going to talk about how I think the generally accepted way of doing some things is
fundamentally broken!
But let’s start with a simple description of the issues I’m worrying about.

CM = state machine
Each change puppet makes (or attempts to make) is a state transition. Each circle represents
the configuration state of the server on disc + services running etc.

Non deterministic
This is the key observation here - you don’t know which way puppet’s gonna jump :)
In this case - it doesn’t matter, as the two operations are orthogonal.

Convergent!
Convergence is when each run of puppet takes you nearer to 0 changes, but one run is not
enough: the next run still makes additional changes.
The classic way to screw this up is to miss a dependency in your code.

Convergent!
Of course, this doesn’t happen - the first step goes BANG, then mysql gets installed,
creates /etc/mysql.
The second puppet run _then_ sets the config up.

err: /Stage[main]//File[/etc/mysql/my.cnf]/ensure: change from absent to file failed: Could not set 'file on ensure: No such file or directory - /etc/mysql/my.cnf.puppettmp_3706 at /home/tdoran/test.pp:4
Aaand in your puppet logs, you get.

Purple text of rage!
THE PURPLE TEXT OF RAGE

Convergent!
(Shamelessly stolen from https://www.usenix.org/legacy/publications/library/proceedings/lisa02/tech/full_papers/traugott/traugott.pdf)
Aaand your machine is convergent - i.e. it gets towards the desired state in a number of
steps.

Fixable!
•before
•require
•subscribe
•notify
As I noted, this all happens because you missed a dependency. This is the easy case, where puppet
can detect that and tell you! It’s also entirely possible for the failure to be totally silent.
It is, though, totally possible to write your puppet code well enough to need EXACTLY 1 puppet
run to fully provision a server!
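
To make the "EXACTLY 1 run" claim concrete, here is a minimal Ruby sketch of the idea. It is not puppet's real implementation. The `Catalog` class and the `runs_until_converged` helper are hypothetical, and the package name `mysql-server` is an assumption. The sketch sorts resources so that anything named in a `require` is applied first, then counts how many simulated runs each ordering needs.

```ruby
require 'tsort'

# Hypothetical stand-in for a puppet catalog: maps each resource to the
# resources it declares a `require` on. Not a real Puppet API.
class Catalog
  include TSort

  def initialize(requires)
    @requires = requires
  end

  def tsort_each_node(&block)
    @requires.each_key(&block)
  end

  def tsort_each_child(resource, &block)
    @requires.fetch(resource).each(&block)
  end

  # Order in which resources can be applied: dependencies always come first.
  def apply_order
    tsort
  end
end

# Count the runs needed to apply everything when resources are attempted in
# `order`, and a resource fails unless everything it needs is already applied.
def runs_until_converged(order, needs)
  applied = []
  runs = 0
  until applied.size == order.size
    runs += 1
    order.each do |resource|
      next if applied.include?(resource)
      applied << resource if needs.fetch(resource).all? { |dep| applied.include?(dep) }
    end
  end
  runs
end

# What each resource really needs (package name is an assumption).
needs = {
  'Package[mysql-server]'   => [],
  'File[/etc/mysql/my.cnf]' => ['Package[mysql-server]']
}

# Dependency missing from the manifest: puppet may try the file first.
unlucky_order = ['File[/etc/mysql/my.cnf]', 'Package[mysql-server]']
puts runs_until_converged(unlucky_order, needs)

# Dependency declared: the package is always applied before the file.
puts runs_until_converged(Catalog.new(needs).apply_order, needs)
```

With the dependency undeclared and an unlucky ordering, the simulation needs two runs. With it declared, one run is enough.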

What about an entire infrastructure?
The $64,000 question is...

A whole stack
Let’s start simple, but semi realistic.
Gonna ignore databases.
Gonna ignore monitoring.
Gonna ignore the n[eo]twork.

Exported resources
Each layer of systems can publish data to the systems which depend on it. (I.e. webs register,
proxies find the webs + register themselves, lbs then find the proxy).
Given you know the dependencies - you can get consistent runs by ordering them.

Exported resources
• Inter machine dependencies
• Unidirectional!
• Known graph - webs, proxies, lbs
• Puppetroll (github.com/youdevise/puppetroll)

Exported resources
(Shameless ripoff of http://xkcd.com/1171/ )
Ordering dependent. Hard to test (in isolation). Slooow (have to run in order)

Co-dependence
And if we really are talking about entire infrastructures...
Then maybe we need some of these.

Co-dependence
:(
You _know_ that if everything is dynamically configured that you’re gonna have to do
multiple puppet runs per server...
Do we _really_ want to keep running puppet till it stops changing things?

The solution - an
external model
Use your software model to generate a set of machines for an environment.
And generate config for puppet to apply to each system to configure it.
Add super secret special sauce (lots and lots of mcollective!)

The solution - an
external model
• Represent system as a set of ruby classes
• DSL for describing environments
• Dependencies
• Domain knowledge

This is a simplified / minimal example jenkins environment - just 4 machines (2 web apps, 2
load balancers)

ENC data!
Our external node classifier generates this for each of the 4 machines, which translates to
puppet code run on the server.
Note how every server gets all of its dependencies.
There’s a companion data structure sent to the agent which actually provisions the virtual
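
To give a feel for how a model plus a DSL can feed an ENC, here is a small Ruby sketch. It is not the stackbuilder API. The `Environment` and `Machine` classes, the `service` verb, the `role::` class names and the `upstream_hosts` parameter are all hypothetical. The only thing taken from puppet itself is the shape of the output, a YAML hash with a `classes` key.

```ruby
require 'yaml'

# Hypothetical model of one machine; not a stackbuilder class.
Machine = Struct.new(:name, :role, :depends_on)

# Hypothetical environment model with a tiny DSL for declaring services.
class Environment
  attr_reader :name, :machines

  def initialize(name, &block)
    @name = name
    @machines = []
    instance_eval(&block)
  end

  # DSL verb: declare `count` machines of a role, optionally depending on
  # other roles in the same environment.
  def service(role, count:, depends_on: [])
    count.times do |i|
      machine_name = format('%s-%s-%03d', name, role, i + 1)
      @machines << Machine.new(machine_name, role, depends_on)
    end
  end

  # ENC-style data for one machine: its class, plus the names of every
  # machine it depends on, so nothing has to be discovered at run time.
  def enc_for(machine)
    upstream = machines.select { |m| machine.depends_on.include?(m.role) }.map(&:name)
    { 'classes' => { "role::#{machine.role}" => { 'upstream_hosts' => upstream } } }
  end
end

# 4 machines: 2 web apps and 2 load balancers that depend on them.
env = Environment.new('dev') do
  service :jenkinsapp, count: 2
  service :loadbalancer, count: 2, depends_on: [:jenkinsapp]
end

env.machines.each { |m| puts({ m.name => env.enc_for(m) }.to_yaml) }
```

Because the whole environment is modelled up front, each load balancer is told the names of both web apps before any of them exist, so no machine has to wait for another to publish data.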

Call tree looks something like this: Model all the nodes, allocate all their IPs. Make calls to
KVM servers to provision machines. VMs start, boot, run puppet, send cert to puppetmaster,
--waitforcert.
Central provisioning asks ‘do we have a cert’, waits - signs it. Looks up DNS and ENC to

Automate all the things
Suddenly, I have massive power.
I can write a small script to bring up a whole production-like environment, run tests against
it, tear it down. I can do this against the latest puppet changes, and only promote them to
run on production servers when the tests pass!
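
The shape of that small script might look like the Ruby sketch below. Everything in it is hypothetical: `validate_change` and the four callables stand in for whatever provisioning, testing and promotion tooling you actually have. The point is only the control flow: always tear down, and promote only on green.

```ruby
# Hypothetical pipeline: each step is a callable standing in for real tooling.
def validate_change(provision:, run_tests:, promote:, teardown:)
  env = provision.call
  begin
    passed = run_tests.call(env)
    promote.call if passed # only promote puppet changes when tests pass
    passed
  ensure
    teardown.call(env) # the test environment is always destroyed
  end
end

# Usage with fake steps that just record what happened.
log = []
result = validate_change(
  provision: -> { log << :provision; 'test-env' },
  run_tests: ->(_env) { log << :run_tests; true },
  promote:   -> { log << :promote },
  teardown:  ->(_env) { log << :teardown }
)
p log
p result
```

Putting the teardown in `ensure` means a failing or crashing test run still cleans up its environment, which matters once these runs happen on every change.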

BDD infrastructure
Behavior driven development - given I have a high level model of the systems comprising an
infrastructure, I can then write equally high level tests to assert the behavior of that
infrastructure.

BDD infrastructure
For example...
• Given – the Service has finished being provisioned
• And – all monitoring related to the service is passing
• When – we destroy a single member of the service
• Then – we expect all monitoring at the service level to be passing
• And – we expect all monitoring at the single machine level to be failing
Yes, I am suggesting regression testing your load balancer setup...
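
Written as code, the scenario above could look roughly like this Ruby sketch. `FakeService` is a hypothetical in-memory stand-in, and so are the `expect` helper and the machine names. A real test would provision actual VMs and ask real monitoring, but the Given / When / Then structure would be the same.

```ruby
# Hypothetical in-memory stand-in for a provisioned service; a real test
# would provision VMs and query real monitoring instead.
class FakeService
  def initialize(members)
    @up = members.to_h { |member| [member, true] }
  end

  def destroy(member)
    @up[member] = false
  end

  # Machine-level check: is this one member healthy?
  def machine_check_passing?(member)
    @up.fetch(member)
  end

  # Service-level check: passes while at least one member can serve traffic.
  def service_check_passing?
    @up.value?(true)
  end
end

# Tiny assertion helper so the scenario reads like its description.
def expect(condition, message)
  raise "FAILED: #{message}" unless condition
end

# Given the service has finished being provisioned
service = FakeService.new(%w[web-001 web-002])
# And all monitoring related to the service is passing
expect(service.service_check_passing?, 'service check passes after provisioning')
expect(service.machine_check_passing?('web-001'), 'machine check passes after provisioning')

# When we destroy a single member of the service
service.destroy('web-001')

# Then we expect all monitoring at the service level to be passing
expect(service.service_check_passing?, 'service survives losing one member')
# And we expect all monitoring at the single machine level to be failing
expect(!service.machine_check_passing?('web-001'), 'destroyed machine is reported as failing')

puts 'scenario passed'
```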

Is this for real?
•Yes!
• We actually built this, the core parts are on github
• Deployed real applications to production at TIM Group

•Continuous integration and deployment for infrastructure
This is my promised land!

Questions?
• https://devblog.timgroup.com/2013/06/14/exported-resources-considered-harmful/
• https://devblog.timgroup.com/2013/06/26/scenario-testing-infrastructures/
• https://github.com/youdevise/provisioning-tools
• https://github.com/youdevise/stackbuilder
