10 deploys per day
Dev & ops cooperation at Flickr


John Allspaw & Paul Hammond
         Velocity 2009
3 billion photos

               40,000 photos per second




                     http://flickr.com/photos/jimmyroq/415506736/
Dev versus Ops
“It’s not my machines,
     it’s your code!”
“It’s not my code,
it’s your machines!”
Spock                      Scotty
Little bit weird           Pulls levers & turns knobs
Sits closer to the boss    Easily excited
Thinks too hard            Yells a lot in emergencies
Says “No” all the time
Afraid that newfangled things will break the site
                  Fingerpointy
Ops stereotype

Because the site breaks unexpectedly
Because no one tells them anything
Because they say “NO” all the time
http://www.flickr.com/photos/stewart/461099066/




Traditional thinking

     Dev’s job is to add new features
Ops’ job is to keep the site stable and fast
Ops’ job is NOT to keep the site stable and fast
Ops’ job is to enable the business
           (this is dev’s job too)
The business requires change
But change is the root cause of most outages!
Discourage change in the interests of stability
                    or
Allow change to happen as often as it needs to
Lowering risk of change
through tools and culture
Dev and Ops
Ops who think like devs
Devs who think like ops
“But that’s me!”
You can always think more like them
Tools
1. Automated infrastructure
       If there is only one thing you do…
Role & configuration management: CFengine, Chef, BCfg2, Puppet
OS imaging: FAI, System Imager, Cobbler
2. Shared version control
Everyone knows where to look
           http://www.flickr.com/photos/thunderchild5/1330744559/
3. One step build
   and deploy
[2009-06-22 16:03:57] [harmes] site deployed (changes...)




        Who? When? What?
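
The log line above answers who, when and what in a single record. A minimal sketch of producing such a line follows; the function name `format_deploy_log_line` and its parameters are hypothetical and not taken from Flickr's tooling.

```php
<?php
// Hypothetical helper: builds one audit line per deploy (when, who, what).
function format_deploy_log_line(DateTimeInterface $when, string $who, string $what): string
{
    return sprintf(
        '[%s] [%s] site deployed (%s)',
        $when->format('Y-m-d H:i:s'),
        $who,
        $what
    );
}

// Values copied from the slide; the change summary is left elided as in the original.
$line = format_deploy_log_line(
    new DateTimeImmutable('2009-06-22 16:03:57'),
    'harmes',
    'changes...'
);
echo $line, PHP_EOL;
```

Writing the line from the deploy script itself, rather than by hand, is what keeps the record complete when deploys happen many times a day.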
Small frequent changes
              http://www.flickr.com/photos/mauren/2429240906/
4. Feature flags
(aka branching in code)
Desktop software: 1.0, 1.1, 1.2 (with branches 1.0.1, 1.0.2 and 1.1.1)
Web software: r2301, r2302, r2306
http://www.flickr.com/photos/8720628@N04/2188922076/




Always ship trunk
Everyone knows exactly where to look
              http://www.flickr.com/photos/thunderchild5/1330744559/
Feature flags

# php
if ($cfg['enable_feature_video']) {
    …
}

{* smarty *}
{if $cfg.enable_feature_beehive}
    …
{/if}
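
A flag check like the one above can be wrapped in a small helper so that a missing flag counts as off. This is only a sketch: the helper `feature_enabled` and the sample config values are assumptions, and only the `enable_feature_*` naming comes from the slide.

```php
<?php
// Hypothetical config; flag names follow the slide's enable_feature_* pattern.
$cfg = [
    'enable_feature_video'   => true,
    'enable_feature_beehive' => false,
];

// Treat a missing flag as "off" so unfinished code on trunk stays hidden by default.
function feature_enabled(array $cfg, string $name): bool
{
    return !empty($cfg['enable_feature_' . $name]);
}

if (feature_enabled($cfg, 'video')) {
    echo "video code path\n";
}
```

Defaulting to off means code can ship on trunk before its flag exists in every environment's config.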
http://www.flickr.com/photos/healthserviceglasses/3522809727/




                     Private betas
Bucket testing
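
One common way to do bucket testing is to hash each user into a stable slot and enable the feature for a percentage of slots. The sketch below assumes integer user IDs; the function `in_bucket` and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: put a stable percentage of users into the test bucket.
function in_bucket(string $feature, int $userId, int $percent): bool
{
    // crc32 is deterministic, so a given user always lands in the same slot;
    // mixing in the feature name keeps buckets independent across features.
    $slot = abs(crc32($feature . ':' . $userId)) % 100;
    return $slot < $percent;
}

// Example: show the feature to roughly 10% of users.
if (in_bucket('video', 12345, 10)) {
    echo "user 12345 sees the video feature\n";
}
```

Because the slot depends only on the feature name and user ID, raising the percentage later keeps everyone who was already in the bucket.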




http://www.flickr.com/photos/davidw/2063575447/
http://www.flickr.com/photos/jking89/3031204314/




         Dark launches
Free
contingency
switches
              http://www.flickr.com/photos/flattop341/260207875/
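
A dark launch runs a new code path under production load without showing its result to users, and the same flag then works as a contingency switch. A minimal sketch, assuming a hypothetical `enable_dark_launch` flag and a hypothetical `render_page` function:

```php
<?php
// Hypothetical sketch: exercise a new backend under real load without using its output.
function render_page(array $cfg, callable $oldBackend, callable $newBackend): string
{
    if (!empty($cfg['enable_dark_launch'])) {
        try {
            $newBackend(); // result deliberately discarded
        } catch (Throwable $e) {
            // A dark-launched path must never break the page users actually see.
            error_log('dark launch failed: ' . $e->getMessage());
        }
    }
    // Users always get the existing, proven code path.
    return $oldBackend();
}
```

Turning the flag off removes the extra load immediately, with no deploy needed.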
5. Shared metrics
Application level metrics
Adaptive feedback loops




App: “RU ok?”
System metrics: “maybe?”
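
One way to picture an adaptive feedback loop: the application reads a system metric and skips optional work when the machine is busy. The sketch below is an assumption, not the deck's implementation; `should_run_optional_work` and the threshold 4.0 are hypothetical.

```php
<?php
// Hypothetical sketch: the app asks system metrics "RU ok?" and backs off if not.
function should_run_optional_work(float $loadAverage, float $maxLoad): bool
{
    return $loadAverage < $maxLoad;
}

// sys_getloadavg() returns the 1, 5 and 15 minute load averages on Unix-like systems.
$load = function_exists('sys_getloadavg') ? sys_getloadavg() : false;
$oneMinute = is_array($load) ? (float) $load[0] : 0.0;

// The threshold 4.0 is an arbitrary illustrative value.
echo should_run_optional_work($oneMinute, 4.0) ? "ok\n" : "backing off\n";
```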
6. IRC and IM robots
Dev, Ops, and Robots
              Having a conversation

Feeding into IRC: build logs, deploy logs, alerts, monitors
IRC feeds into: search engine
Culture
1. Respect
If there is only one thing you do…
Don’t stereotype
(not all developers are lazy)



http://www.flickr.com/photos/aaronjacobs/64368770/
http://www.flickr.com/photos/chrisdag/2286198568/




         Respect other people’s expertise,
             opinions and responsibilities
http://www.flickr.com/photos/jwheare/2580631103/




 Don’t just say “No”
http://www.flickr.com/photos/alancleaver/2661424637/



                                                Don’t hide things
Developers: Talk to ops about the impact of your code:

• what metrics will change, and how?
• what are the risks?
• what are the signs that something is going wrong?
• what are the contingencies?
This means you need to work this out before talking to ops
2. Trust
Ops needs to trust dev to involve
them in feature discussions

Dev needs to trust ops to discuss
infrastructure changes

Everyone needs to trust that everyone else
is doing their best for the business




                         http://www.flickr.com/photos/85128884@N00/2650981813/
http://www.flickr.com/photos/flattop341/224176602/




   Shared runbooks & escalation plans
http://www.flickr.com/photos/telstar/2861103147/




  Provide knobs and levers
http://www.flickr.com/photos/williamhook/3468484351/




  Ops: Be transparent,
  give devs access to systems
3. Healthy attitude
   about failure
http://www.flickr.com/photos/pinksherbet/447190603/




Failure will happen
If you think you can prevent failure then
         you aren’t developing your ability to respond




http://www.flickr.com/photos/toms/2323779363/
http://www.flickr.com/photos/changereality/2349538868/
Fire drills



http://www.flickr.com/photos/dnorman/2678090600
4. Avoiding Blame
No fingerpointing




http://www.flickr.com/photos/rocketjim54/2955889085/
Fingerpointyness

Timeline:
1. problem!!! argggh!
2. freaking out, blaming, whining, not talking, covering ass, hiding, finding fault, hurt egos
3. figuring it out
4. fixing things
5. fixed.
Being productive

Timeline:
1. problem!!! argggh!
2. figuring it out
3. fixing things
4. fixed.
5. feeling guilty
6. move on with life
Developers: Remember that someone else will
  probably get woken up when your code breaks




http://www.flickr.com/photos/alex-s/353218851/
http://www.flickr.com/photos/allspaw/2819774755/




  Ops: provide
  constructive
  feedback on
  current aches
  and pains
1. Automated infrastructure
2. Shared version control
3. One step build and deploy
4. Feature flags
5. Shared metrics
6. IRC and IM robots

1. Respect
2. Trust
3. Healthy attitude about failure
4. Avoiding Blame
This is not easy
You could just carry on shouting at each other…
(Thank you)
