DISQUS
                           Continuous Deployment Everything



                                      David Cramer
                                         @zeeg




Wednesday, June 22, 2011
Continuous Deployment


          Shipping new code as soon
                 as it’s ready

                      (It’s really just super awesome buildbots)




Workflow


                           Commit (master)




                             Integration             Failed Build




                               Deploy                Reporting




                                                      Rollback




Pros

              •     Develop features incrementally
              •     Release frequently
              •     Smaller doses of QA

Cons

              •     Culture shock
              •     Stability depends on test coverage
              •     Initial time investment




                       We mostly just care about iteration and stability

Painless Development




Development



               •     Production > Staging > CI > Dev
                     •     Automate testing of complicated
                           processes and architecture
               •     Simple > complete
                     •     Especially for local development
               •     python setup.py {develop,test}
               •     Puppet, Chef, simple bootstrap.{py,sh}



                               Production        Staging
                           •    PostgreSQL   •   PostgreSQL
                           •    Memcache     •   Memcache
                           •    Redis        •   Redis
                           •    Solr         •   Solr
                           •    Apache       •   Apache
                           •    Nginx        •   Nginx
                           •    RabbitMQ     •   RabbitMQ


                               CI Server         Macbook

                           •    Memcache     •   PostgreSQL
                           •    PostgreSQL   •   Apache
                           •    Redis        •   Memcache
                           •    Solr         •   Redis
                           •    Apache       •   Solr
                           •    Nginx        •   Nginx
                           •    RabbitMQ     •   RabbitMQ


Bootstrapping Local



               •     Simplify local setup
                     •     git clone dcramer@disqus:disqus.git
                     •     ./bootstrap.sh
                     •     python manage.py runserver


               •     Need to test dependencies?
                     •     virtualbox + vagrant up



“Under Construction”



               •     Iterate quickly by hiding features
               •     Early adopters are free QA



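                     from django.http import HttpResponse
                     from gargoyle import gargoyle

                     def my_view(request):
                         # serve the new version only when the switch is active
                         if gargoyle.is_active('awesome', request):
                             return HttpResponse('new happy version :D')
                         else:
                             return HttpResponse('old sad version :(')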




Gargoyle

                           Deploy features to portions of a user base at a
                            time to ensure smooth, measurable releases




                            Being users of our product, we actively use
                           early versions of features before public release

Conditions in Gargoyle


                    from django.contrib.auth.models import User
                    from gargoyle import gargoyle
                    from gargoyle.conditions import (ModelConditionSet,
                                                     Percent, String)

                    class UserConditionSet(ModelConditionSet):
                        # percent implicitly maps to ``id``
                        percent = Percent()
                        username = String()

                        def can_execute(self, instance):
                            # only evaluate these conditions against User objects
                            return isinstance(instance, User)

                    # register with our main gargoyle instance
                    gargoyle.register(UserConditionSet(User))




Without Gargoyle


                    SWITCHES = {
                        # enable my_feature for 50%
                        'my_feature': range(0, 50),
                    }

                    def is_active(switch, ip_address):
                        try:
                            pct_range = SWITCHES[switch]
                        except KeyError:
                            return False

                        # crude hash: sum the octets of a dotted IPv4 address
                        ip_hash = sum(int(x) for x in ip_address.split('.'))

                        return ip_hash % 100 in pct_range


                                    If you use Django, use Gargoyle
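
Summing IP octets gives an uneven spread and changes whenever a visitor's address changes. A minimal sketch of a more stable alternative, assuming each visitor has a persistent `user_id`; the `ROLLOUT_PERCENT` table and the function names are hypothetical and not part of Gargoyle:

```python
import hashlib

# Hypothetical switch table: feature name -> percentage of users enabled.
ROLLOUT_PERCENT = {"my_feature": 50}

def bucket_for(switch, user_id):
    """Map (switch, user_id) to a stable bucket in the range 0..99."""
    raw = ("%s:%s" % (switch, user_id)).encode("utf-8")
    return int(hashlib.sha1(raw).hexdigest(), 16) % 100

def is_active(switch, user_id):
    """Return True when the user's bucket falls below the rollout percentage."""
    percent = ROLLOUT_PERCENT.get(switch, 0)  # unknown switches stay off
    return bucket_for(switch, user_id) < percent
```

Because the switch name is part of the hashed value, the same user lands in different buckets for different features, so one group of users is not always the first to see everything.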


Integration
                           (or as we like to call it)




Integration is Required




                           Deploy only when things won't break

Set Up a Jenkins Build




Reporting is Critical




CI Requirements



               •     Developers must know when they’ve
                     broken something
                     •     IRC, Email, IM
               •     Support proper reporting
                     •     XUnit, Pylint, Coverage.py
               •     Painless setup
                     •     apt-get install jenkins *

                           * https://wiki.jenkins-ci.org/display/JENKINS/Installing+Jenkins+on+Ubuntu


Shortcomings

               •     False positives lower awareness
                     •     Reporting isn't accurate
                     •     Services fail
                     •     Bad Tests
               •     Not enough code coverage
                     •     Regressions on untested code
               •     Test suite takes too long
                     •     Integration tests vs Unit tests
                     •     SOA, distribution

Fixing False Positives




               •     Re-run tests several times on a failure
               •     Report continually failing tests
                     •     Fix continually failing tests
               •     Rely less on 3rd parties
                     •     Mock/Dingus




Maintaining Coverage




               •     Raise awareness with reporting
                     •     Fail/alert when coverage drops on a build
               •     Commit tests with code
                     •     Coverage against commit diff for
                           untested regressions
               •     Drive it into your culture
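
Failing a build when coverage drops comes down to comparing two numbers. A minimal sketch, assuming the previous and current coverage percentages have already been extracted from the coverage reports; `coverage_dropped` and `check_build` are hypothetical helpers, not part of Coverage.py or Jenkins:

```python
def coverage_dropped(previous_percent, current_percent, tolerance=0.0):
    """Return True when coverage fell by more than `tolerance` points."""
    return (previous_percent - current_percent) > tolerance

def check_build(previous_percent, current_percent):
    """Exit with an error message so the CI job is marked as failed."""
    if coverage_dropped(previous_percent, current_percent):
        raise SystemExit(
            "coverage dropped from %.1f%% to %.1f%%"
            % (previous_percent, current_percent)
        )
```

A small `tolerance` avoids failing builds over rounding noise while still catching commits that add untested code.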




Speeding Up Tests




               •     Write true unit tests
                     •     vs slower integration tests
               •     Mock 3rd party APIs
               •     Distributed and parallel testing
                     •     http://github.com/disqus/mule
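
Mocking a third-party API keeps a test fast and independent of the network. A minimal sketch using the standard library's `unittest.mock`; `fetch_follower_count` and the client's `get_user` method are hypothetical stand-ins for whatever API wrapper a project uses:

```python
from unittest import mock

def fetch_follower_count(client, username):
    """Return the follower count reported by a third-party API client."""
    response = client.get_user(username)
    return response["followers"]

def test_fetch_follower_count():
    # The real client would hit the network; a Mock stands in for it.
    fake_client = mock.Mock()
    fake_client.get_user.return_value = {"followers": 12}

    assert fetch_follower_count(fake_client, "zeeg") == 12
    # Verify the code under test called the API the way we expect.
    fake_client.get_user.assert_called_once_with("zeeg")
```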




Mule



               •     Unstable, will change a lot
               •     Mostly Django right now
                     •     Generic interfaces for unittest2
               •     Works with multi-processing and Celery
               •     Full XUnit integration
               •     Simple workflow
                     •     mule test --runner="python manage.py
                           mule --worker $TEST"



Deploy (finally)




How DISQUS Does It




               •     Incremental deploy with Fabric
               •     Drop server from pool
               •     Pull in requirements on each server
                     •     Isolated virtualenvs built on each server
               •     Push server back online
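
The steps above can be sketched as a loop over servers. This is an assumption-laden outline, not the DISQUS fabfile: the three callables are hypothetical hooks that would wrap a load balancer API and the Fabric tasks that update a host.

```python
def incremental_deploy(servers, remove_from_pool, update_server, add_to_pool):
    """Deploy to one server at a time so the rest keep serving traffic."""
    deployed = []
    for server in servers:
        remove_from_pool(server)  # stop sending traffic to this server
        # Build the virtualenv and install requirements. An exception here
        # aborts the rollout and leaves the broken server out of the pool.
        update_server(server)
        add_to_pool(server)       # put the updated server back online
        deployed.append(server)
    return deployed
```

Stopping at the first failure limits the damage to a single server, at the cost of a slower rollout than a fanout deploy.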




How You Can Do It

                    # fabfile.py
                    from fabric.api import env, run

                    def deploy(revision):
                        # update sources, virtualenv, requirements
                        # ...

                        # copy ``current`` to ``previous``
                        run('cp -R %(path)s/current %(path)s/previous' % dict(
                            path=env.path,
                        ))

                        # symlink ``revision`` to ``current``
                        run('ln -fs %(path)s/%(revision)s %(path)s/current' % dict(
                            path=env.path,
                            revision=revision,
                        ))

                        # restart apache
                        run('touch %(path)s/current/django.wsgi' % dict(
                            path=env.path,
                        ))



How YOU Can Do It (cont.)




                    # fabfile.py
                    from fabric.api import env, run

                    def rollback(revision=None):
                        # move ``previous`` to ``current``
                        run('mv %(path)s/previous %(path)s/current' % dict(
                            path=env.path,
                        ))

                        # restart apache
                        run('touch %(path)s/current/django.wsgi' % dict(
                            path=env.path,
                        ))




Challenges




               •     PyPI works on server A, but not B
               •     Scale
               •     CPU cost per server
               •     Schema changes, data model changes
               •     Backwards compatibility




PyPI is Down




               •     http://github.com/disqus/chishop




Help, we have 100 servers!




               •     Incremental (ours) vs Fanout
               •     Push vs Pull
                     •     Twitter uses BitTorrent
               •     Isolation vs Packaging (Complexity)




SQL Schema Changes




               1. Add column (NULLable)
               2. Add app code to fill column
               3. Deploy
               4. Backfill column
               5. Add app code to read column
               6. Deploy
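
The six steps can be walked through end to end with an in-memory SQLite database. This is an illustrative sketch only: the `comment` table, the `body_length` column, and the helper functions are hypothetical, and a production system would run each step as a separate migration or deploy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comment (body) VALUES (?)", [("hello",), ("hi there",)]
)

# Step 1: add the column as NULLable so existing code and rows keep working.
conn.execute("ALTER TABLE comment ADD COLUMN body_length INTEGER")

# Steps 2-3: newly deployed code fills the column on every write.
def create_comment(body):
    conn.execute(
        "INSERT INTO comment (body, body_length) VALUES (?, ?)",
        (body, len(body)),
    )

create_comment("new row")

# Step 4: backfill rows written before the deploy.
conn.execute(
    "UPDATE comment SET body_length = LENGTH(body) WHERE body_length IS NULL"
)

# Steps 5-6: only now is it safe to deploy code that reads the column.
def comment_lengths():
    rows = conn.execute("SELECT body_length FROM comment ORDER BY id")
    return [row[0] for row in rows]
```

Each step is backwards compatible with the code running before it, so a rollback at any point leaves the application working.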




Updating Caches




               •     Have a global version number
                     •     CACHE_PREFIX = 9000
               •     Have a data model cache version
                     •     sha1(cls.__dict__)
               •     Use multiple caches
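
Both version numbers can be folded into the cache key. A minimal sketch under stated assumptions: it hashes the sorted attribute names of the class rather than `cls.__dict__` itself (whose string form is not stable between runs), and `model_version`, `cache_key`, and the `Comment` class are hypothetical names.

```python
import hashlib

CACHE_PREFIX = 9000  # global version: bump to invalidate every key at once

def model_version(cls):
    """Hash the class's attribute names so a model change yields new keys."""
    field_names = sorted(
        name for name in vars(cls) if not name.startswith("__")
    )
    digest = hashlib.sha1(",".join(field_names).encode("utf-8"))
    return digest.hexdigest()[:8]

def cache_key(cls, pk):
    """Build a key that changes with the global prefix or the model shape."""
    return "%s:%s:%s:%s" % (CACHE_PREFIX, cls.__name__, model_version(cls), pk)

class Comment(object):
    id = None
    body = None
```

Old and new code then read and write different keys during a deploy, so neither sees objects cached in the other's format.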




Reporting




It’s Important!




<You> Why is mongodb-1 down?

         <Ops> It’s down? Must have crashed again




Meaningful Metrics




               •     Rate of traffic (not just hits!)
                     •     Business vs system
               •     Response time (database, web)
               •     Exceptions
               •     Social media
                     •     Twitter




Standard Tools



                                                       Nagios

                           Graphite




Using Graphite


                    # statsd.py
                    # requires python-statsd

                    from pystatsd import Client
                    import socket

                    # assumed settings; point these at your statsd daemon
                    STATSD_HOST = 'localhost'
                    STATSD_PORT = 8125

                    def with_suffix(key):
                        # append the short hostname so each server gets its own series
                        hostname = socket.gethostname().split('.')[0]
                        return '%s.%s' % (key, hostname)

                    client = Client(host=STATSD_HOST, port=STATSD_PORT)

                    # statsd.incr('key1', 'key2')
                    def incr(*keys):
                        keys = [with_suffix(k) for k in keys]
                        client.increment(*keys)




Using Graphite (cont.)




                           (Traffic across a cluster of servers)


Logging



                           •   Realtime
                           •   Aggregates
                           •   History
                           •   Notifications
                           •   Scalable
                           •   Available
                           •   Metadata
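
Of these requirements, "Aggregates" is the one a short example can make concrete: identical errors are grouped and counted instead of stored one by one. A rough sketch, assuming each event is a tuple of exception type, message, and location; the grouping rule and function names are hypothetical and do not describe how Sentry groups events.

```python
import hashlib
from collections import Counter

def fingerprint(exc_type, message, location):
    """Collapse identical errors into one key (hypothetical grouping rule)."""
    raw = "%s|%s|%s" % (exc_type, message, location)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()

def aggregate(events):
    """Count occurrences per fingerprint instead of listing every event."""
    counts = Counter()
    for event in events:
        counts[fingerprint(*event)] += 1
    return counts
```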



Logging: Syslog


                           ✓   Realtime
                           x   Aggregates
                           ✓   History
                           x   Notifications
                           ✓   Scalable
                           ✓   Available
                           x   Metadata




Logging: Email Collection


                               ✓   Realtime
                               x   Aggregates
                               ✓   History
                               x   Notifications
                               x   Scalable
                               ✓   Available
                               ✓   Metadata


                           (Django provides this out of the box)


Logging: Sentry


                                 ✓   Realtime
                                 ✓   Aggregates
                                 ✓   History
                                 ✓   Notifications
                                 ✓   Scalable
                                 ✓   Available
                                 ✓   Metadata


                           http://github.com/dcramer/django-sentry


Setting up Sentry (1.x)



                    # setup your server first
                    $ pip install django-sentry
                    $ sentry start

                    # configure your Python (Django in our case) client
                    INSTALLED_APPS = (
                        # ...
                        'sentry.client',
                    )

                    # point the client to the servers
                    SENTRY_REMOTE_URL = ['http://sentry/store/']

                    # visit http://sentry in the browser




Setting up Sentry (cont.)


                    # ~/.sentry/sentry.conf.py

                    # use a better database
                    DATABASES = {
                        'default': {
                            'ENGINE': 'postgresql_psycopg2',
                            'NAME': 'sentry',
                            'USER': 'postgres',
                        }
                    }

                    # bind to all interfaces
                    SENTRY_WEB_HOST = '0.0.0.0'

                    # change data paths
                    SENTRY_WEB_LOG_FILE = '/var/log/sentry.log'
                    SENTRY_WEB_PID_FILE = '/var/run/sentry.pid'


Sentry (demo time)




Wrap Up




Getting Started




               •     Package your app
               •     Ease deployment; fast rollbacks
               •     Set up automated tests
               •     Gather some easy metrics




Going Further




               •     Build an immune system
                     •     Automate deploys, rollbacks (maybe)
               •     Adjust to your culture
                     •     CD doesn’t “just work”
               •     SOA == great success




DISQUS
                             Questions?




                             psst, we’re hiring
                            jobs@disqus.com

References



               •     Gargoyle (feature switches)
                     https://github.com/disqus/gargoyle
               •     Sentry (log aggregation)
                     https://github.com/dcramer/django-sentry (1.x)
                     https://github.com/dcramer/sentry (2.x)
               •     Jenkins CI
                     http://jenkins-ci.org/
               •     Mule (distributed test runner)
                     https://github.com/disqus/mule




                                              code.disqus.com
Pitfalls of Continuous Deployment
