Web frameworks don't matter




    Web frameworks don’t matter.
Some tips, tricks and patterns for designing, scaling and
       maintaining large-scale web applications.

                                 Tomas (t0m) Doran

                                 Nordic Perl Workshop 2010
                                 São Paulo.pm Perl Workshop 2010
Introduction
• t0m - Catalyst core team. Moose
  committer. 99 CPAN dists. Silly hair. Idiot ;)
• This is a rant talk about (web) application
  design.
• You get to listen to 3 hours of this today.
  (Sorry about that)
• Please stop me, ask questions, disagree etc.
  (It’ll be more fun, for all of us)
Application design

• Is hard!
• You won’t get it all right first time.
• The web parts are not the main parts
• Yes, even in a web application
Success
• You need to be flexible, requirements will
  change.
• You need to be flexible, your planned
  solutions won’t work
• You need to be flexible, the performance
  inflection points won’t be where you
  predict
Getting the data model right

   • Hint - You won’t..
   • Consistency is the key
   • KISS
   • No broken windows
Getting the data model right

   • You need a consistent and standalone
     model of your application
   • Routing URIs to actions isn’t a hard
     problem
   • Keeping the data from becoming a pile of
     shit is a hard problem
Loosely coupled

• You will throw parts of your codebase away
• Components can be tested / replaced
  independently
• Dependency injection
Dependency Injection
• Any of the components can be faked /
  mocked / replaced
• Testing becomes much easier (having a test
  MogileFS is a pain in the ass, really!)
• Write factories to build instances so
  dependency injection doesn’t cost.
• Bread::Board?
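
As a rough illustration of the bullets above, here is a minimal Python sketch of constructor injection with a factory. The names `BlobStore`, `InMemoryBlobStore`, `AvatarService` and `build_avatar_service` are hypothetical and are not from the talk; the in-memory class stands in for a real store such as MogileFS so that tests need no running cluster.

```python
from typing import Dict, Optional, Protocol


class BlobStore(Protocol):
    """Hypothetical interface: anything that stores and fetches bytes by key."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Test double standing in for a real distributed store."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class AvatarService:
    """Domain code: it is handed its store instead of constructing one."""

    def __init__(self, store: BlobStore) -> None:
        self._store = store

    def save_avatar(self, user_id: int, image: bytes) -> str:
        key = f"avatar/{user_id}"
        self._store.put(key, image)
        return key

    def load_avatar(self, user_id: int) -> bytes:
        return self._store.get(f"avatar/{user_id}")


def build_avatar_service(store: Optional[BlobStore] = None) -> AvatarService:
    """Factory: callers get a wired-up instance without repeating the wiring.

    Assumption for this sketch: the default is the in-memory store; a real
    application would default to its production store here.
    """
    return AvatarService(store if store is not None else InMemoryBlobStore())
```

Because the service only sees the interface, swapping the fake for a production client is a change in the factory, not in the domain code.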
Efficiency

• Web stuff should be fast!
• Doing extra work on your web servers is
  bad news
• Lots of web processes is probably not what
  you want
The PHP fallacy
 Even if extra context switching has zero overhead,
       you serve people sooner if you queue.

    Interleaved:  A  B  A  B  A  B  A  B
    Queued:       A  A  A  A  B  B  B  B

A finishes significantly before B in the lower (queued) diagram.
B finishes at the same time in both.
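
The arithmetic behind the two schedules can be checked with a small simulation. This is only a sketch: the function names and the unit-sized time slices are assumptions made for illustration, and context switches are deliberately free, as the slide stipulates.

```python
from typing import Dict


def interleaved_finish_times(work: Dict[str, int], slice_size: int = 1) -> Dict[str, int]:
    """Round-robin between requests with zero context-switch cost."""
    remaining = dict(work)
    finish: Dict[str, int] = {}
    clock = 0
    while remaining:
        # Give each unfinished request one slice per round.
        for name in list(remaining):
            step = min(slice_size, remaining[name])
            clock += step
            remaining[name] -= step
            if remaining[name] == 0:
                finish[name] = clock
                del remaining[name]
    return finish


def queued_finish_times(work: Dict[str, int]) -> Dict[str, int]:
    """Run each request to completion, in arrival order."""
    finish: Dict[str, int] = {}
    clock = 0
    for name, units in work.items():
        clock += units
        finish[name] = clock
    return finish
```

With two requests of four work units each, `interleaved_finish_times({"A": 4, "B": 4})` gives A at 7 and B at 8, while `queued_finish_times` gives A at 4 and B at 8: B is no worse off, and A is served in roughly half the time.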
Caching strategies

• Page fragments
• Objects and data structures
• Varnish/ESI, mod_cache/SSI
• memcached
• Materialized views (triggers!)
Simple web architecture

• This is how you start and how most people think about a web
  architecture

[Diagram, top to bottom: Load balancer, Web servers, Data store]
Add a reverse proxy

• Your users are not on a Gb/s pipe.
• Don’t keep your expensive application processes in IO wait
  sending bytes

[Diagram, top to bottom: Load balancer, Reverse proxy, Web servers, Data store]
Cache expensive lookups

• Memcache shown here, but this isn’t the only strategy
• Materialized views in the data store layer - denormalisation
  without the pain.

[Diagram, top to bottom: Load balancer, Reverse proxy, Web servers (with Memcache alongside), Data store]
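
A minimal sketch of caching an expensive lookup, using the cache-aside pattern. The in-process `TTLCache` below is a stand-in invented for this example; it is not the memcached client API, and `cached_lookup` is a hypothetical helper name.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for memcached)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: Hashable) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: Hashable, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)


def cached_lookup(cache: TTLCache, key: Hashable, ttl: float,
                  load: Callable[[], Any]) -> Any:
    """Cache-aside: try the cache, fall back to the expensive loader on a miss.

    Limitation of this sketch: a loader that returns None is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value, ttl)
    return value
```

The web process only pays for the data store query on a miss; every hit within the TTL is served from memory.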
Using a page assembly layer

• Allows different chunks of content to have different caching
  strategies.
• Cache even authenticated users (mostly)

[Diagram, top to bottom: PAL, Reverse proxy, Web servers (with Memcache alongside), Data store]
Complex web architecture

• Anything that is going to block is run as an asynchronous job

[Diagram, top to bottom: PAL, Reverse proxy, Web servers (with Memcache alongside, plus a Message Queue feeding a Job Server), Data store]
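
The idea of pushing blocking work out of the web process can be sketched with the Python standard library. Everything here is an assumption made for illustration: an in-process `queue.Queue` stands in for the message queue, a thread stands in for the job server, and `send_welcome_email` and `handle_signup` are hypothetical names.

```python
import queue
import threading

job_queue = queue.Queue()  # stands in for the message queue
sent = []                  # records what the job server has done


def send_welcome_email(user_id):
    """Hypothetical slow, blocking task."""
    sent.append(f"welcome email for user {user_id}")


def job_server():
    """Worker loop: pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = job_queue.get()
        try:
            if job is None:
                return
            func, args = job
            func(*args)
        finally:
            # Mark the item as handled, even if the job raised.
            job_queue.task_done()


def handle_signup(user_id):
    """Web handler: enqueue the slow work and answer immediately."""
    job_queue.put((send_welcome_email, (user_id,)))
    return "202 Accepted"
```

A usage sketch: start `threading.Thread(target=job_server, daemon=True)`, call `handle_signup(42)`, and the response returns before the email is sent. In a real deployment the queue and workers would live in separate processes or hosts.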
Cache stampede!

• Cache flushes can become terminal to your
  application.
• You need to ensure you can avoid this (or
  at least know it’s there)
• Implement switches to ‘cool down’ your
  app
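
One common way to avoid a stampede is to let only one caller rebuild a missing entry while the others wait for the result. The sketch below shows that technique for threads inside a single process; it is not necessarily the approach the speaker had in mind, and `SingleFlightCache` is a hypothetical name. Across several web servers the same idea needs a shared lock, which is outside this sketch.

```python
import threading
from typing import Any, Callable, Dict, Hashable


class SingleFlightCache:
    """Cache in which at most one thread recomputes a given missing key."""

    def __init__(self) -> None:
        self._values: Dict[Hashable, Any] = {}
        self._locks: Dict[Hashable, threading.Lock] = {}
        self._guard = threading.Lock()

    def _lock_for(self, key: Hashable) -> threading.Lock:
        # The guard makes creation of the per-key lock itself race-free.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        if key in self._values:
            return self._values[key]
        with self._lock_for(key):
            # Re-check: another thread may have filled it while we waited.
            if key in self._values:
                return self._values[key]
            value = compute()
            self._values[key] = value
            return value

    def flush(self) -> None:
        """Drop all values; the next request per key triggers one rebuild."""
        self._values.clear()
```

After a flush, a burst of requests for the same key causes one expensive rebuild rather than one per request.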
Development, deployment and testing
• You must use version control
• You must have version numbers
• You must have package management
• You must have a staging environment
Development, deployment and testing

• You need tests!
• Even if you’re not doing TDD!
• The most likely bug to have is one you
  already fixed once.
Development, deployment and testing

• Everyone in your team needs to be able to
  do everything
• No superstars; being run over by a bus is
  more likely than you think
Development, deployment and testing

• Test plans
• A-B testing
• Customer driven testing and deployment
Development, deployment and testing

• You need a rollback plan
• API versioning
• Data migrations - be careful!!
The Zen swap

• Migrating continually updated data around
  with no downtime
• E.g. moving Zen machines between hosts,
  moving database info around.
• For database, relies on triggers
The Zen swap

• Add a modification-time column to the ‘from’ table.
• Copy (and munge) all the data, row by row,
  noting when you start
• Copy (and munge) all the data changed
  since you started
• Repeat until the set of dirty data is very
  small
The Zen swap
[Diagram: Old and New shown side by side, four times]
The Zen swap
• Stop the universe
• Do one final pass of the data
• Add triggers for reads/writes in the old
  column to use the new column
• Start the universe
• Needs transactional DDL to be seamless
• Even without, use to minimise downtime
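
The copy loop described on these slides can be sketched in memory. This is an illustration under stated assumptions, not the speaker's implementation: dictionaries stand in for the old and new tables, `modified_at` is a logical timestamp that every write to the old table must update, and `Row`, `sync_pass` and `migrate` are hypothetical names. Deleted rows and the trigger step are not handled.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Row:
    value: str
    modified_at: int  # logical timestamp, updated on every write


def sync_pass(old: Dict[int, Row], new: Dict[int, str], since: int,
              munge: Callable[[str], str]) -> int:
    """Copy (and munge) rows modified at or after `since`; return the count."""
    dirty = [key for key, row in old.items() if row.modified_at >= since]
    for key in dirty:
        new[key] = munge(old[key].value)
    return len(dirty)


def migrate(old: Dict[int, Row], new: Dict[int, str],
            munge: Callable[[str], str], now: Callable[[], int],
            threshold: int = 0, max_passes: int = 10) -> int:
    """Repeat passes until the dirty set is small enough.

    Returns the timestamp to use as `since` for the final pass, which is
    run after writes have been stopped.
    """
    since = 0
    for _ in range(max_passes):
        started = now()  # note when this pass starts
        copied = sync_pass(old, new, since, munge)
        since = started
        if copied <= threshold:
            break
    return since
```

The final step is then a single `sync_pass(old, new, since, munge)` while the universe is stopped; because the earlier passes did the bulk of the copying, that pause only has to cover the few rows written since the last pass began.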
Accidental complexity
• Premature generalisation is the root of all
  evil.
• Lack of polymorphism
• Insufficient use of delegation
• Commonality develops independently. You
  MUST refactor
Load and scalability testing
• A simple ab (ApacheBench) run can tell you a lot
• NYTProf
• You need to test your system with your
  hardware and your data
• A little tuning can go a long way.
Health monitoring
• Log useful stuff from your app!
• Syslog
• Nagios
• Healthcheck pages
• Munin
• Splunk
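
A healthcheck page can be as small as a function that runs a few named checks and maps the outcome to an HTTP status for a monitor such as Nagios to poll. The sketch below is framework-agnostic; `run_healthchecks`, the check names and the choice of 503 for failure are assumptions made for this example.

```python
from typing import Callable, Dict, Tuple


def run_healthchecks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each named check; return (HTTP status, plain-text report)."""
    lines = []
    healthy = True
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception as exc:
            # A check that blows up counts as a failure, with the reason shown.
            ok = False
            lines.append(f"{name}: FAIL ({exc})")
        else:
            lines.append(f"{name}: {'OK' if ok else 'FAIL'}")
        healthy = healthy and ok
    return (200 if healthy else 503), "\n".join(lines)
```

Each check would typically do something cheap but real, such as a trivial query against the data store or a round trip to the cache.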
Performance monitoring

• Per hit stats (db queries, memcache hits,
  times taken)
• Query comments
• Graphs are awesome (RRD is kinda hateful)
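
Per-hit stats can be gathered with a small context manager wrapped around each request. This is a sketch only: `HitStats`, `track_hit` and `tag_query` are hypothetical names, and `tag_query` reflects one reading of "query comments", namely appending a SQL comment that identifies where a query came from so it can be spotted in database logs.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class HitStats:
    """Counters and elapsed time for a single request."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = {}
        self.elapsed = 0.0

    def incr(self, name: str, amount: int = 1) -> None:
        self.counters[name] = self.counters.get(name, 0) + amount

    def summary(self, path: str) -> str:
        parts = " ".join(f"{k}={v}" for k, v in sorted(self.counters.items()))
        return f"path={path} time_ms={self.elapsed * 1000:.1f} {parts}".rstrip()


@contextmanager
def track_hit(path: str, log: Callable[[str], None] = print) -> Iterator[HitStats]:
    """Collect stats for one hit and emit one log line when it ends."""
    stats = HitStats()
    started = time.perf_counter()
    try:
        yield stats
    finally:
        stats.elapsed = time.perf_counter() - started
        log(stats.summary(path))


def tag_query(sql: str, origin: str) -> str:
    """Append a comment naming the query's origin (comment closers stripped)."""
    return f"{sql} /* {origin.replace('*/', '')} */"
```

Inside a handler this might look like `with track_hit("/home") as stats:` followed by `stats.incr("db_queries")` at each query; the resulting log lines are easy to feed into whatever graphing tool is in use.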
Conclusion

• Components rule!
• Expect change.
• Enforce consistency.
Thanks!

• Slides will be on SlideShare later
• I can (and probably will) rant at length
  about any of this if you buy me a beer ;)
• Questions?
