Just-In-Time Scalability: Agile Methods to Support Massive Growth
What is IMVU?  
Behind the scenes...
IMVU is LAMP, plus... Perlbal, Memcached, Solr, MogileFS
plus... BuildBot, eAccelerator, Linux (Debian), memcached, Nagios, Perl, Roundup, rrd, Subversion, ADODB, b2evolution, Coppermine, feed2js, FreeTag, Incutio XML-RPC, jrcache, JSON-PHP, Magpie, osCommerce, phpBB, Phorum, SimpleTest, Selenium, Audiere, Boost, Cal3D, CFL, NSIS, Pixomatic, Python, pywin32, SCons, wxPython
Before and After Architecture
Before: We started with a small site, a mess of open source, and a small team that didn't know much about scaling.
After: We ended with a large site, a medium-sized team, and an architecture that has scaled.
We never stopped. We used a roadmap and a compass, made weekly changes in direction, regularly shipped code on Wednesday to handle the next weekend's capacity constraints, and shipped new features the whole time.
Before and After Architecture (1/4) November
Before and After Architecture (2/4) December
Before and After Architecture (3/4) February
Before and After Architecture (4/4) May
Advanced planning vs. fast response
“Driving”
  • Continuously figure out what is going to go wrong soon
  • Quickly fix it, without breaking something else
  • Get feedback along the way
“Rocket ship”
  • Figure out in advance what is going to go wrong
  • Build a plan that prevents those things from happening
  • Execute your plan
  • Get feedback when done
Questions to ask
“Driving”
  • How do you know you will be able to fix the problem in time?
  • How can you be sure you won't cause collateral damage?
  • How can you be sure you won't code yourself into a corner?
“Rocket ship”
  • Are you sure you know what is going to happen?
  • Are you sure you can execute?
  • Can you afford it?
  • Do you need feedback?
Continuous Ship
  • Deploy new software quickly
      • At IMVU, time from check-in to production = 20 minutes
  • Tell a good change from a bad change (quickly)
  • Revert a bad change quickly
  • Work in small batches
      • At IMVU, a large batch = 3 days' worth of work
      • Break large projects down into small batches
  • Don't have the same problem twice – fix the root cause of each class of problems
IMVU pushes code to production 20-30 times every day
Cluster Immune System
What it looks like to ship one piece of code to production:
  • Run tests locally (SimpleTest, Selenium)
      • Everyone has a complete sandbox
  • Continuous Integration Server (BuildBot)
      • All tests must pass or “shut down the line”
      • Automatic feedback if the team is going too fast
  • Incremental deploy
      • Monitor cluster and business metrics in real-time
      • Reject changes that move metrics out-of-bounds
  • Alerting & Predictive monitoring (Nagios)
      • Monitor all metrics that stakeholders care about
      • If any metric goes out-of-bounds, wake somebody up
      • Use historical trends to predict acceptable bounds
  • When customers see a failure:
      • Fix the problem for customers
      • Improve your defenses at each level
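
The "reject changes that move metrics out-of-bounds" and "use historical trends to predict acceptable bounds" steps can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not IMVU's tooling: the function names, the three-standard-deviation band, and the sample numbers are all hypothetical, and the slides do not say how the bounds were actually computed.

```python
from statistics import mean, stdev

def acceptable_bounds(history, k=3.0):
    """Derive a (low, high) band from historical samples of one metric.

    Assumption: mean +/- k sample standard deviations is used as the band;
    the slides only say that historical trends predict the bounds.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma

def should_roll_back(history, observed, k=3.0):
    """Return True when a post-deploy sample falls outside the band."""
    low, high = acceptable_bounds(history, k)
    return not (low <= observed <= high)

# Hypothetical signups-per-minute samples taken before a deploy.
signups_per_minute = [100, 104, 98, 101, 97, 103, 99, 102]

print(should_roll_back(signups_per_minute, 101))  # inside the band
print(should_roll_back(signups_per_minute, 60))   # far below the band
```

In an incremental deploy, a check like this would run against a small slice of the cluster first, so that a bad change is reverted before it reaches every machine.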
Case Study: Sharding
Problem: Spread write queries across multiple databases
Solution: Intercept and redirect queries based on SQL comments
  • Move one table or sub-system at a time
  • In our experience, one engineer horizontally partitions one table or small sub-system in one week
  • New engineers figure this out in about 5 minutes
Before:
    db_query("INSERT INTO inventory (customers_id, products_id) VALUES ($customer_id, $product_id)");
After:
    db_query("/*shard customer://$customer_id */ INSERT INTO inventory (customers_id, products_id) VALUES ($customer_id, $product_id)");
Learning: cross-shard joins & transactions aren’t required
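
To make "intercept and redirect queries based on SQL comments" concrete, here is a minimal sketch in Python (chosen for illustration; the slides use PHP). Everything beyond the comment syntax shown on the slide is an assumption: the shard names, the `route_query` helper, and the modulo placement are hypothetical and do not describe how IMVU's `db_query` chose a database.

```python
import re

# Matches the routing hint used on the slide, e.g. /*shard customer://42 */
SHARD_HINT = re.compile(r"/\*\s*shard\s+(\w+)://(\d+)\s*\*/")

# Hypothetical shard names; a real deployment would map these to connections.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]
DEFAULT_DB = "db-main"

def route_query(sql):
    """Pick a database for a query based on its shard comment.

    Queries without a hint keep going to the original database, which is
    what allows moving one table or sub-system at a time.
    """
    match = SHARD_HINT.search(sql)
    if match is None:
        return DEFAULT_DB
    entity_id = int(match.group(2))
    # Assumption: simple modulo placement. A lookup table keyed by id would
    # make rebalancing easier, but the slides do not say which was used.
    return SHARDS[entity_id % len(SHARDS)]

unsharded = "INSERT INTO inventory (customers_id, products_id) VALUES (42, 7)"
sharded = "/*shard customer://42 */ " + unsharded

print(route_query(unsharded))
print(route_query(sharded))
```

Because the hint is an ordinary SQL comment, the database ignores it, so a query can be annotated before its table is moved to a shard.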
Case Study: Caching
Problem: Cache frequently read data to memcached
Solution: Intercept and cache queries based on SQL comments
    db_query_cache(BUDDY_CACHE_TIME, "/*shard customer://$customer_id */ /*cache-class customer://$customer_id/buddies */ SELECT friend_id, buddy_order FROM customers_friends WHERE customers_id=$customer_id");
-----------------
    db_query("/*shard customer://$customer_id */ DELETE FROM customers_friends WHERE customers_id = $customer_id AND friend_id = $friend_id");
    db_flush_cacheclass("customer://$customer_id/buddies");
Learning: Flushing the cache is critical to users and performance
  • When a customer spends $24.95, they want the benefits immediately
Learning: Test the cache behavior for critical systems
Case Study: Steering Data Design
Problem: Improve database schemas and data design to meet scalability requirements without downtime
Solution:
  • Measure to find the real problems (harder than it sounds)
  • Migrate to new design that takes advantage of sharding and/or caching
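
The read-then-flush pattern on this slide can be sketched as follows. This is a toy Python model, not the PHP implementation behind `db_query_cache` and `db_flush_cacheclass`: a plain dict stands in for memcached, and the `QueryCache` class, its method names, and the `fake_db` helper are hypothetical. Tracking cached queries per class is one assumed way to implement a class flush; the slides do not describe the mechanism.

```python
import re
import time

# Matches the hint used on the slide, e.g. /*cache-class customer://1/buddies */
CACHE_CLASS_HINT = re.compile(r"/\*\s*cache-class\s+(\S+)\s*\*/")

class QueryCache:
    """Toy query cache; a plain dict stands in for memcached."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that really hits the database
        self._entries = {}           # sql -> (expires_at, rows)
        self._by_class = {}          # cache class -> set of cached sql strings

    def query_cache(self, ttl_seconds, sql):
        entry = self._entries.get(sql)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]          # fresh hit, skip the database
        rows = self._run_query(sql)
        self._entries[sql] = (time.monotonic() + ttl_seconds, rows)
        match = CACHE_CLASS_HINT.search(sql)
        if match:
            self._by_class.setdefault(match.group(1), set()).add(sql)
        return rows

    def flush_cacheclass(self, cache_class):
        # Drop every cached query that was tagged with this class.
        for sql in self._by_class.pop(cache_class, set()):
            self._entries.pop(sql, None)

# Hypothetical stand-in for the customers_friends table.
buddies = [(2, 0), (3, 1)]
db_calls = []

def fake_db(sql):
    db_calls.append(sql)
    return list(buddies)

cache = QueryCache(fake_db)
sql = ("/*shard customer://1 */ /*cache-class customer://1/buddies */ "
       "SELECT friend_id, buddy_order FROM customers_friends WHERE customers_id=1")

cache.query_cache(300, sql)                     # miss: reads the database
cache.query_cache(300, sql)                     # hit: served from the cache
buddies.remove((3, 1))                          # the write path deletes a friend
cache.flush_cacheclass("customer://1/buddies")  # and flushes the class
fresh = cache.query_cache(300, sql)
print(len(db_calls), fresh)
```

The second read never reaches the database, and the read after the flush sees the deletion immediately instead of waiting for the time-to-live to expire.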
Case Study: Steering Data Design
Case Study: Steering Data Design
Case Study: Steering Data Design
Problem: You can’t bulk move large frequently accessed data
Solution:
  • Copy on read
      • Use when you are read bound
      • Reads check cache, new location, and copy to new location if missing
      • Writes go to new location if data has been migrated, otherwise old
  • Copy on write
      • Use when you are write bound
      • Reads check cache, new location, then old location
      • Writes go to new location, copying to new location if missing
  • Copy all
      • Use when file system fills up
      • Reads & writes go to new location, falling back to old location if missing
      • Cron copies data a few records at a time
“Thank You for Listening!”
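
As an illustration of the copy-on-read rules above, here is a minimal Python sketch. It is an assumption-heavy model rather than IMVU's code: plain dicts stand in for the cache and for the old and new storage locations, and the `CopyOnReadStore` class and the `avatar:` keys are hypothetical.

```python
class CopyOnReadStore:
    """Migrates records from an old store to a new one as they are read.

    Plain dicts stand in for the cache and both storage locations.
    """

    def __init__(self, old, new):
        self.cache = {}
        self.old = old
        self.new = new

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        if key not in self.new and key in self.old:
            # Copy on read: migrate the record the first time it is touched.
            self.new[key] = self.old[key]
        value = self.new.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Migrated records are written to the new location, others to the old.
        target = self.new if key in self.new else self.old
        target[key] = value
        self.cache.pop(key, None)  # keep the cache from serving stale data

old = {"avatar:1": "v1", "avatar:2": "v1"}
new = {}
store = CopyOnReadStore(old, new)

store.read("avatar:1")         # migrates avatar:1
store.write("avatar:1", "v2")  # already migrated: lands in new
store.write("avatar:2", "v2")  # not migrated yet: lands in old
print(new, old)
```

Copy on write would flip the roles: reads fall back from new to old without copying, and writes are what move a record to the new location.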

Editor's Notes

  • #2: We all aspire to have scalability problems. We are going to talk about when we had scalability problems and the approach we used to solve them in a “Just In Time” way. I’m Eric and this is Chris. We are from IMVU. IMVU is a site that has had the good fortune to get some traction in the market and has had to solve scalability problems. The two critical pieces of our approach are an agile methodology we call continuous ship and a defect prevention process we call the cluster immune system. We are going to talk about these pieces and then how we applied them to solve a couple of scalability problems.