eBay’s Challenges and Lessons
from Growing an eCommerce Platform to Planet Scale



                           Randy Shoup
                    eBay Distinguished Architect




 HPTS 2009
 October 27, 2009
Challenges at Internet Scale


• eBay manages …
   – Over 89 million active users worldwide
   – 190 million items for sale in 50,000 categories
   – Over 8 billion URL requests per day


• … in a dynamic environment
   – Hundreds of new features per quarter
   – Roughly 10% of items are listed or ended every day


• … worldwide
   – In 39 countries and 10 languages
   – 24x7x365




• >70 billion read / write operations / day

                                                          © 2009 eBay Inc.
Architectural Lessons (round 1)


• 1. Partition Everything
   – Functional partitioning for processing (pools) and
     data (hosts)
     [Diagram labels: User, Item, Transaction, Product, Account, Feedback]
   – Horizontal partitioning (“shards”) for data


• 2. Asynchrony Everywhere
   – Event-driven queues and pipelines
     (at-least-once delivery, order-agnostic)
   – Multicast messaging
     (SRM-inspired techniques for reliability)




• 3. Automate Everything
   – Adaptive configuration of components
   – Feedback loops and machine learning
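
As a minimal sketch of the horizontal partitioning ("shards") named in Lesson 1, the Python below routes a record key to one of a fixed number of shards using a stable hash. The shard count, the `shard_for` and `partition` helpers, and the modulo scheme are illustrative assumptions, not eBay's actual routing logic.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count, chosen only for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards: int = NUM_SHARDS):
    """Group keys by the shard that would own them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

A plain modulo scheme remaps most keys when the shard count changes, so systems that repartition regularly often use a lookup table or consistent hashing instead.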
Architectural Lessons (round 1)


• 4. Remember Everything Fails
   – Extensive telemetry for failure detection
   – Graceful degradation of functionality


• 5. Embrace Inconsistency
   – Consistency is a spectrum
   – Each use case trades off CAP properties
   – No distributed transactions
   – Minimize inconsistency through state machines
     and careful ordering of operations
   – Eventual consistency through asynchronous
     recovery or reconciliation
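
The last two points of Lesson 5 can be sketched as follows: a small state machine that only accepts legal transitions (so duplicate or out-of-order requests leave the state unchanged), plus a reconciliation pass that repairs a lagging copy. The state names, the `ALLOWED` table, and both helpers are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical order lifecycle; each state lists the states it may move to.
ALLOWED = {
    "CREATED": {"PAID"},
    "PAID": {"SHIPPED"},
    "SHIPPED": set(),
}


def apply_transition(state: str, target: str) -> str:
    """Return the new state; repeated or illegal transitions leave it unchanged."""
    if target in ALLOWED[state]:
        return target
    return state


def reconcile(replica: dict, authoritative: dict) -> list:
    """Overwrite replica entries that differ from the authoritative copy.

    Returns the keys that were repaired, in the order they were visited.
    """
    repaired = []
    for key, value in authoritative.items():
        if replica.get(key) != value:
            replica[key] = value
            repaired.append(key)
    return repaired
```

In a real system the reconciliation pass would run asynchronously, for example as a periodic batch job; here it is a plain function call.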




Lesson 6: Expect (R)evolution


• Change is the Only Constant
   –   New entities and data elements
   –   Constant infrastructure evolution
   –   Regular data repartitioning and service migration
   –   Periodic large-scale architectural revolution

• Design for Extensibility
   – Flexible schemas
        •   Extensible interfaces (attributes, k-v pairs)
        •   Heterogeneous object storage
   – Pluggable processing
        •   Disparate systems communicate via events
        •   Within system, processing pipeline controlled by configuration

• Incremental System Change
   – Decompose every system change into incremental steps
   – Multiple versions and systems coexist
        •   Every change is a rolling upgrade; transitional states are the norm
        •   Version A -> A|B -> B|A -> Version B
   – Strict forward / backward compatibility for data and interfaces
   – Dual data processing and storage (“dual writes”)
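
The "dual writes" step of an incremental migration might look like the sketch below, assuming two in-memory dictionaries stand in for the old store (A) and the new store (B). The `DualWriteStore` class, its `read_from` switch, and `backfill` are hypothetical names; a production version would also need to handle one of the two writes failing.

```python
class DualWriteStore:
    """Write to both the old (A) and new (B) stores; read from one of them."""

    def __init__(self, store_a: dict, store_b: dict, read_from: str = "A"):
        self.store_a = store_a
        self.store_b = store_b
        # "A" while the new store is still being verified, "B" after cutover.
        self.read_from = read_from

    def put(self, key, value):
        # Every write lands in both stores during the transition.
        self.store_a[key] = value
        self.store_b[key] = value

    def get(self, key):
        primary = self.store_a if self.read_from == "A" else self.store_b
        return primary.get(key)

    def backfill(self):
        """Copy entries written before dual writes began from A to B."""
        for key, value in self.store_a.items():
            self.store_b.setdefault(key, value)
```

Reading from A while writing to both corresponds to the A|B phase on the slide; flipping `read_from` to "B" corresponds to B|A, after which store A can be retired.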
Lesson 7: Dependencies Matter


• Minimize and Control Dependencies
   – Service topology constrained by dependencies
       •   Data center moves change latency characteristics (!)
   – Depend only on abstract interface and virtualized endpoint
   – Make QoS parameters (latency, throughput) explicit in SLA

• Consumer Responsibility
   – It is fundamentally the consumer’s responsibility to manage
     unavailability and SLA violations
   – (Un)availability is an inherently Leaky Abstraction
       •   1st Fallacy of Distributed Computing: “The network is reliable”
   – Recovery is typically use-case-specific
       •   Driven by criticality of the operation and the strength of
           dependency
   – Can abstract with standard patterns
       •   Sync or async failover, degraded function, sync or async error

• Monitor Dependencies Ruthlessly
   – Registries provide WISB (What It Should Be) but only monitoring
     provides WIRI (What It Really Is)
   – Invaluable for problem diagnosis and capacity provisioning
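
The standard patterns listed under Consumer Responsibility (failover, degraded function, error) can be sketched from the consumer's side as below. The `call_with_fallback` helper and its parameters are hypothetical; each argument is a zero-argument callable supplied by the consumer.

```python
def call_with_fallback(primary, failover=None, degraded=None):
    """Try the primary call, then a failover endpoint, then a degraded default.

    Raises RuntimeError only when every option the consumer supplied has failed.
    """
    last_error = None
    for attempt in (primary, failover):
        if attempt is None:
            continue
        try:
            return attempt()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
    if degraded is not None:
        # Degraded function: return a reduced but usable result.
        return degraded()
    raise RuntimeError("dependency unavailable") from last_error
```

Which options a caller passes is the use-case-specific part: a critical operation might supply only a failover, while an optional page module might supply only a degraded default such as an empty list.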


Lesson 8: Be Authoritative


• Authoritative Source (“System of Record”)
   – At any given time, every piece of (mission-critical) data has a
     System of Record
   – Authority can be explicitly transferred (failure, migration)
   – Typically transactional database

• Non-authoritative Sources
   – Every other copy is derived / cached / replicated from
     System of Record
       •   Remote disaster replicas
       •   Search engine
       •   Analytics
       •   Secondary keys
   – Relaxed consistency guarantees with respect to System of
     Record
   – Optimized for alternate access paths or QoS properties
   – Perfectly acceptable for most use cases
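
To illustrate the relationship between a System of Record and a non-authoritative copy, the sketch below keeps an authoritative item table with a change log and derives a secondary-key index (items by seller) from that log. The class names, the log format, and the `catch_up` method are assumptions made for this example.

```python
class SystemOfRecord:
    """Authoritative store: item_id -> seller, plus an ordered change log."""

    def __init__(self):
        self.rows = {}
        self.log = []

    def write(self, item_id, seller):
        self.rows[item_id] = seller
        self.log.append((item_id, seller))


class SellerIndex:
    """Derived copy optimized for lookup by seller; may lag the source."""

    def __init__(self):
        self.items_by_seller = {}
        self.seller_of = {}
        self.applied = 0  # number of log entries already replayed

    def catch_up(self, source):
        for item_id, seller in source.log[self.applied:]:
            old = self.seller_of.get(item_id)
            if old is not None:
                # The item changed seller, so drop it from the old bucket.
                self.items_by_seller[old].discard(item_id)
            self.items_by_seller.setdefault(seller, set()).add(item_id)
            self.seller_of[item_id] = seller
        self.applied = len(source.log)

    def lookup(self, seller):
        return set(self.items_by_seller.get(seller, set()))
```

Between calls to `catch_up` the index can return stale answers, which is the relaxed consistency guarantee the slide describes.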




Lesson 9: Never Enough Data


• Collect Everything
   –   eBay processes 50TB of new, incremental data per day
   –   eBay analyzes 50PB of data per day
   –   Every historical item and purchase is online or nearline
   –   Requires large-scale distributed storage

• Example: System Monitoring
   – Failures at scale are difficult to diagnose and near-impossible
     to replicate
        •   Requires granular instrumentation of every operation
   – Stream processing for pattern detection and failure prediction
   – Historical data to identify optimization opportunities and
     inform capacity provisioning

• Example: Recommendations and Ranking
   – Collect user behavior in the clickstream
        •   Collect -> filter -> enrich -> aggregate -> store
   – Drive purchase recommendations
   – Drive models that predict value of page view, module
     impression, pixel allocation
   – Predictions in the long tail require massive data
   [Diagram labels: Clickstream, Site Data, Historical Data, Analysis]
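
The collect -> filter -> enrich -> aggregate -> store flow can be sketched as a chain of small Python functions. The event fields (`user`, `item`), the category lookup table, and the choice of a `Counter` as the store are all assumptions for illustration; the slides do not describe the real event schema.

```python
from collections import Counter


def filter_events(events):
    """Drop events that carry no user id (treated as unusable in this sketch)."""
    return (e for e in events if e.get("user"))


def enrich(events, categories):
    """Attach each item's category from a lookup table."""
    for e in events:
        yield {**e, "category": categories.get(e["item"], "unknown")}


def aggregate(events):
    """Count views per (user, category) pair."""
    return Counter((e["user"], e["category"]) for e in events)


def run_pipeline(raw_events, categories, store: Counter) -> Counter:
    """Run the stages in order and add the resulting counts to the store."""
    counts = aggregate(enrich(filter_events(raw_events), categories))
    store.update(counts)  # Counter.update adds to existing counts
    return store
```

Because the first two stages are generators, events stream through one at a time rather than being materialized between stages.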
Lesson 10: Custom Infrastructure


• Right Tool for the Right Job
   – Need to maximize utilization of every resource
       •   Data (memory), processing (CPU), clock time (latency), power (!)
   – One size rarely fits all, particularly at scale
   – Compose from orthogonal, commodity components

• Example: Session and Personalization Cache
   – In-memory volatile KVSS on partitioned MySQL Memory
     Engine
   – Async replication to partitioned backing store (Oracle)
   – State redistributed on node failure
   – Versioning, optimistic concurrency, and resolver pattern for
     conflicts

• Example: Metric Server
   – In-memory hierarchical lookup structure for static data
   – Shared infrastructure for multiple types of static data,
     partitioned horizontally
   – Index built offline from multiple data sources, updated
     periodically
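
The versioning, optimistic concurrency, and resolver pattern mentioned for the session and personalization cache can be sketched as follows. This is a single-process illustration under assumed names (`VersionedStore`, `VersionConflict`, `update_with_resolver`); it does not reflect how eBay's cache is actually implemented.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        # Missing keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise VersionConflict(key)
        self._data[key] = (current_version + 1, value)
        return current_version + 1


def update_with_resolver(store, key, new_value, read_version, resolver):
    """Write against the version that was read; on conflict, merge and retry once."""
    try:
        return store.put(key, new_value, read_version)
    except VersionConflict:
        current_version, current_value = store.get(key)
        merged = resolver(current_value, new_value)
        return store.put(key, merged, current_version)
```

The resolver is supplied by the caller because the right merge is data-specific; for a set-valued entry such as a shopping cart, a union like `lambda cur, new: cur | new` is one plausible choice. Under real concurrency the retry could conflict again, so a production version would loop or give up after a bounded number of attempts.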


Questions?


• Randy Shoup, eBay Distinguished Architect (rshoup@ebay.com)




