01/16/2014

CASSANDRA AND RIAK AT BESTBUY.COM
WHO WE ARE
• Best Buy is the world’s largest multi-channel consumer electronics retailer, with stores in the United States, Canada, China and Mexico.
• We are the 10th largest online retailer in the United States.
• More than 1.6 billion visitors come to our stores and BestBuy.com each year.
• Our My Best Buy loyalty program is among the largest loyalty programs of its kind, with more than 40 million active members.
• We provide customers with outstanding choice, unbiased advice and unmatched support for their tech needs.
January 16, 2014
@Copyright 2014 – Best Buy, Inc. All rights reserved.
A UNIQUE CUSTOMER PROMISE
• THE LATEST DEVICES AND SERVICES, ALL IN ONE PLACE
• IMPARTIAL & KNOWLEDGEABLE ADVICE
• COMPETITIVE PRICES
• THE ABILITY TO SHOP WHEN AND WHERE YOU WANT
• SUPPORT FOR THE LIFE OF YOUR PRODUCTS

JOEL CRABB, Chief Architect, BestBuy.com
KANNAN SWAMINATHAN, Director, Web Architecture

NOSQL AT BESTBUY.COM
• Schema-less design
—Key–value systems
—Sparse column systems
—Non-relational data

• Distributed nature
—Eventual consistency
—Active-Active across clouds and datacenters
—High reliability
—Horizontal scaling
BESTBUY.COM

SCALABILITY GOALS
• Near-Infinite
• Bursts
• 7X traffic spikes
• Bursts > 50,000 rps
• #4 in eCommerce traffic during 2013 Holiday

[Chart labels: Walmart, Best Buy]

FLEXIBILITY GOALS
Low cost of change
Fast concepts to site
Daily releases
Multiple versions

One day of work vs. two months
RELIABILITY GOALS
• 100% availability
• Zero defects

• Achieved 100% cloud uptime during Holiday

• ~ 2s response times

CLOUD ARCHITECTURE CONCEPTS
• Clouds fail; plan for it
—Multiple availability zones
—Multiple regions
—Multiple vendors

• Datacenter connections fail; plan for it
—Serve pages completely from cloud
—Browse-only fallback mode
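
The failure planning above can be sketched as a small routing decision. This is only an illustration under stated assumptions: the `Health` snapshot, the mode names and the vendor labels are hypothetical and do not come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Health:
    """Hypothetical health snapshot; field names are illustrative only."""
    clouds_up: list[str]         # browse clouds currently passing health checks
    datacenter_reachable: bool   # is the cloud-to-datacenter link up?


def choose_mode(health: Health) -> str:
    """Pick a serving mode from the current health snapshot."""
    if not health.clouds_up:
        return "unavailable"     # every cloud failed; nothing left to serve from
    if not health.datacenter_reachable:
        return "browse-only"     # pages come entirely from the cloud
    return "full"                # browsing plus datacenter-backed features


def pick_cloud(health: Health, preferred: list[str]) -> str | None:
    """Return the first healthy cloud in preference order, if any."""
    for name in preferred:
        if name in health.clouds_up:
            return name
    return None
```

With one vendor down and the datacenter link lost, `choose_mode` returns `"browse-only"` and `pick_cloud` falls through to the surviving vendor.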

CLOUD ARCHITECTURE CONCEPTS
CDN: Global Traffic Manager
Browse Cloud (Vendor 1)
Browse Cloud (Vendor 2)
Best Buy Datacenter
RIAK

PRODUCT CATALOG - REQUIREMENTS
• Business Requirements
—Easily add new attributes to products
—Use existing product content feeding systems
—Provide enhanced product data to all of Best Buy

• Technical Requirements
—Scale to BestBuy.com’s needs
—One-way replication from DC to cloud
—Provide a Product Catalog API usable by Best Buy applications internally (DC) and externally (cloud)

BROWSE CLOUD - ARCHITECTURE

Persistent Cache

Load Balancer
Web App

Web App

Service Aggregator

Product Data

Product Data

Datacenter

Legacy Services and Product Data

WHY RIAK FOR THE PRODUCT CATALOG
• Key-value Store
—Easily add new attributes
—Different attribute sets for different product categories

• Hub and Spoke Replication
—Multiple cloud instances can be out of sync for seconds or minutes; this is OK
—Connection to Datacenter can be lost

• Resilience within all tiers
—Riak’s ring architecture allows single instances to fail with little impact on the system
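
The key-value point above (different attribute sets for different product categories) can be illustrated with a minimal sketch. A plain Python dict stands in for a bucket; no Riak client API is used, and the key layout, SKUs and attribute names are assumptions made for the example.

```python
import json


def put_product(store, sku, attributes):
    """Store a product as an opaque JSON value under its key."""
    store[f"products/{sku}"] = json.dumps(attributes).encode("utf-8")


def get_product(store, sku):
    """Fetch and decode a product, or return None if the key is absent."""
    raw = store.get(f"products/{sku}")
    return None if raw is None else json.loads(raw)


def build_example_store():
    """Two categories with different attribute sets and no shared schema."""
    store = {}
    put_product(store, "tv-1", {"name": "Example TV", "screen_inches": 55})
    put_product(store, "laptop-1", {"name": "Example Laptop", "ram_gb": 16})
    # Adding a new attribute later is just another write; no migration step.
    tv = get_product(store, "tv-1")
    tv["refresh_hz"] = 120
    put_product(store, "tv-1", tv)
    return store
```

Because the store treats each value as opaque, a new attribute on one category never touches the records of another.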
WHY RIAK FOR THE PRODUCT CATALOG
• Legacy Bridge
—Extract data from legacy system
—Expose to anyone who needs the data
—Defers the decision to retire the legacy system

• Input into Product Evolution
—Best Buy is stretching Riak in multiple areas
—Feedback on Search and Replication
—Direct connection to Riak engineers

PRODUCT CATALOG DEPLOYMENT
Cloud Replicant
Cloud Replicant
V1.2.1

Each Cloud Replicant is a self-contained instance which can intermittently lose connectivity to the Datacenter

Datacenter Master
V1.4.2
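
A toy model of the master and replicant arrangement may help make the behavior concrete. It is a sketch only, not how Riak's multi-datacenter replication is implemented: the `Master` and `Replicant` classes, the change log and the pull-based `sync` call are all assumptions for illustration.

```python
class Master:
    """Datacenter-side source of truth with an ordered change log."""

    def __init__(self):
        self.log = []  # list of (key, value) changes in write order

    def write(self, key, value):
        self.log.append((key, value))


class Replicant:
    """Self-contained cloud copy that keeps serving reads while disconnected."""

    def __init__(self):
        self.data = {}
        self.applied = 0       # number of log entries already applied
        self.connected = True  # flip to False to simulate a lost link

    def sync(self, master):
        """Apply any unseen changes; return how many were applied."""
        if not self.connected:
            return 0           # stay on the last known data
        pending = master.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)
```

While disconnected, a replicant serves stale but usable data; once the link returns, a single `sync` catches it up from where it left off.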

PRODUCT CATALOG - DATA FLOW
Cloud
Web Application
Granular Service
Aggregator Service
NoSQL Product Data

Replication

Extraction

Datacenter
Legacy Product Data
NoSQL Product Data
RIAK LESSONS LEARNED
• Search eats up the CPU (Pre-Yokozuna)

Search Endpoint Removed

RIAK LESSONS LEARNED
• Multi-datacenter replication is hard at scale
• Object replication fails silently (v1.4.2)

1.5 hours for partial update

4000 objects/sec
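
Since replication can fail without an error, one possible safeguard is to compare the two sides explicitly. The sketch below is an assumption, not the presenters' tooling: dicts stand in for the master and a replicant, and `find_divergent_keys` is a hypothetical helper. The second function only does arithmetic on the two figures above, assuming (the slide does not say so) that they describe the same sustained run.

```python
import hashlib


def digest(value: bytes) -> str:
    """Content hash used to compare objects without shipping full values."""
    return hashlib.sha256(value).hexdigest()


def find_divergent_keys(master: dict, replicant: dict) -> list:
    """Keys that are missing on the replicant or whose content differs."""
    divergent = []
    for key, value in master.items():
        other = replicant.get(key)
        if other is None or digest(other) != digest(value):
            divergent.append(key)
    return sorted(divergent)


def objects_moved(rate_per_sec: float, hours: float) -> int:
    """Objects transferred at a sustained rate over a duration."""
    return int(rate_per_sec * hours * 3600)
```

Under that assumption, `objects_moved(4000, 1.5)` gives 21,600,000 objects, which suggests why even a partial update is a long-running job.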

DID THE PRODUCT CATALOG SUCCEED?
• Scale – generally < 5 ms response times in cloud
• Scope – provides APIs in cloud and DC

Single zone failure, response times jump to 8-10 ms
1.5 hours for partial update
4 ms @ 95%
4000 objects/sec

CASSANDRA

CUSTOMER GRAPH - REQUIREMENTS
• Business Requirements
—Create a single identity provider
—Build an adaptive model to support a 360-degree view of the customer, customer segmentation and multi-channel personalization
—Extensible framework to support federated identity interoperability with external providers

• Technical Requirements
—Scale to BestBuy.com’s needs with a distributed service-oriented architecture
—Support risk-based authentication and secure service interfaces for consumers
CUSTOMER GRAPH - ARCHITECTURE
CDN + GTM + WAF

Load Balancer

API

Load Balancer

API

Topology: NTS (NetworkTopologyStrategy)
RF: 3 (replication factor)
Partitioner: Random

Data Center 1 / Region 1

API

API

Cassandra Ring

Data Center n / Region n
CUSTOMER GRAPH - CONCEPTUAL

Create Connection:
POST /customers/Kannan/following/customers/Joel
Get all Kannan’s connections:
GET /customers/Kannan/connections
Get Joel’s “followers”:
GET /customers/Joel/followers
Get customers connecting to a specific store:
GET /stores/{store id}/connecting/customer
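
The connection calls above can be mirrored by a small in-memory model. This is a sketch under assumptions, not the service's implementation: the `CustomerGraph` class and its method names are hypothetical, and node identifiers simply reuse the URL path segments.

```python
from collections import defaultdict


class CustomerGraph:
    """Directed connections, stored in both directions for cheap lookups."""

    def __init__(self):
        self._following = defaultdict(set)  # source -> targets
        self._followers = defaultdict(set)  # target -> sources

    def follow(self, source, target):
        """POST /customers/{source}/following/customers/{target}"""
        self._following[source].add(target)
        self._followers[target].add(source)

    def connections(self, source):
        """GET /customers/{source}/connections"""
        return sorted(self._following.get(source, ()))

    def followers(self, target):
        """GET /customers/{target}/followers"""
        return sorted(self._followers.get(target, ()))
```

Writing each edge twice trades extra storage for the ability to answer both "who does Kannan follow" and "who follows Joel" with a single key lookup.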
CUSTOMER GRAPH – CONCEPTUAL

Topics

Products

Blogs

WHY CASSANDRA?
• No single point of failure

• Delivers on Atomicity, Isolation & Durability
• Eventual Consistency
• Tunable Consistency for Reads vs. Writes
• Linear Scalability
• Querying a column slice or a range of row keys

• Data can have expiration set
• Reliable multiple data center replication
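
Tunable consistency comes down to simple arithmetic: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The helper names below are illustrative; ONE, QUORUM and ALL are standard Cassandra consistency levels, and the replication factor of 3 is used only as an example.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 3
LEVELS = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}


def overlapping_pairs():
    """(read level, write level) pairs that guarantee overlap at RF = 3."""
    return sorted(
        (r, w)
        for r, r_count in LEVELS.items()
        for w, w_count in LEVELS.items()
        if read_sees_latest_write(r_count, w_count, RF)
    )
```

At a replication factor of 3, QUORUM reads with QUORUM writes overlap (2 + 2 > 3), while ONE with ONE does not (1 + 1 = 2), which is where eventual consistency applies.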
LESSONS LEARNED – DATA MODELING
Use composite column names; static composite types

Column names are stored physically sorted and indexed
Store “values” as column names

Wide rows in conjunction with composite columns can be used to build indices, but…
For larger data sets, distribute the columns among rows
A secondary index is best modeled as a separate column family
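
These modeling points can be pictured with a small in-memory stand-in for a wide row. It is a sketch, not Cassandra code: `WideRow`, `with_prefix` and `bucketed_row_key` are hypothetical names, tuples play the role of composite column names, and the CRC-based bucket choice is an assumption made for the example.

```python
import bisect
import zlib


class WideRow:
    """One index row whose composite column names are kept sorted."""

    def __init__(self):
        self._names = []  # sorted list of tuples, e.g. (last_name, customer_id)

    def insert(self, name: tuple):
        """Store the indexed value in the column name itself."""
        i = bisect.bisect_left(self._names, name)
        if i == len(self._names) or self._names[i] != name:
            self._names.insert(i, name)

    def with_prefix(self, first):
        """Column slice: all names whose first component equals `first`."""
        start = bisect.bisect_left(self._names, (first,))
        matches = []
        for name in self._names[start:]:
            if name[0] != first:
                break
            matches.append(name)
        return matches


def bucketed_row_key(index_name: str, value: str, buckets: int) -> str:
    """Spread a large index over several rows instead of one huge row."""
    return f"{index_name}:{zlib.crc32(value.encode('utf-8')) % buckets}"
```

Because the names are sorted, a lookup by the leading component is a contiguous slice; bucketing keeps any single row from growing without bound, at the cost of computing which row to read.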
LESSONS LEARNED – DATA MODELING
Do not store an entity as a single column blob
Cannot index and query on entity attributes
Updates to an entity attribute would require a read and then a write

Mutate just the required columns on an entity row
Do not read and then write
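
The cost of the blob approach shows up when two writers update different attributes at the same time. The sketch below uses plain dicts as stand-ins for a column family; the function names, customer key and attribute values are illustrative assumptions.

```python
import json


def blob_lost_update():
    """Two writers read the same blob, then each writes the whole entity back."""
    store = {"cust-1": json.dumps({"email": "old@example.com", "zip": "55401"})}
    writer_a = json.loads(store["cust-1"])  # read
    writer_b = json.loads(store["cust-1"])  # read (same starting state)
    writer_a["email"] = "new@example.com"
    writer_b["zip"] = "55423"
    store["cust-1"] = json.dumps(writer_a)  # write whole entity
    store["cust-1"] = json.dumps(writer_b)  # overwrites writer A's change
    return json.loads(store["cust-1"])


def mutate_column(rows, key, column, value):
    """Blind write of a single column; no prior read is needed."""
    rows.setdefault(key, {})[column] = value


def column_updates():
    """The same two changes expressed as independent column mutations."""
    rows = {"cust-1": {"email": "old@example.com", "zip": "55401"}}
    mutate_column(rows, "cust-1", "email", "new@example.com")
    mutate_column(rows, "cust-1", "zip", "55423")
    return rows["cust-1"]
```

In the blob version the email change is silently lost; with one column per attribute both changes survive because neither writer touches the other's column.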

LESSONS LEARNED - MONITORING
Heap Size and Use
—GC

Mutation Stage (Writes) & Read Stage (Reads)
—Active and Pending

AE Service, Stream & Message-Streaming-Pool Stages
—Especially during scheduled repair or rebuild (while adding nodes to a new data center)
—Streaming requires keep-alive connections (watch out for firewalls terminating idle established connections; update periodic TCP keep-alive ping rate)

Compaction Stage & Compaction Count
—IO Wait, Limits - nofiles, nproc
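
To show what keep-alive tuning means at the socket level, here is a minimal sketch. It is an application-level illustration only: Cassandra streaming runs inside the JVM, and operators would more likely tune the operating system (on Linux, for example, the `net.ipv4.tcp_keepalive_time` sysctl). The function name and the timing values are assumptions, and the per-socket options are applied only where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keep-alive so idle connections still exchange probes."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)  # not defined on every platform
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

The idle time should sit well below the firewall's idle-connection timeout, so a probe always arrives before the firewall drops the connection.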

IMPLEMENTATION STRATEGY
• Start with a project team
—Riak – Product Catalog
—Cassandra – Customer Graph

• Build expertise in development and operations
• As usage grows, create a Platform team
—Combine engineering into one team
—Project team can then focus on business features
—Focus on automation

CONCLUSIONS
• Riak and Cassandra have both been successful
—Riak – No complete outages in two years
—Cassandra – Flawless in its first Holiday in 2013

• Differences
—Replication patterns are different
—Write and read treatment is different
—Deployment patterns are different

• High Availability
—Both have highly resilient architectures
—Both scale linearly