How did we migrate data for millions of live users from MySQL to Cassandra
Andrey Panasyuk, @defascat
Plan
Use case
Challenges
a. Individual
b. Corporate
Servers
● Thousands of servers in prod
● Java 8
● Tomcat 7
● Spring 3
● Hibernate 3
Migration from MySQL to Cassandra for millions of active users
Sharded MySQL. Current state
Sharded MySQL. Environment
1. MySQL (Percona Server)
2. Hardware configuration:
a. two Intel E5-2620 v2 CPUs
b. 128GB RAM
c. 12x800GB Intel SSD, RAID 10
d. two 2Gb network interfaces (bonded)
MemcacheD
1. Hibernate
a. Query Cache
b. Entity Cache
2. Hundreds of nodes
3. ~100MBps per Memcache node
Sharded MySQL. Failover
1. master
2. co-master
3. flogger
4. archive
X4
Sharded MySQL. Approach
1. Hibernate changes:
a. Patching 2nd level caching:
i. +environment
ii. -class version
b. More info to debug problems
c. Fixing bugs
2. Own implementation:
a. FitbitTransactional
b. ManagedHibernateSession
3. Dynamic sharding concept (somewhat similar to C*)
Sharded MySQL. Data migration
Solution: vBucket
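The slides do not show how vBuckets are implemented. A minimal sketch, assuming the usual virtual-bucket indirection (key -> vBucket -> shard): the key-to-bucket hash is fixed, and only the bucket-to-shard table changes when data moves. The class name, the CRC32 hash and the initial layout are illustrative assumptions, not the production code.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Sketch of vBucket routing: key -> vBucket (fixed) -> shard (reassignable). */
final class VBucketRouter {
    // index = vBucket id, value = shard id currently owning that bucket
    private final int[] shardOfBucket;

    VBucketRouter(int bucketCount, int shardCount) {
        shardOfBucket = new int[bucketCount];
        for (int b = 0; b < bucketCount; b++) {
            shardOfBucket[b] = b % shardCount; // hypothetical initial layout
        }
    }

    /** The key-to-bucket step never changes, so keys are never rehashed. */
    int bucketOf(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % shardOfBucket.length);
    }

    /** Resolves the shard that currently owns the key's bucket. */
    int shardOf(String key) {
        return shardOfBucket[bucketOf(key)];
    }

    /** After a bucket's rows are copied, routing changes by flipping one table entry. */
    void reassign(int bucket, int newShard) {
        shardOfBucket[bucket] = newShard;
    }
}
```

With this layout, growing from 96 to 152 shards means moving whole buckets one at a time instead of rehashing every user.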
Sharded MySQL. Data migration
Migration (96 -> 152 shards):
● vBuckets to move: 96579
● 1 bucket migration time: 8 min
● 10 bucketmovers * 3 processes each: 12 days
Sharded MySQL. Data migration
Job
● Setup
a. Ensures vBuckets are in read-only mode
b. Waits for servers to reach consensus
● Execute
a. Triggers actions (dump, insert, etc.) on Bucketmover
b. Waits for actions to complete
● Wrap-up
a. Updates shards for vbuckets, re-opens them for writes
b. Advances jobs to next action
Sharded MySQL. Schema migration
1. Locks during schema update
Solution: pt-online-schema-change + protobuf
Drawbacks:
1. Split between DML/DDL scripts
2. Binary format (additional data)
3. Additional platform specific tool
message Meta {
  optional string name = 1;
  optional string intro = 2;
  ...
  repeated string requiredFeatures = 32;
}
message Challenge {
  optional Meta meta = 1;
  ...
  optional CWRace cw_race = 6;
}
Sharded MySQL. Development
1. Job system across shards
2. Use unsharded databases for lookup tables
3. Do not forget about custom annotation
@PrimaryEntity(entityType = EntityType.SHARDED_ENTITY)
Query patterns
1. Create challenge
2. List challenges by user
3. Get challenge team leaderboard by user
4. Post a message
5. List challenge messages
6. Cheer a message
MySQL. Not a problem
1. Knowledge Base
2. Response Time
Our problems
1. MySQL
a. Scalability
b. Fault tolerance
c. Schema migration
d. Locks
2. Infrastructure cost
a. MemcacheD
b. Redis
C* expectations
1. Scalability
2. Quicker fault recovery
3. Easier schema migration
4. Lack of locks
Migration specifics
1. Millions of real users in prod
2. No downtime
Apache Cassandra
Apache Cassandra is a free and open-source distributed database
management system designed to handle large amounts of data across
many commodity servers, providing high availability with no single
point of failure. Cassandra offers robust support for clusters spanning
multiple datacenters, with asynchronous masterless replication allowing
low latency operations for all clients.
Setting cluster up
1. Performance testing
2. Monitoring
3. Alerting
4. Incremental repairs (CASSANDRA-9935)
C* tweaks
1. ParNew+CMS -> G1
2. MaxGCPauseMillis = 200 ms
3. ParallelGCThreads and ConcGCThreads = 4
4. Compaction
5. gc_grace_seconds = 0 (our data already has a long TTL)
Create keyspaces/tables
1. Almost the same schema with Cassandra adjustments
2. Data denormalization was required in several places
ID migration
1. Create pseudo-random migration UUID based on BIGINT
2. Thank the API designers for using strings as object IDs.
3. Make sure clients are ready for the new length of the id.
4. Migrate API to UUID all over the place
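The slides do not say how the migration UUID was derived from the BIGINT. One possible sketch, assuming a deterministic mapping is wanted so that the same legacy id always yields the same UUID: a name-based (version 3) UUID over the decimal id. The `MigrationIds` class and the `challenge:` namespace prefix are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class MigrationIds {
    // Hypothetical prefix so ids of different entity types do not collide.
    private static final String NAMESPACE = "challenge:";

    /** Same BIGINT in, same UUID out: the result looks random but is reproducible. */
    static UUID fromLegacyId(long legacyId) {
        byte[] name = (NAMESPACE + legacyId).getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(name); // name-based (version 3) UUID
    }
}
```

The string form is 36 characters long, which is why clients must be ready for longer ids than a BIGINT rendered as text.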
DAO (Data Access Object)
1. Create CassandraDAO with the same interface as HibernateDAO
2. Create ProxyAdapterDAO to control which implementation to select
3. Create adapter implementation for each DAO with the same
interface as HibernateDAO
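The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the original code: `ChallengeDao` and its two methods are hypothetical, and an in-memory map stands in for both the Hibernate and the Cassandra implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical DAO interface shared by both backends. */
interface ChallengeDao {
    void save(String id, String name);
    String findName(String id);
}

/** In-memory stand-in; a real HibernateDAO or CassandraDAO would hit a database. */
final class InMemoryChallengeDao implements ChallengeDao {
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public void save(String id, String name) { rows.put(id, name); }

    @Override
    public String findName(String id) { return rows.get(id); }
}

/** Callers depend only on ChallengeDao; the proxy decides which backend serves them. */
final class ProxyAdapterDao implements ChallengeDao {
    private final ChallengeDao hibernateDao;
    private final ChallengeDao cassandraDao;
    private volatile boolean useCassandra; // assumed to come from configuration

    ProxyAdapterDao(ChallengeDao hibernateDao, ChallengeDao cassandraDao) {
        this.hibernateDao = hibernateDao;
        this.cassandraDao = cassandraDao;
    }

    void setUseCassandra(boolean useCassandra) { this.useCassandra = useCassandra; }

    private ChallengeDao active() { return useCassandra ? cassandraDao : hibernateDao; }

    @Override
    public void save(String id, String name) { active().save(id, name); }

    @Override
    public String findName(String id) { return active().findName(id); }
}
```

Because the proxy has the same interface as the DAOs it wraps, service code does not change while the backend is switched.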
Enable shadow writes (percentage)
1. Introduce environment specific settings for shadow writes
2. Adjust ProxyAdapterDAO code to enable shadow writes by
percentage. Various implementations.
3. Analyze performance (StatsD metrics for our code + Cassandra
metrics)
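A minimal sketch of percentage-based shadow writes, assuming MySQL stays the primary store and a failed Cassandra write must never reach the user. `ShadowWriter`, the injectable `dice` and the failure counter (a stand-in for a StatsD metric) are hypothetical names.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Consumer;
import java.util.function.IntSupplier;

final class ShadowWriter<T> {
    private final Consumer<T> primary;  // e.g. the MySQL write
    private final Consumer<T> shadow;   // e.g. the Cassandra write
    private final int shadowPercent;    // 0..100, assumed per-environment setting
    private final IntSupplier dice;     // returns a value in 0..99
    private int shadowFailures;         // stand-in for a metrics counter

    ShadowWriter(Consumer<T> primary, Consumer<T> shadow, int shadowPercent) {
        this(primary, shadow, shadowPercent, () -> ThreadLocalRandom.current().nextInt(100));
    }

    ShadowWriter(Consumer<T> primary, Consumer<T> shadow, int shadowPercent, IntSupplier dice) {
        this.primary = primary;
        this.shadow = shadow;
        this.shadowPercent = shadowPercent;
        this.dice = dice;
    }

    void write(T value) {
        primary.accept(value); // a primary failure propagates to the caller as before
        if (dice.getAsInt() < shadowPercent) {
            try {
                shadow.accept(value);
            } catch (RuntimeException e) {
                shadowFailures++; // swallowed on purpose: shadow errors are only counted
            }
        }
    }

    int shadowFailures() { return shadowFailures; }
}
```

Raising `shadowPercent` step by step lets the new cluster take real write load gradually while its latency and error metrics are watched.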
Migrate legacy data
1. Create a new job to read/migrate data
2. Process data in batches
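The slides give no detail about the migration job. As a rough sketch of "process data in batches", assuming the job walks a list of legacy ids and hands fixed-size chunks to a copy step (`BatchMigrator` and the `sink` callback are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchMigrator {
    /** Splits ids into consecutive batches of at most batchSize and passes each to the sink. */
    static <T> int migrateInBatches(List<T> ids, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            sink.accept(new ArrayList<>(ids.subList(from, to))); // copy: the sink owns its batch
            batches++;
        }
        return batches;
    }
}
```

Small batches keep each step short and make it easy to resume or re-run the job for a subset of entities.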
Start shadow C* reads with validation
1. Environment specific settings for data validation
2. Adjust ProxyAdapterDAO code to enable simultaneous read from
MySQL and Cassandra
3. Adjust ProxyAdapterDAO to be able to compare objects
4. Logging & investigating data discrepancy.
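A sketch of a validating shadow read, assuming MySQL is still the source of truth at this stage: both stores are read, the results are compared, and the caller always receives the MySQL value. `ValidatingReader` is a hypothetical name and the in-memory list stands in for real logging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

final class ValidatingReader<K, V> {
    private final Function<K, V> mysqlRead;
    private final Function<K, V> cassandraRead;
    private final List<String> discrepancies = new ArrayList<>(); // stand-in for a log

    ValidatingReader(Function<K, V> mysqlRead, Function<K, V> cassandraRead) {
        this.mysqlRead = mysqlRead;
        this.cassandraRead = cassandraRead;
    }

    V read(K key) {
        V expected = mysqlRead.apply(key);
        try {
            V actual = cassandraRead.apply(key);
            if (!Objects.equals(expected, actual)) {
                discrepancies.add("key=" + key + " mysql=" + expected + " cassandra=" + actual);
            }
        } catch (RuntimeException e) {
            // A failing shadow read is recorded, never surfaced to the user.
            discrepancies.add("key=" + key + " cassandra read failed: " + e.getMessage());
        }
        return expected; // MySQL result is what the user sees
    }

    List<String> discrepancies() { return discrepancies; }
}
```

Comparison relies on `equals`, so the compared objects need a meaningful `equals` implementation for the validation to be useful.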
Check validation issues
1. Path
a. Fix code problems
b. Migrate affected challenges again
c. Go to step 1
2. Duration: 1.5 months
Turn on read from C*
1. Introduce a config setting for the percentage of reads served from C*
2. Still do shadow MySQL reads and validations
3. Increase percentage over time
Turn off writes to MySQL
Clean-up
1. Adjust code that does not fit C* access patterns, such as scanning
through all of the shards.
2. Adjust adapters to get rid of Hibernate DAOs. The adapter hierarchy is
still present
3. Remove obsolete code
4. Clean up MySQL database
Challenge Events Migration Example
1. Previous attempts:
a. SWF + SQS
b. MySQL + Job across all shards
2. Now
a. Complication: poor performance of C* when used as a queue
b. 16 threads across 1024 buckets
Code Redesign. Message cheer example
1. Originally
a. Read
b. Update BLOB
c. Persist
2. Approach
a. Update C* set as a single operation
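Why the redesign helps can be shown with a small in-memory simulation (not driver code). It assumes two clients cheer at the same time: with read / update / persist the last writer overwrites the other, while a set-add only sends the new element. In CQL the second form would look roughly like `UPDATE messages SET cheers = cheers + ? WHERE message_id = ?`, where the table and column names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class CheerDemo {
    /** Old style: every client reads the same snapshot, then persists its whole copy. */
    static Set<String> readModifyWrite(Set<String> stored, String... users) {
        Set<String> lastPersisted = stored;
        for (String user : users) {
            Set<String> copy = new HashSet<>(stored); // Read (same stale snapshot)
            copy.add(user);                           // Update
            lastPersisted = copy;                     // Persist (overwrites earlier writes)
        }
        return lastPersisted;
    }

    /** New style: every client sends only "add this element"; the store merges them. */
    static Set<String> setAdd(Set<String> stored, String... users) {
        Set<String> result = new HashSet<>(stored);
        for (String user : users) {
            result.add(user); // corresponds to: cheers = cheers + {user}
        }
        return result;
    }
}
```

Set additions commute, so the outcome does not depend on the order in which concurrent cheers arrive.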
Code Redesign. Life without transactions
1. BATCH
2. Some object as a single source of truth
Challenges C*. Current State
1. Two datacenters
2. 18 nodes
3. Hardware
a. 24-core CPU
b. 64 GB RAM
4. RF: 3
Results of migration
1. Significant improvement in persistence storage scalability &
management (compared to MySQL RDBMS)
2. Minimized the number of external points of failure
3. Squashing Technical Debt
4. Created a reusable migration module
Cassandra Inconveniences
1. Lack of ACID transactions
2. MultiDC scenarios require conscious decisions for
QUORUM/LOCAL_QUORUM.
3. Data denormalization
4. CQL vs SQL limitations
5. Less readable IDs
Surprisingly not a big deal
1. Lack of JOINs due to the model
2. Lack of aggregation functions due to the model (we’re on 2.1 now)
3. Eventual consistency
4. IDs format change
