Confidential and Proprietary
Debunking Common Myths About Cassandra Backup and Test Data Management
Hari Mankude, CTO
December 2016
My Background
Why Bother With Backup and Test Data Management?
• The average cost of a data loss incident is $900,000
• 90% of enterprises delay applications because of a lack of test data
Source: EMC, Talena
Myth #1: Data Replicas Prevent Data Loss
[Diagram: nodes N1, N2, N3, N4]
• Human errors: dropping a column of a table
• Application corruption: incorrect updates to a column
Myth #2: Cassandra Replication Prevents Data Loss
[Diagram: nodes N1 to N8 across Data Center #1 and Data Center #2]
Myth #3: Cassandra snapshots are an effective backup strategy
• PROBLEM: Snapshots result in storage amplification due to compaction
• PROBLEM: Need a scheduler to take timely snapshots and delete older restore points
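The second problem above (taking timely snapshots and deleting older restore points) can be sketched as a small retention planner. This is a minimal sketch, not a production scheduler: the tag format, the `plan_retention` helper, the keyspace name, and the seven-snapshot retention count are all illustrative assumptions. It only builds `nodetool snapshot` and `nodetool clearsnapshot` command lines, which would have to be run on every node, rather than executing them.

```python
from datetime import datetime, timedelta

TAG_FORMAT = "backup-%Y%m%dT%H%M%S"  # hypothetical tag naming scheme


def make_tag(when: datetime) -> str:
    """Build a snapshot tag that sorts chronologically as a string."""
    return when.strftime(TAG_FORMAT)


def plan_retention(existing_tags, now, keep=7):
    """Return (new_tag, tags_to_delete), keeping the `keep` newest restore points."""
    new_tag = make_tag(now)
    # Zero-padded timestamps mean lexical order equals chronological order.
    all_tags = sorted(set(existing_tags) | {new_tag})
    to_delete = all_tags[:-keep] if len(all_tags) > keep else []
    return new_tag, to_delete


def nodetool_commands(new_tag, to_delete, keyspace):
    """Command lines to take the new snapshot and clear the expired ones."""
    cmds = [["nodetool", "snapshot", "-t", new_tag, keyspace]]
    cmds += [["nodetool", "clearsnapshot", "-t", tag] for tag in to_delete]
    return cmds


if __name__ == "__main__":
    now = datetime(2016, 12, 1, 2, 0, 0)
    existing = [make_tag(now - timedelta(days=d)) for d in range(1, 10)]
    tag, stale = plan_retention(existing, now, keep=7)
    for cmd in nodetool_commands(tag, stale, "my_keyspace"):  # keyspace name is hypothetical
        print(" ".join(cmd))
```

Even this toy version hints at the operational burden: the plan has to run on a schedule, on every node, and a failure on any node leaves restore points inconsistent across the cluster.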
Myth #4: Restoring from snapshots is trivial
• PROBLEM: When your cluster size changes due to addition or deletion of nodes
• PROBLEM: If you have config (e.g., compaction policy) or name changes
• PROBLEM: Scaling your restore to hundreds of nodes
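One reason a node-by-node snapshot restore stops being trivial when the cluster size changes is that token ownership shifts, so the files captured on one node no longer hold exactly the partitions that node is now responsible for. The toy model below illustrates this under loud assumptions and is not Cassandra's partitioner: it uses a 100-token ring, one evenly spaced token per node, and integer tokens in place of hashed partition keys.

```python
from bisect import bisect_left

RING_SIZE = 100  # toy ring; real partitioners use a far larger token space


def even_tokens(node_count):
    """One evenly spaced token per node; node i owns (previous token, its token]."""
    return [(i + 1) * RING_SIZE // node_count for i in range(node_count)]


def owner(token, tokens):
    """Index of the node that owns `token` (valid for 1..RING_SIZE)."""
    return bisect_left(tokens, token)


def moved_tokens(old_nodes, new_nodes):
    """Tokens whose owning node index differs between the two topologies."""
    old, new = even_tokens(old_nodes), even_tokens(new_nodes)
    return [t for t in range(1, RING_SIZE + 1) if owner(t, old) != owner(t, new)]


if __name__ == "__main__":
    moved = moved_tokens(4, 5)
    print(f"{len(moved)} of {RING_SIZE} tokens change owner going from 4 to 5 nodes")
```

In this toy model, half of the tokens change owner when a fifth node is added, so copying each node's snapshot files back onto the "same" node would leave data on nodes that no longer own it.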
Myth #5: The traditional backup/restore process works
Myth #6: Test Data Management Is A Simple Process
• Change Request - 1 week
• Provision Production Data - 1 week
• Create Test DB and Mask Data - 1 week
• Create Samples of Production Data - 2 days
• Push Production Data To Test - Hours
• Repeat Process - 3-4 weeks
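Adding up the steps above shows where the 3-4 week cycle comes from. This is a back-of-the-envelope sketch: treating a week as 7 calendar days and "hours" as half a day are assumptions, not figures from the slide.

```python
# Step durations from the slide, in days (1 week assumed to be 7 calendar days).
STEPS = {
    "Change request": 7,
    "Provision production data": 7,
    "Create test DB and mask data": 7,
    "Create samples of production data": 2,
    "Push production data to test": 0.5,  # "hours" approximated as half a day
}


def total_days(steps):
    """Total elapsed days if the steps run one after another."""
    return sum(steps.values())


def total_weeks(steps):
    """Same total expressed in 7-day weeks."""
    return total_days(steps) / 7


if __name__ == "__main__":
    print(f"One refresh cycle: {total_days(STEPS)} days (~{total_weeks(STEPS):.1f} weeks)")
```

Under these assumptions one refresh takes 23.5 days, which falls inside the 3-4 week range on the slide, and the whole sequence repeats every time the test data needs refreshing.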
The Evolution of Data Management
[Diagram: Data Platforms and Data Management, shown for "The Traditional World" and "The Next 25 Years"]
Talena in Production
[Diagram labels: Test Cluster, Research Cluster, Talena GUI, Hadoop/Spark Cluster, Cassandra Cluster, Vertica Cluster, Couchbase Cluster, Talena Smart Storage Cluster]
The Talena Architecture
• Deep de-duplication and compression with app-aware architecture
• Incremental-forever backup architecture
• High availability via erasure coding in distributed cluster architecture
Smart Storage Optimizer
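To make the de-duplication and incremental-forever ideas concrete, here is a minimal content-addressed chunk store. It is a generic sketch, not Talena's implementation: the fixed 4-byte chunk size, the `ChunkStore` class, and SHA-256 addressing are illustrative assumptions. Each backup stores only the chunks that are not already present and keeps a manifest of chunk hashes for restore.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: identical chunks are stored only once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def backup(self, data: bytes):
        """Store `data`; return (manifest, number of newly stored chunks)."""
        manifest, new_chunks = [], 0
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_chunks += 1
            manifest.append(digest)
        return manifest, new_chunks

    def restore(self, manifest) -> bytes:
        """Reassemble the original bytes from a manifest of chunk digests."""
        return b"".join(self.chunks[d] for d in manifest)


if __name__ == "__main__":
    store = ChunkStore()
    full, n_full = store.backup(b"AAAABBBBCCCC")
    incr, n_incr = store.backup(b"AAAABBBBDDDD")  # only the last chunk changed
    print(f"first backup stored {n_full} chunks, second stored {n_incr}")
```

Every backup after the first behaves like an incremental one, yet each manifest is enough to restore a full copy, which is the property an incremental-forever design relies on.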
The Talena Architecture
• Native querying and analytics via active compute layer
• Unbounded scale with a Hadoop-native architecture
[Diagram layers: Smart Storage Optimizer; Active Compute Services, Distributed File System]
The Talena Architecture
• Google-like catalog shortens data recovery time
• Automatic schema generation for mirroring and backups
• Granular recovery at an object level
• Recovery to multiple topologies
• Native integration with LDAP and Kerberos for authentication
• Role-based access control defines specific privileges
• Transparent data encryption
• Masking for PII data
[Diagram layers: Smart Storage Optimizer; Active Compute Services, Distributed File System; Metadata Catalog, Data Orchestration Services, Security Services]
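As an illustration of masking PII when production data is copied to a test cluster, the sketch below replaces selected fields with a keyed hash, so values stay consistent across rows (useful for joins) without exposing the originals. This is a generic technique, not a description of Talena's masking; the field names, the `mask_row` helper, and the key are hypothetical.

```python
import hashlib
import hmac

PII_FIELDS = {"email", "ssn"}  # hypothetical column names


def mask_value(value: str, key: bytes) -> str:
    """Deterministic keyed pseudonym (HMAC-SHA256, truncated for readability)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_row(row: dict, key: bytes, pii_fields=PII_FIELDS) -> dict:
    """Return a copy of `row` with PII fields masked and other fields untouched."""
    return {
        col: mask_value(str(val), key) if col in pii_fields else val
        for col, val in row.items()
    }


if __name__ == "__main__":
    key = b"example-key-do-not-use-in-production"  # assumption: key managed elsewhere
    row = {"id": 42, "email": "alice@example.com", "ssn": "123-45-6789"}
    print(mask_row(row, key))
```

A keyed hash is used rather than a plain hash so that someone without the key cannot rebuild the mapping by hashing guessed values.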
The Talena Architecture
• ‘Single pane of glass’ for multiple use cases and data platforms
• Agentless architecture minimizes management overhead
• GUI, CLI, REST-based Talena API options
[Diagram layers: GUI, CLI, API; Metadata Catalog, Data Orchestration Services, Security Services; Active Compute Services, Distributed File System; Smart Storage Optimizer]
Q&A
• We’ll send you a link to our eBook “The Cassandra Backup Guide”
• Additional resources: talena-inc.com/resources and talena-inc.com/blog
• Ping us with any additional questions: info@talena-inc.com

Editor's Notes

  • #10: Meeting notes (9/1/16 13:59): Change the slide. Meeting notes (9/1/16 16:03): Add sampling bullet point, then push sampled data to test. Add repeat bucket.
  • #11: Starting over 20 years ago, the traditional database market became the foundation of enterprise applications. A whole ecosystem of data management products emerged to provide capabilities like backup/recovery (Veritas), storage pooling (Data Domain), test/dev management (Delphix), and archiving (Iron Mountain). But companies had to purchase separate products to assemble a full data management solution for their enterprise. Over the past few years, and for the foreseeable future, modern data platforms are becoming the new hubs of enterprise applications. These modern data platforms also need data management capabilities, just as traditional databases did. Our vision is to help companies with their critical data management needs in a single software product, one that is optimized specifically for these modern Big Data environments.
  • #13: The next few slides will introduce the unique Talena architecture and highlight how this architecture delivers on these core business benefits. One of the most significant components of our architecture is our Smart Storage Optimizer. By integrating compute and storage management into our storage optimizer, we’re able to deliver significant cost savings. Our application-aware architecture enables us to do deep de-duplication and compression. Our backup process is incremental-forever, saving on storage costs, and by incorporating erasure coding we also ensure high availability no matter how large a Talena cluster you choose to deploy.
  • #15: Supports transparent data encryption in the security services section
  • #16: Our agentless architecture makes Talena an ideal solution for big data architectures and minimizes your operational overhead. Furthermore, Talena can support multiple data platforms, versions, and use cases in a single deployment of Talena, thereby providing a “single pane of glass” for all your big data management needs. While most of our clients work within our user interface, we also provide a REST-based API to accomplish the same tasks.