Optimizing Data Management for MongoDB
October 11, 2017
My Background
Why Bother With Backup and Test Data Mgmt?
The average cost of a data loss incident is $900,000
90% of enterprises delay applications because of a lack of test data
Source: EMC, Imanis Data
Data Replicas Don’t Prevent ALL Data Loss
Human errors: dropping a collection
Application corruption: incorrect updates to a collection
Replicas go out of sync
[Diagram: replica set with one Primary and two Secondaries]
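Why replicas are not backups can be shown with a toy model. The sketch below is not MongoDB code: `ToyReplicaSet` is a hypothetical in-memory stand-in, and it assumes only that every write on the primary, including a drop, is replayed on each secondary.

```python
from copy import deepcopy


class ToyReplicaSet:
    """Hypothetical model: each operation on the primary is replayed on all secondaries."""

    def __init__(self, secondaries=2):
        self.primary = {}
        self.secondaries = [{} for _ in range(secondaries)]

    def _apply_everywhere(self, op, name, docs=None):
        # Replication is faithful: it copies mistakes as reliably as good writes.
        for node in [self.primary, *self.secondaries]:
            if op == "insert":
                node.setdefault(name, []).extend(deepcopy(docs))
            elif op == "drop":
                node.pop(name, None)

    def insert(self, name, docs):
        self._apply_everywhere("insert", name, docs)

    def drop(self, name):
        self._apply_everywhere("drop", name)


rs = ToyReplicaSet()
rs.insert("orders", [{"_id": 1}, {"_id": 2}])
backup = deepcopy(rs.primary)   # independent point-in-time copy
rs.drop("orders")               # human error, replicated to every member

surviving_replicas = [node for node in rs.secondaries if "orders" in node]
```

After the drop, no replica still holds the collection; only the independent copy taken beforehand does.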
Current MongoDB Backup Options
mongodump
Filesystem/storage snapshots
Ops Manager
MongoDB Backup/Recovery Options: mongodump
PROBLEM: Very resource intensive
PROBLEM: Not feasible for granular recovery
PROBLEM: Not incremental-forever
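For reference, a minimal sketch of how a full mongodump run might be assembled from a script. The host string and archive path are made-up placeholders and `mongodump_argv` is a hypothetical helper; the flags themselves (`--host`, `--archive`, `--gzip`, `--oplog`) are standard mongodump options. Note that there is no flag here for an incremental run: every invocation dumps the whole instance again.

```python
import shlex


def mongodump_argv(host, archive_path, oplog=True):
    """Build the argument list for a full, compressed mongodump (hypothetical helper)."""
    argv = ["mongodump", f"--host={host}", f"--archive={archive_path}", "--gzip"]
    if oplog:
        # --oplog captures writes made during the dump; it applies to full dumps
        # taken from a replica set member.
        argv.append("--oplog")
    return argv


# Placeholder host and path, shown as the shell command that would be run.
command = shlex.join(
    mongodump_argv("rs0/db1.example.net:27017", "/backups/full.archive.gz")
)
```

The list form can be passed to `subprocess.run` directly; `shlex.join` is used only to display the equivalent shell command.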
MongoDB Backup/Recovery Options: filesystem/storage snapshots
PROBLEM: Requires quiescing of MongoDB instance for consistent snapshots
PROBLEM: Requires periodic full backups to ensure faster recovery
PROBLEM: Cannot do point-in-time recovery or back up specific collections
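The quiescing step can be sketched as a lock/snapshot/unlock wrapper. This assumes a PyMongo-style client whose `admin.command(...)` sends the server's `fsync` (with `lock`) and `fsyncUnlock` commands; `take_snapshot` is a hypothetical callable standing in for whatever LVM, EBS, or storage-array call actually creates the snapshot.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(client):
    """Flush and block writes for the duration of the block, then always unlock."""
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Unlock even if the snapshot fails, so writes are not blocked indefinitely.
        client.admin.command("fsyncUnlock")


def snapshot_with_lock(client, take_snapshot):
    """Run the (hypothetical) snapshot callable while the instance is quiesced."""
    with quiesced(client):
        return take_snapshot()
```

Writes are blocked on that instance for as long as the snapshot takes, which is the production cost the slide points at.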
MongoDB Backup/Recovery Options: Ops Manager
PROBLEM: Requires agents on production; scaling increases overhead
PROBLEM: Storage optimization is dependent on external media. Not data-aware
PROBLEM: Extremely difficult to restore to a different topology
Test Data Mgmt Eliminates Costly Application Delays
PRIMARY CHALLENGE: Application teams wait for production data
BUSINESS IMPACT: 90% of companies delay application releases
CUSTOMER EXAMPLE: A company delayed 3-4 releases a year. Cost the company $450K per year
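A back-of-the-envelope reading of the customer example, assuming the $450K annual figure is spread evenly across the 3-4 delayed releases (the slide does not say how it was distributed):

```python
annual_cost = 450_000          # dollars per year, from the slide
delayed_releases = (3, 4)      # low and high end of "3-4 releases a year"

# Cost per delayed release at each end of the range (even split is an assumption).
cost_per_release = {n: annual_cost / n for n in delayed_releases}
```

Under that assumption each delayed release costs between $112,500 and $150,000.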
Challenges With Typical Test Data Management
Change Request - 1 week
Provision Production Data - 1 week
Create Test DB and Mask Data - 1 week
Create Samples of Production Data - 2 days
Push Production Data To Test - Hours
Repeat Process - 3-4 weeks
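The 3-4 week figure follows from adding up the steps. A quick check, assuming 8 hours for the final push since the slide says only "Hours":

```python
from datetime import timedelta

steps = {
    "Change request": timedelta(weeks=1),
    "Provision production data": timedelta(weeks=1),
    "Create test DB and mask data": timedelta(weeks=1),
    "Create samples of production data": timedelta(days=2),
    # Assumption: the slide gives no number, only "Hours".
    "Push production data to test": timedelta(hours=8),
}

total = sum(steps.values(), timedelta())
total_weeks = total / timedelta(weeks=1)
```

The total comes to a little over 23 days, roughly 3.3 weeks per refresh cycle, before any rework.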
The Evolution of Data Management
[Figure labels: THE TRADITIONAL WORLD; THE NEXT 25 YEARS; Data Platforms; Data Management]
Imanis Data in Production
[Deployment diagram: Imanis Data GUI; Imanis Data Smart Storage Cluster; MongoDB Cluster, Hadoop/Spark Cluster, Cassandra Cluster, Vertica Cluster, Couchbase Cluster; Test Cluster; Research Cluster]
The Imanis Data Architecture
• Deep de-duplication and compression with app-aware architecture
• Incremental-forever backup architecture
• High availability via erasure coding in distributed cluster architecture
Smart Storage Optimizer
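How de-duplication and incremental-forever backups fit together can be sketched with a content-addressed chunk store. This is a generic illustration, not the Imanis Data implementation: `ChunkStore`, the fixed 4-byte chunk size, and SHA-256 fingerprints are all assumptions chosen to keep the example small.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the example data stays readable


class ChunkStore:
    """Hypothetical store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes
        self.backups = {}  # backup id -> ordered list of digests

    def backup(self, backup_id, data):
        """Record a backup and return how many new bytes had to be stored."""
        new_bytes = 0
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:   # only unseen chunks cost storage
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        self.backups[backup_id] = recipe
        return new_bytes

    def restore(self, backup_id):
        """Rebuild the full image of any backup from its chunk recipe."""
        return b"".join(self.chunks[d] for d in self.backups[backup_id])


store = ChunkStore()
day1 = b"AAAABBBBCCCC"
day2 = b"AAAABBBBDDDD"   # only the last chunk changed
first = store.backup("day1", day1)
second = store.backup("day2", day2)
```

The first backup stores all 12 bytes; the second stores only the 4 changed bytes, yet either day can still be restored in full without replaying a chain of incrementals.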
The Imanis Data Architecture
Native querying and analytics via active compute layer
Unbounded scale with a Hadoop-native architecture
[Diagram layers: Active Compute Services, Distributed File System; Smart Storage Optimizer]
The Imanis Data Architecture
• Google-like catalog shortens data recovery time
• Automatic schema generation for mirroring and backups
• Granular recovery at an object level
• Recovery to multiple topologies
• Native integration with LDAP and Kerberos for authentication
• Role-based access control defines specific privileges
• Stateless, consistent, irreversible, and one-way masks for PII data
[Diagram layers: Metadata Catalog, Data Orchestration Services, Security Services; Active Compute Services, Distributed File System; Smart Storage Optimizer]
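One common way to get a mask that is stateless, consistent, and one-way is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; it is an illustration of the property list above, not the product's masking algorithm, and `mask_pii`, `mask_document`, and the field names are hypothetical.

```python
import hashlib
import hmac


def mask_pii(value: str, key: bytes) -> str:
    """Keyed one-way mask: same input and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_document(doc: dict, pii_fields: set, key: bytes) -> dict:
    """Return a copy of doc with the named PII fields masked, other fields untouched."""
    return {
        field: mask_pii(str(value), key) if field in pii_fields else value
        for field, value in doc.items()
    }


# Hypothetical key and document; in practice the key would come from a secret store.
key = b"example-masking-key"
customer = {"_id": 42, "email": "pat@example.com", "plan": "gold"}
masked = mask_document(customer, {"email"}, key)
```

No lookup table is kept (stateless), the same email masks to the same token in every collection (consistent, so joins in test data still work), and the original value cannot be derived from the token without guessing inputs (one-way).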
The Imanis Data Architecture
• ‘Single pane of glass’ for multiple use cases and data platforms
• Agentless architecture minimizes management overhead
• GUI, CLI, REST-based Talena API options
[Diagram layers: GUI, CLI, API; Metadata Catalog, Data Orchestration Services, Security Services; Active Compute Services, Distributed File System; Smart Storage Optimizer]
Machine Intelligence: ThreatSense
• Proactively identify anomalous data loss and ransomware to reduce downtime
• Collects nearly 50 attributes to set baseline
• Enables user input to optimize machine learning
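The baseline idea can be illustrated with a single attribute and a z-score test. This is a deliberately simple stand-in, not ThreatSense itself: the attribute (documents changed per backup), the sample values, and the threshold of 3 standard deviations are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag a value that sits more than `threshold` standard deviations from the baseline."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


# Hypothetical attribute: documents changed per nightly backup.
changed_docs = [1000, 1020, 990, 1010, 1005, 995]

normal_night = is_anomalous(changed_docs, 1008)
mass_rewrite = is_anomalous(changed_docs, 250_000)   # e.g. ransomware rewriting documents
```

A real system would track many attributes at once and let operators mark false positives, which is where the user-input bullet comes in.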
Q&A



Editor's Notes

  • #9: Ask Hari about scalability in Ops Manager and how it compares with Imanis Data
  • #14: ----- Meeting Notes (10/10/17 10:35) ----- Deployment diagram before the detailed architecture diagram