Hadoop & MongoDB
Understanding your Big Data
2
MongoDB World
3
Speakers
Jnan Dash
Senior Advisor
jnan.dash@mongodb.com
Kelly Stirman
Director of Products
kelly.stirman@mongodb.com
4
Jnan Dash
• Last 12 years (2002-now) – Executive consultant; on the boards and advisory boards of several new software companies, including Big Data players such as MongoDB
• 10 years (1992-2002) – Oracle, Group Vice President, Systems Architecture and Technology, responsible for server product planning and rollout
• 16 years (1975-1992) – IBM, planner, architect, and development manager for the DB2 product line at the Silicon Valley Lab and Austin Lab; head of IBM's database architecture, strategy, and technology
5
Why am I excited about MongoDB?
• Finally, some real innovation in DBMS
• MongoDB momentum is unprecedented!
• The changing landscape needs MongoDB
– “Internet scale” distributed operations + highly flexible data model for agile development + open source
• Perfect fit for cloud, mobility, and big data
6
Agenda
• Big Data - Observations
• Evolution of Database Technology
• Hadoop+MongoDB
• Customer Examples
• Roadmap
• Summary
7
The Fourth Paradigm
1. A thousand years ago – Experimental Science
Description of natural phenomena
2. Last few hundred years – Theoretical Science
Newton's Laws, Maxwell's Equations, ...
3. Last few decades – Computational Science
Simulation of complex phenomena
4. Today – Data-intensive Science
Scientists overwhelmed with data deluge
Unify theory, experiment & simulation
8
Internet Scale Commercial Supercomputing
• Originated with companies operating at Internet scale (to process
ever-increasing numbers of users and volumes of data)
– Yahoo in the 1990s, then Google, Facebook, Twitter
– They needed to do it quickly and affordably at scale
• Hadoop is the first commercial supercomputing software platform
– Works at scale, affordable at scale
• HPC was used for scientific supercomputing in meteorology and
engineering. Big data is the commercial equivalent of HPC
– Less about equations, more about discovery, patterns
• Many of the technologies have been around for decades
– Clustering
– Parallel processing
– Distributed file systems
9
Big Data: 3V’s
10
Some Make it 4V’s
11
What’s driving Big Data
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time
12
Big Data – the full spectrum
Transaction Processing
Analytical Processing
Data Mining, Visualization, and Integration Tools
RDBMS
OLAP/DW
DW Appliance
Hadoop, Impala, ...
NoSQL
NewSQL, In-Memory, Stream...
Online/Realtime
Offline/Batch
13
Hadoop Ecosystem
Layers: Programming Languages, Computation, Table Storage, Object Storage
Groups: Core Apache Hadoop, Related Apache Projects
Components:
• Zookeeper (Coordination)
• HDFS (Hadoop Distributed File System)
• MapReduce (Distributed Programming Framework)
• Hive (SQL)
• Pig (Data Flow)
• HBase (Wide Column Storage)
• HCatalog (Meta Data)
• HMS (Management)
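The ecosystem slide names MapReduce as the distributed programming framework without showing what a job looks like. As a rough illustration only, the sketch below mimics the map, shuffle, and reduce phases inside a single Python process. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for this example and are not Hadoop APIs; a real Hadoop job runs each phase in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data at scale"]))
```

The point of the split is that `map_phase` and `reduce_phase` need no shared state, which is what lets a framework spread them over many machines.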
Database Technology Evolution
15
Data Management over the years
• 1960s: File Systems
• 1970s: 1st Generation DBMS; Data as Shared Resource
• 1980s: Relational Technology; Ease of Query
• 1990s: New data types; OLAP/DW; Web Support; Unstructured Data
• 2005+: Big Data; Post-PC, Data Deluge, 3Vs, NoSQL
16
Operational vs. Analytics
• 2010: RDBMS, Key-Value/Wide-column, OLAP/DW, Hadoop
• 2000: RDBMS, OLAP/DW
• 1990: RDBMS
Figure labels: Operational Database, Data warehouse, Document DB, NoSQL
17
MongoDB Features
• JSON Document Model with Dynamic Schemas
• Auto-Sharding for Horizontal Scalability
• Text Search
• Aggregation Framework and MapReduce
• Full, Flexible Index Support and Rich Queries
• Native Replication for High Availability
• Advanced Security
• Large Media Storage with GridFS
18
Documents are Rich Data Structures
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Callouts: fields; typed field values; fields can contain arrays; fields can contain an array of sub-documents
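To make the callouts concrete, here is a minimal sketch of the slide's document as a plain Python dict, which is roughly how a driver would hand it to application code. The helper names `car_models` and `field_types` are made up for this illustration, and the elided fields marked "…" on the slide are simply left out.

```python
# The slide's example document as a Python dict. Fields the slide elides
# with "…" are omitted rather than guessed.
person = {
    "first_name": "Paul",
    "surname": "Miller",
    "cell": "+447557505611",
    "city": "London",
    "location": [45.123, 47.232],                      # array of numbers
    "Profession": ["banking", "finance", "trader"],    # array of strings
    "cars": [                                          # array of sub-documents
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}


def car_models(doc):
    """Return the model of every car sub-document, or [] if there are none."""
    return [car["model"] for car in doc.get("cars", [])]


def field_types(doc):
    """Map each top-level field to the Python type name of its value."""
    return {name: type(value).__name__ for name, value in doc.items()}


if __name__ == "__main__":
    print(car_models(person))
    print(field_types(person))
```

Because the nested cars travel with their owner, reading one person's cars needs no join.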
19
Machine Generated Data
20
Fast Moving Data
• Hundreds of thousands of records per second
• Fast response required
• Sometimes all data kept, sometimes just a summary
• Horizontal scalability required
21
Data is Structured, but Varied…
• A machine generates a specific kind of data
• The data model is unlikely to change
• But there are so many different machines…
• Queryability across all types
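A small sketch of what "structured, but varied" can look like in practice: each machine type keeps its own fixed shape, yet all readings sit in one collection and can be queried together. Everything here is hypothetical (the machine names, the field names, and the helpers `find` and `fields_by_type`); the list of dicts merely stands in for a document collection.

```python
# Hypothetical readings from two machine types. Each type has a stable shape,
# and all of them share the machine_id and type fields.
readings = [
    {"machine_id": "pump-1", "type": "pump", "pressure_kpa": 410, "flow_lpm": 52},
    {"machine_id": "oven-7", "type": "oven", "temp_c": 231},
    {"machine_id": "pump-2", "type": "pump", "pressure_kpa": 395, "flow_lpm": 49},
]


def find(docs, **criteria):
    """Return docs whose fields equal every criterion; absent fields never match."""
    missing = object()
    return [
        d for d in docs
        if all(d.get(key, missing) == value for key, value in criteria.items())
    ]


def fields_by_type(docs):
    """Collect the set of field names seen for each machine type."""
    shapes = {}
    for d in docs:
        shapes.setdefault(d["type"], set()).update(d)
    return shapes


if __name__ == "__main__":
    print(find(readings, type="pump"))
    print(fields_by_type(readings))
```

A query on a field that only some machines report, such as `temp_c`, simply skips the documents that lack it instead of failing.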
22
Time Series Data
• Event data written multiple times per second, minute, or hour
• Tracking progression of metrics over time
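One common way to handle such event streams, and to keep "just a summary" rather than every reading, is to roll events up into one summary document per time bucket. The sketch below assumes per-minute buckets and a made-up summary shape (`count`, `min`, `max`, `avg`); the function name `rollup_per_minute` and the sample timestamps are illustrative only.

```python
from collections import defaultdict
from datetime import datetime


def rollup_per_minute(events):
    """Summarize (timestamp, value) events into one summary dict per minute."""
    buckets = defaultdict(list)
    for ts, value in events:
        # Truncate the timestamp to the start of its minute to pick the bucket.
        buckets[ts.replace(second=0, microsecond=0)].append(value)
    return [
        {
            "minute": minute,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for minute, values in sorted(buckets.items())
    ]


if __name__ == "__main__":
    sample = [
        (datetime(2014, 6, 24, 9, 0, 5), 10.0),
        (datetime(2014, 6, 24, 9, 0, 35), 14.0),
        (datetime(2014, 6, 24, 9, 1, 2), 20.0),
    ]
    for summary in rollup_per_minute(sample):
        print(summary)
```

Each summary dict could be stored as its own document, so tracking a metric over time becomes a range query on `minute`.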
23
Do More With Your Data
MongoDB
Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5 km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul’s car collection
Map Reduce
• What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [51.524, -0.087],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
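The questions on this slide map fairly directly onto MongoDB query documents. The sketch below writes several of them as Python dicts using standard query operators (`$elemMatch`, `$geoWithin` with `$centerSphere`, `$text`, and a `$match`/`$unwind`/`$group` pipeline). Several details are assumptions rather than anything the slide states: the Trafalgar Square coordinates are approximate, `location` is assumed to be stored as [longitude, latitude] (the slide's sample value reads like latitude first), and the text query presumes a text index on some car description field. Nothing here connects to a server.

```python
# Approximate [longitude, latitude] of Trafalgar Square (assumed values).
TRAFALGAR_SQ = [-0.128, 51.508]
EARTH_RADIUS_KM = 6378.1  # converts a distance in km to radians

# Rich queries
pauls_cars = {"first_name": "Paul", "surname": "Miller"}
london_1970s_car = {
    "city": "London",
    # $elemMatch makes a single car satisfy both year bounds.
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}

# Geospatial: owners within 5 km, assuming [longitude, latitude] storage.
near_trafalgar = {
    "location": {
        "$geoWithin": {"$centerSphere": [TRAFALGAR_SQ, 5 / EARTH_RADIUS_KM]}
    }
}

# Text search: the escaped quotes request the exact phrase.
leather_seats = {"$text": {"$search": "\"leather seats\""}}

# Aggregation: average value of Paul's cars.
avg_value_pipeline = [
    {"$match": pauls_cars},
    {"$unwind": "$cars"},
    {"$group": {"_id": None, "avg_value": {"$avg": "$cars.value"}}},
]


def average_car_value(doc):
    """Pure-Python equivalent of the pipeline for one document."""
    values = [car["value"] for car in doc["cars"]]
    return sum(values) / len(values)
```

With PyMongo, and given a hypothetical `collection` handle, the filters would be passed to `collection.find(...)` and the pipeline to `collection.aggregate(...)`.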
Hadoop & MongoDB
25
Enterprise Big Data Stack
Applications: CRM, ERP, Collaboration, Mobile, BI
Data Management: Online Data (RDBMS); Offline Data (EDW, Hadoop)
Infrastructure: OS & Virtualization, Compute, Storage, Network
Across all layers: Management & Monitoring, Security & Auditing
26
MongoDB & Hadoop
Operational
• Online, Real-time
• High concurrency & HA
• Live analytics
Analytical
• Multi-source analytics
• Interactive & Batch
• Data lake
MongoDB Connector for Hadoop
27
Hadoop Is Good for…
• Risk Modeling
• Churn Analysis
• Recommendation Modeling
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake
28
MongoDB Is Good for…
• Single View
• Mobile Apps
• Fraud Detection
• Customer Data Management
• Content Management & Delivery
• Database-as-a-Service
• Product & Asset Catalogs
• Internet of Things
• Social & Collaboration
Customer Examples
30
Many more examples
Use cases:
• Big Data
• Product & Asset Catalogs
• Security & Fraud
• Internet of Things
• Database-as-a-Service
• Mobile Apps
• Customer Data Management
• Single View
• Social & Collaboration
• Content Management
Example organizations:
• Intelligence Agencies
• Top Investment and Retail Banks
• Top US Retailer
• Top Global Shipping Company
• Top Industrial Equipment Manufacturer
• Top Media Company
• Top Investment and Retail Banks
31
MongoDB Enterprise Value
32
MongoDB+Hadoop Connector
• Makes MongoDB a Hadoop-enabled file system
• Full use of MongoDB's indexes
• Read and write to live data, in-place
• Copy data between Hadoop and MongoDB
• Full support for data processing
– Hive
– MapReduce
– Pig
– Streaming
– EMR
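As a hedged illustration of how a job might be pointed at live MongoDB data, the sketch below assembles Hadoop `-D` options for the connector. The property names `mongo.input.uri`, `mongo.output.uri`, and `mongo.input.query` follow the connector's configuration reference but should be verified against the version in use. The helper `connector_args`, the URIs, and the collection names are hypothetical.

```python
import json


def connector_args(input_uri, output_uri, query=None):
    """Build Hadoop -D options for a job that reads from and writes to MongoDB."""
    props = {
        "mongo.input.uri": input_uri,
        "mongo.output.uri": output_uri,
    }
    if query is not None:
        # The filter travels as JSON so MongoDB can apply it, with its
        # indexes, before any documents reach the Hadoop job.
        props["mongo.input.query"] = json.dumps(query, sort_keys=True)
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(connector_args(
        "mongodb://localhost:27017/demo.customers",     # hypothetical source
        "mongodb://localhost:27017/demo.churn_scores",  # hypothetical target
        query={"city": "London"},
    ))
```

Pushing the filter into the input query is what the "full use of MongoDB's indexes" bullet refers to: only matching documents are handed to the job.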
33
Customer Example – MetLife
Customer
Service
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection
• Customer action analysis
• Churn prediction algorithms
Churn Analysis
MongoDB
Connector for
Hadoop
34
Customer Example - eCommerce
Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine
Algorithms
MongoDB
Connector for
Hadoop
35
Roadmap
Capability | Today | Soon
Connectivity | Custom | Centralized Administration
MongoDB → Hadoop | Dynamic reads | Automated Snapshots
BSON Support | MapReduce, Hive, Pig | Impala, Tez, Spark
Hadoop → MongoDB | Dynamic writes | Bulk Loader
36
Summary
• Big Data covers a wide spectrum
– Volume, Velocity, Variety
– So the equation “Big Data = Hadoop” is a myth
• Enterprises are more concerned about Variety
– MongoDB provides the best platform
• Hadoop and MongoDB are complementary
– MongoDB for operational workloads
– Hadoop for analytical workloads

Editor's Notes

  • #18: MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search, geospatial, and more.
  • #20: We have all these fantastic machines… they give the same metrics they used to, but now they transmit the data. We have metrics about metrics, and we need a place to store the data. We need a place to understand what the data means.
  • #26: This is where MongoDB fits into the existing enterprise IT stack. MongoDB is an operational data store used for online data, in the same way that Oracle is an operational data store. It supports applications that ingest, store, manage, and even analyze data in real time. (Compare Hadoop and data warehouses, which are used for offline, batch analytical workloads.)
  • #27: Makes MongoDB a Hadoop-enabled file system. Read and write to live data, in-place. Copy data between Hadoop and MongoDB. Uses MongoDB indexes to filter data. Full support for data processing: Hive, MapReduce, Pig, Streaming.
  • #28: What each of these has in common is that they’re retrospective: they’re about looking at the past to help predict the future. The learnings from these Hadoop applications end up being applied by a different technology. This is where MongoDB comes in.
  • #31: Customer Data Management (e.g., Customer Relationship Management, Biometrics, User Profile Management); Product and Asset Catalogs (e.g., eCommerce, Inventory Management); Social and Collaboration Apps (e.g., Social Networks and Feeds, Document and Project Collaboration Tools); Mobile Apps (e.g., for Smartphones and Tablets); Content Management (e.g., Web CMS, Document Management, Digital Asset and Metadata Management); Internet of Things / Machine to Machine (e.g., mHealth, Connected Home, Smart Meters); Security and Fraud Apps (e.g., Fraud Detection, Cyberthreat Analysis); DBaaS (Cloud Database-as-a-Service); Data Hub (Aggregating Data from Multiple Sources for Operational or Analytical Purposes); Big Data (e.g., Genomics, Clickstream Analysis, Customer Sentiment Analysis)