MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
#MDBLocal
“To free the genius within everyone by
making data stunningly easy to work
with.”
Welcome to the World of
Atlas Data Lake
Isabel Peters
Senior Software Engineer, MongoDB
Atlas Backup
Why are we building this?
“IDC predicts that by 2025 worldwide data will reach
175 Zettabytes and 49% of it will reside in the public
cloud.”
Atlas Data Lake Technical Deep Dive
1. Design Goals and Requirements
2. Creating an Atlas Data Lake
3. Atlas Data Lake Architecture
4. Future improvements
Design Goals and Requirements
Implementation Requirements
MongoDB Wire Protocol Support
Requirements
1) Look and act like MongoDB
Solution
• Implemented a TCP server in Go
• Used mongo-go-driver’s wireprotocol package
• Used mongo-go-driver's bson package
• Read only
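To make "look and act like MongoDB" concrete: every MongoDB wire protocol message starts with a 16-byte header of four little-endian int32 fields (messageLength, requestID, responseTo, opCode). The server described in the talk is written in Go on top of mongo-go-driver; the sketch below is only an illustration in JavaScript (Node.js) of decoding that header, and the helper name parseMsgHeader is hypothetical.

```javascript
// Sketch: decode the standard 16-byte MongoDB wire protocol message header.
// parseMsgHeader is an illustrative name, not part of any driver API.
const OP_MSG = 2013; // opcode of OP_MSG in the wire protocol

function parseMsgHeader(buf) {
  if (buf.length < 16) {
    throw new Error("need at least 16 bytes for a message header");
  }
  return {
    messageLength: buf.readInt32LE(0), // total message size, header included
    requestID: buf.readInt32LE(4),     // identifier chosen by the sender
    responseTo: buf.readInt32LE(8),    // requestID this message answers
    opCode: buf.readInt32LE(12),       // message type, e.g. OP_MSG
  };
}

// Build a sample header by hand and decode it again.
const sample = Buffer.alloc(16);
sample.writeInt32LE(16, 0);
sample.writeInt32LE(7, 4);
sample.writeInt32LE(0, 8);
sample.writeInt32LE(OP_MSG, 12);

const header = parseMsgHeader(sample);
console.log(header.opCode === OP_MSG); // true
```

A read-only server of this kind would read the header, then the remaining messageLength - 16 bytes, and reject commands that write.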
MongoDB Security Model
Requirements
2) Access customer’s data securely.
Solution
• Users configured in MongoDB Atlas
• Same authentication and authorization
• Configure buckets
Scalable Processing
Requirements
3) Handle long-running queries over vast amounts
of data using resources efficiently
Solution
• Read-only commands
• Use server’s aggregation engine
• Distributed MQL processing
• Intelligent file targeting
Data Formats
Requirements
4) Support a variety of data formats
Solution
• Avro (gzipped)
• Parquet
• BSON/JSON (gzipped)
• CSV/TSV (gzipped)
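One way to picture multi-format support is a step that decides how to decode each file before reading it. The sketch below guesses the format from the file name alone. This is an assumption made for illustration: the slides do not say how Atlas Data Lake detects formats, and the detectFormat helper and its extension table are hypothetical.

```javascript
// Sketch: guess a file's format from its name. The extension table and the
// detectFormat helper are assumptions for illustration only.
const FORMATS = new Map([
  ["avro", "Avro"],
  ["parquet", "Parquet"],
  ["bson", "BSON"],
  ["json", "JSON"],
  ["csv", "CSV"],
  ["tsv", "TSV"],
]);

function detectFormat(key) {
  // Look only at the file name, not at the directories before it.
  const parts = key.toLowerCase().split("/").pop().split(".");
  let ext = parts.pop();
  let gzipped = false;
  if (ext === "gz") {
    gzipped = true;    // a trailing .gz marks a compressed file
    ext = parts.pop(); // the real format is the extension before it
  }
  const format = FORMATS.get(ext);
  if (!format) {
    throw new Error(`unsupported file type: ${key}`);
  }
  return { format, gzipped };
}

console.log(detectFormat("/archive/invoices/2017.json.gz"));
console.log(detectFormat("/archive/invoices/2019/1.parquet"));
```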
Atlas Data Lake Features
Multiple data formats
Scalable
MongoDB Query Language
Serverless
On Demand
Integrated with Atlas
Creating your Atlas Data Lake
Files in S3 bucket: ent-archive
/archive/customers
  - a-m.json
  - n-z.json
/archive/invoices
  - 2019
    - 1.parquet
    - 2.parquet
  - 2018
    - 1.parquet
  - 2017.json.gz
  - 2016.json.gz
You control your data layout
Stores
Databases
Collections
DataSources
[Diagram: a Database contains Collections; each Collection is backed by one or more DataSources; each DataSource points to a Store]
Data Lake Configuration
1. Configure a new Data Lake in Atlas
2. Connect to your Data Lake
3. Configure your databases and collections
4. Query your Data Lake
Configuration: S3 Store
s3: {
  name: "ent-archive",
  bucket: "ent-archive",
  region: "us-east-1",
  prefix: "/archive/"
}
Configuration: Databases & Collections
history: {
  customers: [{
    store: "ent-archive",
    definition: "/customers/*"
  }],
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }]
}
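The definitions above map file paths to collections, and a placeholder such as {year int} turns part of the path into a typed value. The sketch below shows one way such a template could be matched against the files in the ent-archive bucket, and how a filter on year could then skip whole files. It rests on assumptions: the helpers templateToRegex and matchFiles are hypothetical, * is taken to match a single path segment, and definitions are treated as relative to the store prefix. The real matching rules of Atlas Data Lake may differ.

```javascript
// Sketch: match file keys against a definition template and extract typed
// partition values. Helper names and matching rules are assumptions.
function templateToRegex(template) {
  const fields = [];
  const source = template
    .split(/(\{[^}]+\}|\*)/) // keep placeholders and wildcards as tokens
    .map((token) => {
      if (token === "*") return "[^/]+"; // assumption: * spans one segment
      const m = /^\{(\w+) (\w+)\}$/.exec(token);
      if (m) {
        fields.push({ name: m[1], type: m[2] });
        return m[2] === "int" ? "(\\d+)" : "([^/]+)";
      }
      return token.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // literal text
    })
    .join("");
  return { regex: new RegExp(`^${source}$`), fields };
}

function matchFiles(keys, prefix, template) {
  const { regex, fields } = templateToRegex(template);
  const matches = [];
  for (const key of keys) {
    if (!key.startsWith(prefix)) continue;
    // Assumption: definitions are relative to the store prefix.
    const relative = "/" + key.slice(prefix.length);
    const m = regex.exec(relative);
    if (!m) continue;
    const values = {};
    fields.forEach((f, i) => {
      values[f.name] = f.type === "int" ? Number(m[i + 1]) : m[i + 1];
    });
    matches.push({ key, values });
  }
  return matches;
}

const keys = [
  "/archive/customers/a-m.json",
  "/archive/customers/n-z.json",
  "/archive/invoices/2019/1.parquet",
  "/archive/invoices/2019/2.parquet",
  "/archive/invoices/2018/1.parquet",
  "/archive/invoices/2017.json.gz",
  "/archive/invoices/2016.json.gz",
];

// The invoices collection is the union of its two data sources.
const invoices = [
  ...matchFiles(keys, "/archive/", "/invoices/{year int}/*"),
  ...matchFiles(keys, "/archive/", "/invoices/{year int}.json.gz"),
];

// File targeting: a filter on year only needs the files whose path matches.
const targeted = invoices.filter((f) => f.values.year > 2017);
console.log(targeted.map((f) => f.key));
```

With this layout, a query restricted to years after 2017 touches only the three Parquet files and never opens the gzipped JSON archives.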
Querying via MongoDB Atlas
• Atlas users require readWriteAnyDatabase or readAnyDatabase roles.
• Use MongoDB drivers/clients including the mongo shell and MongoDB Compass
• Write queries in MongoDB Query Language (MQL)
Atlas Data Lake Architecture
MQL → Distributed MQL
Parse query
Parallelize processing
Distribute workload
Atlas Data Lake Architecture
[Diagram: Atlas with a Control Plane, Compute Plane, and Data Plane; three groups, each made up of Load Balancers, a DataLake Frontend, and a DataLake Agent]
Architecture
[Diagram: Atlas with a Control Plane, Compute Plane, and Data Plane; one DataLake Frontend and nine DataLake Agents]
Query Example: $limit
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Map:
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Reduce:
{ $limit: 10 }
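The $limit split can be simulated over in-memory partitions. The limit is safe to push into the map phase because no partition ever needs to contribute more than 10 documents. The reduce phase still needs its own limit, because the merged partial results can hold up to 10 documents per partition. The sketch below is a plain JavaScript model of that idea, not the Data Lake implementation; the partition contents and function names are made up.

```javascript
// Sketch: simulate the map and reduce phases for $match + $limit over
// in-memory partitions. Partition contents are made up for illustration.
const matchStage = (doc) => doc.year > 2000; // { $match: { year: { $gt: 2000 } } }
const limit = (docs, n) => docs.slice(0, n); // { $limit: n }

// Map: each agent filters its own partition and stops after 10 documents.
function mapPhase(partition) {
  return limit(partition.filter(matchStage), 10);
}

// Reduce: merge the partial results and apply the limit once more.
function reducePhase(partials) {
  return limit(partials.flat(), 10);
}

// Three partitions of 20 documents each, one per year.
const partitions = [1995, 2005, 2015].map((year) =>
  Array.from({ length: 20 }, (_, i) => ({ year, n: i }))
);

const result = reducePhase(partitions.map(mapPhase));
console.log(result.length); // 10
```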
Query Example: $group
{ $group: { _id: "$year", totalAvg: { $avg: "$amount" } } }
Map:
{ $group: { _id: "$year",
    totalAvg_sum: { $sum: "$amount" },
    totalAvg_count: { $sum: 1 }
} }
Reduce:
{ $group: { _id: "$_id",
    totalAvg_sum: { $sum: "$totalAvg_sum" },
    totalAvg_count: { $sum: "$totalAvg_count" }
} }
Finalize:
{ $project: { _id: "$_id", totalAvg: { $divide: ["$totalAvg_sum", "$totalAvg_count"] } } }
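An average cannot be merged by averaging the per-partition averages, because partitions hold different numbers of documents. That is why the map phase emits a sum and a count instead. The sketch below models the three phases in plain JavaScript over two made-up partitions. The function names are hypothetical, and every amount is assumed to be numeric.

```javascript
// Sketch: distributed $avg via partial sums and counts. Data and function
// names are made up; amounts are assumed to be numeric.
function mapGroup(docs) {
  const acc = new Map();
  for (const { year, amount } of docs) {
    const g = acc.get(year) || { _id: year, totalAvg_sum: 0, totalAvg_count: 0 };
    g.totalAvg_sum += amount; // { $sum: "$amount" }
    g.totalAvg_count += 1;    // { $sum: 1 }
    acc.set(year, g);
  }
  return [...acc.values()];
}

function reduceGroup(partials) {
  const acc = new Map();
  for (const p of partials.flat()) {
    const g = acc.get(p._id) || { _id: p._id, totalAvg_sum: 0, totalAvg_count: 0 };
    g.totalAvg_sum += p.totalAvg_sum;     // { $sum: "$totalAvg_sum" }
    g.totalAvg_count += p.totalAvg_count; // { $sum: "$totalAvg_count" }
    acc.set(p._id, g);
  }
  return [...acc.values()];
}

function finalize(groups) {
  // { $divide: ["$totalAvg_sum", "$totalAvg_count"] }
  return groups.map((g) => ({
    _id: g._id,
    totalAvg: g.totalAvg_sum / g.totalAvg_count,
  }));
}

const partitionA = [
  { year: 2018, amount: 10 },
  { year: 2018, amount: 20 },
  { year: 2019, amount: 5 },
];
const partitionB = [{ year: 2018, amount: 60 }];

const averages = finalize(reduceGroup([partitionA, partitionB].map(mapGroup)));
console.log(averages);
```

For 2018 the result is 90 / 3 = 30, whereas averaging the two partition averages (15 and 60) would wrongly give 37.5.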
Future improvements
On the roadmap …
MongoDB Operators
$out
$merge
$graphLookup
Performance
• Aggregation
• Indexes
• Statistics over data
File Formats
Integrations
Summary
Atlas Data Lake is the best way to:
Access long-term data in multiple formats
Query long-term data using MQL
Analyse long-term data on demand
Give it a try -
Create your own Atlas Data Lake!
THANK YOU
