Atlas Data Lake Technical Deep-Dive
James Osgood – Senior Solutions Architect
Paul Done – Master Solutions Architect
#MDBLocal
State of Affairs
Why are we building this?
• Businesses have a humongous amount of data
• IDC predicts that by 2025 global data will reach 175 zettabytes, and that 49% of it will reside in the public cloud.
• Cloud storage is cost-effective
• Cloud storage is hard to operationalize
A New Service Offered by MongoDB Atlas
Atlas Data Lake allows you to...
▪ Access long-term data
▪ Query long-term data
▪ Analyze long-term data
Requirements
Every product has requirements!
▪ Look and act like MongoDB
▪ Access customer’s data securely
▪ Handle queries over vast amounts of data
▪ Handle long-running queries
▪ Efficient use of resources
Opens Up A Whole Ecosystem Of Data Science Tools
By implementing MongoDB’s wire protocol, Data Lake makes a rich & mature toolkit available for data analytics. Structured documents in AWS S3 buckets are queried through MongoDB’s query engine by:
▪ MongoDB BI Connector
▪ MongoDB Compass
▪ MongoDB Charts
▪ 3rd-party BI & reporting tools, including machine learning
▪ MongoDB’s standard programming language drivers, for use in custom scripts & applications for advanced analytics
Emulating MongoDB
Language
Must be able to communicate with our drivers.
▪ Written in Go
▪ Implemented a TCP server
▪ Used mongo-go-driver’s wireprotocol package
▪ Used mongo-go-driver's bson package
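Every message a driver sends starts with the 16-byte header defined by the MongoDB Wire Protocol: four little-endian int32 fields (messageLength, requestID, responseTo, opCode). A server that wants drivers to treat it as MongoDB has to read this header first. The sketch below decodes it in JavaScript (Node.js) for illustration only; it is not Data Lake's Go code, and `parseMsgHeader` is a hypothetical name.

```javascript
// Opcodes from the MongoDB Wire Protocol specification.
const OP_CODES = { 1: "OP_REPLY", 2004: "OP_QUERY", 2012: "OP_COMPRESSED", 2013: "OP_MSG" };

// Hypothetical helper: decode the standard 16-byte message header.
// All four fields are little-endian int32 values.
function parseMsgHeader(buf) {
  if (buf.length < 16) {
    throw new Error("need at least 16 bytes for a message header");
  }
  const messageLength = buf.readInt32LE(0); // total size, including this header
  const requestID = buf.readInt32LE(4);
  const responseTo = buf.readInt32LE(8); // normally 0 in client requests
  const opCode = buf.readInt32LE(12);
  return { messageLength, requestID, responseTo, opCode, opName: OP_CODES[opCode] || "UNKNOWN" };
}

// Example: a header for a 21-byte OP_MSG with requestID 7.
const header = Buffer.alloc(16);
header.writeInt32LE(21, 0);
header.writeInt32LE(7, 4);
header.writeInt32LE(0, 8);
header.writeInt32LE(2013, 12);
console.log(parseMsgHeader(header));
```

A real TCP server would buffer the stream until `messageLength` bytes have arrived, then hand the remaining bytes (the OP_MSG sections) to a BSON decoder, which is the role the slide assigns to mongo-go-driver's `bson` package.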
Security
Must have the same security as MongoDB.
▪ Users configured in Atlas
▪ Implemented MongoDB’s security model
▪ Authentication
▪ Authorization
▪ Require the use of TLS + SNI
▪ SNI = Server Name Indication
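With SNI, the client names the host it wants in the TLS ClientHello, so one IP address and port can serve many host names and the server can choose a certificate before any MongoDB bytes flow. The slide does not say why SNI is required; one plausible reason, assumed here, is selecting per-tenant settings. Node's `tls.createServer` accepts an `SNICallback(servername, callback)` option for exactly this, and the sketch builds such a callback from a hypothetical tenant map. The host names are invented, and the string values stand in for `tls.createSecureContext()` results.

```javascript
// Build an SNICallback for tls.createServer from a map of
// server name -> secure context. The tenant layout is hypothetical.
function makeSniCallback(contextsByHost) {
  return function sniCallback(servername, cb) {
    const host = (servername || "").toLowerCase(); // DNS names are case-insensitive
    const ctx = contextsByHost.get(host);
    if (!ctx) {
      // Refuse the handshake: without a recognised name we cannot tell
      // which tenant the client is trying to reach.
      cb(new Error(`unknown server name: ${servername}`));
      return;
    }
    cb(null, ctx);
  };
}

// In a real server these values would come from tls.createSecureContext({ key, cert }).
const contexts = new Map([
  ["tenant-a.datalake.example.com", "context-for-tenant-a"],
  ["tenant-b.datalake.example.com", "context-for-tenant-b"],
]);
const sni = makeSniCallback(contexts);
// Usage: tls.createServer({ SNICallback: sni }, onSecureConnection)
```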
Behavior
Must behave like MongoDB.
▪ Implemented commands for a read-only server
▪ Used the server’s aggregation engine
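A read-only server still has to answer the handshake and read commands drivers send, and it has to refuse anything that writes. The command name is the first key of the command document. The sketch below shows one way to structure that dispatch; the command lists and placeholder replies are illustrative, not Data Lake's actual set, and the error codes assume MongoDB's `CommandNotFound` (59) and `CommandNotSupported` (115).

```javascript
// Illustrative read-side handlers; replies are placeholders, not real query results.
const READ_HANDLERS = new Map([
  ["ping", () => ({ ok: 1 })],
  ["isMaster", () => ({ ismaster: true, ok: 1 })],
  ["find", (cmd) => ({ cursor: { id: 0, ns: `${cmd.$db}.${cmd.find}`, firstBatch: [] }, ok: 1 })],
  ["aggregate", (cmd) => ({ cursor: { id: 0, ns: `${cmd.$db}.${cmd.aggregate}`, firstBatch: [] }, ok: 1 })],
]);

// Commands that would modify data are rejected outright.
const WRITE_COMMANDS = new Set(["insert", "update", "delete", "findAndModify", "create", "drop", "createIndexes"]);

function dispatch(cmd) {
  const name = Object.keys(cmd)[0]; // the first key names the command
  if (WRITE_COMMANDS.has(name)) {
    return { ok: 0, errmsg: `${name} is not supported on a read-only server`, code: 115, codeName: "CommandNotSupported" };
  }
  // A Map (not a plain object) avoids matching inherited names such as "toString".
  const handler = READ_HANDLERS.get(name);
  if (!handler) {
    return { ok: 0, errmsg: `no such command: '${name}'`, code: 59, codeName: "CommandNotFound" };
  }
  return handler(cmd);
}
```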
Customer’s Data
Security: Customers
Customers have complete control.
▪ Provide us with an IAM Role
▪ Configure your buckets
▪ Configure your users in Atlas
Security: Atlas
Atlas controls access to your data.
▪ Storage of IAM Role
▪ Temporary Credentials
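Temporary credentials, such as those AWS STS issues for an assumed IAM role, carry an expiration time, so a service holding them must refresh them before they lapse. The sketch caches credentials and refreshes them a safety margin before expiry, sharing one in-flight refresh among concurrent callers. `fetchCredentials` is a hypothetical stand-in for a call such as STS AssumeRole, and the five-minute margin is an arbitrary choice.

```javascript
// Caches temporary credentials and refreshes them shortly before they expire.
// fetchCredentials is a hypothetical async function resolving to
// { accessKeyId, secretAccessKey, sessionToken, expiration: Date }.
class TemporaryCredentialCache {
  constructor(fetchCredentials, refreshMarginMs = 5 * 60 * 1000, now = () => Date.now()) {
    this.fetchCredentials = fetchCredentials;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now; // injectable clock, handy for testing
    this.current = null;
    this.pending = null; // one shared refresh for concurrent callers
  }

  async get() {
    const fresh = this.current &&
      this.current.expiration.getTime() - this.refreshMarginMs > this.now();
    if (fresh) return this.current;
    if (!this.pending) {
      this.pending = this.fetchCredentials()
        .then((creds) => { this.current = creds; return creds; })
        .finally(() => { this.pending = null; });
    }
    return this.pending;
  }
}
```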
Configuration
Customers control their data layout.
▪ Stores
▪ Databases, Collections
▪ DataSources
Diagram: a Database contains Collections; each Collection is backed by one or more DataSources, and each DataSource reads from a Store.
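The hierarchy can be modelled as plain data: a store describes where bytes live, and each collection in a database lists one or more data sources that point at a store. A query against `db.collection` then reads the union of those sources. The object below mirrors the configuration used later in this deck but flattens the store entry into a simple array; that shape and the `resolveNamespace` helper are hypothetical illustrations, not Data Lake's API.

```javascript
// Simplified configuration: stores plus databases -> collections -> data sources.
const config = {
  stores: [{ provider: "s3", name: "ent-archive", bucket: "ent-archive", region: "us-west-2", prefix: "/archive/" }],
  databases: {
    history: {
      customers: [{ store: "ent-archive", definition: "/customers/*" }],
      invoices: [
        { store: "ent-archive", definition: "/invoices/{year int}/*" },
        { store: "ent-archive", definition: "/invoices/{year int}" },
      ],
    },
  },
};

const own = (obj, key) => obj != null && Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical helper: list the data sources behind db.collection,
// each joined with the store it reads from.
function resolveNamespace(cfg, db, collection) {
  if (!own(cfg.databases, db) || !own(cfg.databases[db], collection)) {
    throw new Error(`namespace ${db}.${collection} is not configured`);
  }
  return cfg.databases[db][collection].map((src) => {
    const store = cfg.stores.find((s) => s.name === src.store);
    if (!store) throw new Error(`data source refers to unknown store ${src.store}`);
    return { ...src, store };
  });
}
```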
Configuration: File Formats
Each file has a format.
▪ BSON (gzipped)
▪ JSON (gzipped)
▪ Avro (gzipped)
▪ CSV/TSV (gzipped)
▪ Parquet
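Because each file has a single format and most formats may be gzipped, a reader must pick both a decoder and whether to decompress first. The sketch infers both from the file name's extensions. Relying on extensions is an assumption made for illustration; the slide does not say how Data Lake determines a file's format, and `detectFormat` is a hypothetical name.

```javascript
// Extension -> format, for the formats listed on the slide.
const FORMATS = {
  ".bson": "bson", ".json": "json", ".avro": "avro",
  ".csv": "csv", ".tsv": "tsv", ".parquet": "parquet",
};

// Hypothetical: infer { format, gzipped } from a key such as "invoices/2017.json.gz".
function detectFormat(key) {
  let rest = key.slice(key.lastIndexOf("/") + 1).toLowerCase();
  let gzipped = false;
  if (rest.endsWith(".gz")) {
    gzipped = true;
    rest = rest.slice(0, -3);
  }
  const dot = rest.lastIndexOf(".");
  const format = dot >= 0 ? FORMATS[rest.slice(dot)] : undefined;
  if (!format) throw new Error(`unrecognised file format: ${key}`);
  if (gzipped && format === "parquet") {
    // The slide lists Parquet without a gzip option (Parquet compresses internally).
    throw new Error(`gzipped Parquet is not in the supported list: ${key}`);
  }
  return { format, gzipped };
}
```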
MongoDB Atlas Data Lake UI
Configuration (S3 Bucket): ent-archive
/archive/customers
  - a-m.json
  - n-z.json
/archive/invoices
  - 2019
    - 1.parquet
    - 2.parquet
  - 2018
    - 1.parquet
  - 2017.json.gz
  - 2016.json.gz
Configuration: Store
s3: {
  name: "ent-archive",
  bucket: "ent-archive",
  region: "us-west-2",
  prefix: "/archive/"
}
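A store's `prefix` is combined with each data source's `definition` to locate the objects to read. One plausible approach, assumed here rather than stated on the slides, is to list only keys under the literal part of the path (everything before the first `{placeholder}` or `*`) so S3 returns just the keys that could match. `staticListPrefix` is a hypothetical helper; S3 keys normally have no leading slash, so the sketch strips it.

```javascript
// Hypothetical helper: the S3 key prefix to list for a store prefix + definition.
function staticListPrefix(storePrefix, definition) {
  // Join with exactly one "/" between the parts, then drop any leading "/".
  const joined = (storePrefix.replace(/\/+$/, "") + "/" + definition.replace(/^\/+/, ""))
    .replace(/^\/+/, "");
  // Cut at the first wildcard or placeholder; everything before it is literal.
  const cut = joined.search(/[*{]/);
  return cut === -1 ? joined : joined.slice(0, cut);
}
```

For the store above, `/customers/*` lists under `archive/customers/`, and both invoice definitions list under `archive/invoices/`.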
Configuration: Data
history: {
  customers: [{
    store: "ent-archive",
    definition: "/customers/*"
  }],
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}"
  }]
}
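A definition such as `/invoices/{year int}/*` both selects files and turns part of the path into a typed field, so documents read from `archive/invoices/2019/1.parquet` can carry `year: 2019`. The sketch compiles a definition into a regular expression under stated assumptions: `{name int}` matches a run of digits, `*` matches the rest of the key, and a trailing file extension after the last placeholder (as in `2017.json.gz`) is ignored. The slides do not spell out Data Lake's exact matching rules, and `compileDefinition` is a hypothetical name.

```javascript
// Compile a store prefix + definition like "/invoices/{year int}/*" into a key matcher.
function compileDefinition(storePrefix, definition) {
  const path = (storePrefix.replace(/\/+$/, "") + "/" + definition.replace(/^\/+/, ""))
    .replace(/^\/+/, "");
  const fields = [];
  let pattern = "";
  // Split into {placeholders}, "*" wildcards and literal text.
  for (const token of path.split(/(\{[^}]+\}|\*)/)) {
    if (token === "") continue;
    if (token === "*") {
      pattern += ".*";
    } else if (token.startsWith("{")) {
      const [name, type] = token.slice(1, -1).trim().split(/\s+/);
      if (type !== "int") throw new Error(`unsupported placeholder type in sketch: ${token}`);
      pattern += "(\\d+)";
      fields.push(name);
    } else {
      pattern += token.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape literal text
    }
  }
  // Assumption: an extension after the last placeholder (".json.gz") is ignored.
  const re = new RegExp("^" + pattern + "(?:\\.[A-Za-z0-9.]+)?$");
  return function matchKey(key) {
    const m = re.exec(key);
    if (!m) return null; // key does not belong to this data source
    const values = {};
    fields.forEach((name, i) => { values[name] = parseInt(m[i + 1], 10); });
    return values;
  };
}
```

Under these rules the two invoice sources are disjoint: the first matches files inside year folders, the second matches yearly files such as `2017.json.gz`.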
Configuration: Data (Future)
history: {
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}"
  }, {
    store: "atlas",
    cluster: "my-cluster",
    db: "customers",
    collection: "invoices"
  }]
}
Queries
Processing
MQL → Distributed MQL
▪ Parse
▪ Parallelize
▪ Distribute
MongoDB Atlas Data Lake Architecture
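Parallelizing means splitting the files behind a collection among workers, each of which runs the map half of the pipeline on its share. The slides do not describe Data Lake's scheduler; as an assumed strategy, the sketch assigns files greedily, largest first, to whichever worker currently has the fewest bytes, which keeps per-worker work roughly balanced. `assignFiles` is a hypothetical name.

```javascript
// Hypothetical scheduler: spread files over workers so byte counts stay balanced
// (greedy "largest first" assignment).
function assignFiles(files, workerCount) {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new Error("workerCount must be a positive integer");
  }
  const workers = Array.from({ length: workerCount }, () => ({ bytes: 0, files: [] }));
  const bySizeDesc = [...files].sort((a, b) => b.size - a.size);
  for (const file of bySizeDesc) {
    // Pick the currently lightest worker (a linear scan is fine for a sketch).
    let target = workers[0];
    for (const w of workers) if (w.bytes < target.bytes) target = w;
    target.files.push(file.key);
    target.bytes += file.size;
  }
  return workers;
}
```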
Query Example: $limit
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Map:
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Reduce:
{ $limit: 10 }
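The split is safe because, without a `$sort`, any 10 documents that pass `$match` are a valid answer: each partition can stop after 10 matches, and the reducer keeps at most 10 of the combined partial results, so it never handles more than 10 documents per partition. The sketch replays that map/reduce shape on in-memory arrays; the `year > 2000` filter is hand-coded rather than evaluated by a real MQL engine, and the function names are hypothetical.

```javascript
// Map: each partition applies { $match: { year: { $gt: 2000 } } } then $limit locally.
function mapPartition(docs, limit) {
  return docs.filter((d) => d.year > 2000).slice(0, limit);
}

// Reduce: concatenate the partial results and apply $limit once more.
function runLimitQuery(partitions, limit) {
  const partials = partitions.map((docs) => mapPartition(docs, limit));
  return partials.flat().slice(0, limit);
}
```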
Query Example: $group
{ $group: { _id: "$year", totalAvg: { $avg: "$amount" } } }
Map:
{ $group: { _id: "$year",
    totalAvg_sum: { $sum: "$amount" },
    totalAvg_count: { $sum: 1 }
} }
Reduce:
{ $group: { _id: "$_id",
    totalAvg_sum: { $sum: "$totalAvg_sum" },
    totalAvg_count: { $sum: "$totalAvg_count" }
} }
Finalize:
{ $project: { _id: "$_id", totalAvg: { $divide: ["$totalAvg_sum", "$totalAvg_count"] } } }
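An average cannot simply be averaged again: the mean of per-partition means is wrong whenever partitions hold different numbers of documents. That is why the map stage emits a sum and a count per year, the reduce stage adds those up, and only the finalize stage divides. The sketch replays the three stages on in-memory arrays to show the arithmetic, assuming every document has a numeric `amount`; the function names are hypothetical.

```javascript
// Map: per partition, group by year and emit a partial sum and count.
function mapGroup(docs) {
  const partial = new Map();
  for (const { year, amount } of docs) {
    const acc = partial.get(year) || { totalAvg_sum: 0, totalAvg_count: 0 };
    acc.totalAvg_sum += amount;
    acc.totalAvg_count += 1;
    partial.set(year, acc);
  }
  return [...partial].map(([year, acc]) => ({ _id: year, ...acc }));
}

// Reduce: merge partials that share an _id by adding sums and counts.
function reduceGroups(partials) {
  const merged = new Map();
  for (const p of partials) {
    const acc = merged.get(p._id) || { totalAvg_sum: 0, totalAvg_count: 0 };
    acc.totalAvg_sum += p.totalAvg_sum;
    acc.totalAvg_count += p.totalAvg_count;
    merged.set(p._id, acc);
  }
  return [...merged].map(([id, acc]) => ({ _id: id, ...acc }));
}

// Finalize: divide once, after all partials have been combined.
function finalizeGroups(groups) {
  return groups.map((g) => ({ _id: g._id, totalAvg: g.totalAvg_sum / g.totalAvg_count }));
}
```

With partitions `[10, 20]` and `[60]` for one year, the combined average is 90 / 3 = 30, whereas averaging the partition means would give (15 + 60) / 2 = 37.5.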
DEMO
Future
Future
More supported MongoDB operators.
▪ $out
▪ $merge
▪ $graphLookup
▪ Geo operators
▪ Full Text Search
Future
Optimizations!
▪ Indexes
▪ Statistics
Future
More File Formats!
▪ ORC
▪ Excel
▪ PDF
Future
Integrations!
▪ Atlas
▪ Microsoft Azure
▪ Google Cloud
Hiring
Lots to do!
▪ mongodb.com/careers
THANK YOU
MongoDB Atlas Data Lake
Technical Deep Dive [DEV]
James Osgood
https://www.surveymonkey.com/r/KQZZB8N
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
