MongoDB in Data Science
How to convert a Pandas proof of concept into a scalable product, and
why MongoDB is the key to success!
Who I am
Software Engineer
Compiler Engineer
Compiler Engineer
LLVM contributor
Software Engineer
R/D
Lead ML Engineer
Backend
Infrastructure
Sr. ML Engineer
What will we learn?
● Understand existing tools for delivering Data Science projects and when to use them.
● Why MongoDB could be crucial for your product and business
● How to easily productionize a Pandas Proof-of-Concept
● How to use MongoDB while being open to other technologies.
Motivation
Key factors: speed of inference and speed of development
● Speed of inference: Feature Aggregation, Model, Prediction Service
● Speed of development: Research (Data Scientist), Productionization (Data/ML Engineer)
What is Pandas?
The most popular Python library for data manipulation and data wrangling in the
data science community.
Source: numpy.org, scipy.org, matplotlib.org, scikit-learn.org, pandas.pydata.org
Source: Stackoverflow post by David Robinson
Why use Pandas DataFrames?
Drawbacks of Pandas
● Doesn’t have a persistence layer
● Doesn’t support primary and secondary indexes
○ As a result, it is not efficient for querying
● Doesn’t support multi-threading
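The querying drawback can be illustrated without Pandas or MongoDB at all. The sketch below is a pure-Python analogy, not how either library is implemented: a filter over unindexed rows has to inspect every row, while a secondary index (here a plain dict keyed by a hypothetical `user_id` field) goes straight to the matches. The event rows and field names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical event rows; the field names are illustrative only.
events = [
    {"user_id": i % 1000, "event_type": "view", "item_id": i}
    for i in range(100_000)
]


def scan_lookup(rows, user_id):
    """Unindexed lookup: every row is inspected, like a boolean-mask filter."""
    inspected = 0
    matches = []
    for row in rows:
        inspected += 1
        if row["user_id"] == user_id:
            matches.append(row)
    return matches, inspected


def build_index(rows, field):
    """Build a secondary index: field value -> positions of matching rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[field]].append(position)
    return index


def index_lookup(rows, index, value):
    """Indexed lookup: only the matching rows are touched."""
    positions = index.get(value, [])
    return [rows[p] for p in positions], len(positions)


user_index = build_index(events, "user_id")
scanned, scan_cost = scan_lookup(events, 42)
indexed, index_cost = index_lookup(events, user_index, 42)
```

Both lookups return the same rows; the difference is how many rows each has to touch, which is what matters once the event store no longer fits comfortably in one process.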
Productionization options
● Real-time service
● Batch job
[Diagram labels: slow inference, fast inference]
Real time service demo (recommendation)
[Diagram: Event Store, Model Training Job, Model store, Inference 1, Inference 2, Inference N]
Real time service demo (recommendation)
[Diagram: Event Store, Feature Aggregation, Model Inference, Inference Service (request, respond)]
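The request path in this diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the demo's actual code: the event store is a hypothetical in-memory list standing in for a MongoDB collection, the "model" is a placeholder that ranks categories by event count, and all field and function names are invented for the example.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the event store (MongoDB in the talk).
EVENT_STORE = [
    {"user_id": "u1", "event_type": "view", "category": "shoes"},
    {"user_id": "u1", "event_type": "view", "category": "shoes"},
    {"user_id": "u1", "event_type": "purchase", "category": "hats"},
    {"user_id": "u2", "event_type": "view", "category": "bags"},
]


def aggregate_features(user_id, events):
    """Feature aggregation: count the user's events per category."""
    counts = Counter(e["category"] for e in events if e["user_id"] == user_id)
    return dict(counts)


def model_inference(features, top_k=1):
    """Placeholder model: recommend the categories the user touched most."""
    ranked = sorted(features.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:top_k]]


def handle_request(request, events=EVENT_STORE):
    """Inference service: request in, aggregated features, prediction out."""
    features = aggregate_features(request["user_id"], events)
    return {
        "user_id": request["user_id"],
        "recommendations": model_inference(features),
    }


response = handle_request({"user_id": "u1"})
```

In a real service the aggregation step would presumably be a query against the event store (for example an aggregation pipeline), so that only the requesting user's events are read at request time.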
Things to avoid
● Don’t forget to put indexes on your collection
● Don’t put indexes on every field
● Don’t read and write from the same replica
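As a rough sketch of the first and last points, assuming a replica set and hypothetical field names (`user_id`, `timestamp`), host names, and replica set name: index only the fields the hot query filters and sorts on, and use the standard `readPreference` connection-string option to send reads to secondaries. Reads from secondaries can lag behind the primary, so this only suits queries that tolerate slightly stale data.

```python
from urllib.parse import urlencode

# Index only the fields the hot query filters and sorts on (hypothetical names).
# With PyMongo this list could be passed to Collection.create_index(...).
EVENT_INDEX_KEYS = [("user_id", 1), ("timestamp", -1)]


def build_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a connection string that routes reads away from the primary."""
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
    })
    return f"mongodb://{','.join(hosts)}/?{options}"


# Placeholder hosts and replica set name, for illustration only.
uri = build_uri(["db1.example.com:27017", "db2.example.com:27017"], "rs0")
```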
But… we generate tons of user events!
Is this solution going to work for us?
Typical data pipeline
[Diagram: user events, Consumer 1, Consumer 2, Consumer N, MongoDB, Postgres, DFS]
Shrink down the amount of data
● Consumer: filters (event_type, ...)
● MongoDB: TTL index
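A minimal sketch of these two ideas, under assumptions that are not from the talk: the set of relevant event types, the 30-day retention, and the `created_at` field name are all hypothetical. The consumer drops events the model does not need before writing, and a TTL index lets MongoDB expire old documents based on a date field.

```python
from datetime import datetime, timezone

# Hypothetical settings: which events the model needs and how long to keep them.
RELEVANT_EVENT_TYPES = {"view", "purchase"}
TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days

# Arguments for a TTL index; with PyMongo they could be used as
# collection.create_index("created_at", expireAfterSeconds=TTL_SECONDS).
TTL_INDEX = {"keys": "created_at", "expireAfterSeconds": TTL_SECONDS}


def to_document(event, now=None):
    """Return the document to store, or None if the event is filtered out."""
    if event.get("event_type") not in RELEVANT_EVENT_TYPES:
        return None
    # TTL indexes expire documents based on a date field, so stamp one.
    created_at = now or datetime.now(timezone.utc)
    return {**event, "created_at": created_at}


def consume(events, now=None):
    """Consumer step: keep only the documents worth writing to MongoDB."""
    documents = (to_document(e, now) for e in events)
    return [d for d in documents if d is not None]


incoming = [
    {"user_id": "u1", "event_type": "view", "item_id": 1},
    {"user_id": "u1", "event_type": "scroll", "item_id": 1},
    {"user_id": "u2", "event_type": "purchase", "item_id": 7},
]
stored = consume(incoming, now=datetime(2019, 6, 18, tzinfo=timezone.utc))
```

Filtering in the consumer reduces what is written; the TTL index bounds how long it stays, so the collection size tracks recent activity rather than total history.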
Training Job
[Diagram: Event Store, Model Training Job, Model store, Inference 1, Inference 2, Inference N]
Source: mongodb.com
Model Training job
[Diagram: Event Store, MongoDB Connector, Model Training Job]
Inference as a batch job
[Diagram: Event Store, MongoDB Connector, Inference Job]
Flexibility
● Spark DataFrame
● MongoDB Aggregate
● Pandas DataFrame
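To make the flexibility point concrete, here is one small aggregation (count "view" events per user) written as a MongoDB aggregation pipeline, with roughly equivalent Pandas and PySpark expressions in comments. The field names are hypothetical, and the pure-Python function is only a reference for what the pipeline computes, not part of any of the three libraries.

```python
from collections import Counter

# MongoDB aggregation pipeline: count "view" events per user.
# Field names are hypothetical; with PyMongo this could be run as
# collection.aggregate(VIEW_COUNTS_PIPELINE).
VIEW_COUNTS_PIPELINE = [
    {"$match": {"event_type": "view"}},
    {"$group": {"_id": "$user_id", "views": {"$sum": 1}}},
]

# Roughly equivalent expressions in the other two engines:
#   pandas:  df[df["event_type"] == "view"].groupby("user_id").size()
#   PySpark: df.filter(df.event_type == "view").groupBy("user_id").count()


def view_counts(events):
    """Pure-Python reference for what the pipeline above computes.

    Results are sorted by user for readability; MongoDB's $group stage
    does not guarantee any output order.
    """
    counts = Counter(e["user_id"] for e in events if e["event_type"] == "view")
    return [{"_id": user_id, "views": n} for user_id, n in sorted(counts.items())]


sample = [
    {"user_id": "u1", "event_type": "view"},
    {"user_id": "u2", "event_type": "view"},
    {"user_id": "u1", "event_type": "purchase"},
    {"user_id": "u1", "event_type": "view"},
]
result = view_counts(sample)
```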
Batch Job versus Real Time Service
     | Real Time Service              | Batch Job
Pros | On demand (scales as needed)   | Easier to develop and maintain
Cons | Harder to develop and maintain | Constantly utilizing resources
Benefits of MongoDB
● Schema-Less
● Horizontally scalable
● Available as PaaS from many vendors.
● Has a huge community
● Easier to hire people
Summary
● Makes it possible to provide a real-time experience
● Could help save expensive computational resources
● Provides a way to do real-time as well as batch inference
We are hiring !!!
careers.shopbonsai.ca
References
● https://stackoverflow.blog/2017/09/14/python-growing-quickly/
● https://www.mongodb.com/products/spark-connector
● https://pandas.pydata.org/
● https://scikit-learn.org/
● https://matplotlib.org/
● https://www.scipy.org/
● https://www.numpy.org/
● https://iconscout.com/icon/device-management-mobile-computer-seo-tool-analyze-7
Thanks !!!

MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product Using MongoDB