Advanced MongoDB
Aggregation
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
#MDBW16
MongoDB Aggregation Framework
• One of the least understood parts of MongoDB
• Many customers never get to the aggregation framework
• Had limited utility in 2.6 (16MB limit)
• The aggregation framework “grew up” in 3.0
• We continue to enhance it (3.2 and now 3.4)
A quick reminder
• A processing pipeline
• Designed to process large groups of documents in parallel
• Is shard aware
• Can create new data from old
A Typical Pipeline
• Match: find query
• Project: select fields, combine fields, rename fields, calculate
• Group: group by, execute accumulators, rename fields
• Sort: sort results
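The four stages can be written as an ordered list of stage documents. The sketch below is illustrative only: the field names come from the example document shown later in the deck, but the particular filter, grouping key and sort order are assumptions, not the pipeline used in the talk.

```python
# Each stage is a plain dict; the order of the list is the order of execution.
match = {"$match": {"TestClassID": "4"}}                 # find query
project = {"$project": {"Make": 1, "TestMileage": 1}}    # select fields
group = {"$group": {"_id": "$Make",                      # group by make
                    "count": {"$sum": 1},                # accumulator
                    "miles": {"$avg": "$TestMileage"}}}  # accumulator
sort = {"$sort": {"count": -1}}                          # sort results

pipeline = [match, project, group, sort]

# Every stage document has exactly one key: the stage operator.
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

A list like this is what gets passed to `aggregate()`; nothing runs until it is handed to a collection.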
Description of Data Set
• MOT : Ministry of Transport, then
• VOSA : Vehicle and Operator Services Agency and then
• DVSA : Driver and Vehicle Standards Agency
• The test is still called the MOT test
• It is a test of roadworthiness
• Nearly every motorised vehicle must pass an MOT every year
• New cars are exempt for three years
Introduction to the MOT Public Data Set
This is a big Collection
> db.test_results.count()
253472477
>
253 million records
Example Document
{ "_id" : ObjectId("5759ee6e8684975e1098af68"),
"TestID" : 400,
"VehicleID" : "278",
"TestDate" : ISODate("2013-04-23T00:00:00Z"),
"TestClassID" : "4",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 99284,
"Postcode" : "E",
"Make" : "AUDI",
"Model" : "A3",
"Colour" : "BLACK",
"FuelType" : "P",
"CylinderCapacity" : 1598,
"FirstUseDate" : ISODate("2003-11-11T00:00:00Z") }
We will work with one year : 2013
> db.results_2013.count()
37390457
Still 37.4 million records
Use $match to filter
cars = { "$match" : { "TestClassID" : { "$eq" : "4" }}}
motorcycles = {
"$match" : {
"$or" : [
{
"TestClassID" : "1"
},
{
"TestClassID" : "2"
}
]
}
}
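The `$or` over a single field can also be written with MongoDB's `$in` operator. The sketch below checks that the two forms select the same documents, using a toy matcher written for this example (it handles only equality, `$eq`, `$in` and `$or`, and is not MongoDB's query engine) and made-up sample documents.

```python
# The $or form from the slide and an equivalent $in form.
motorcycles_or = {"$match": {"$or": [{"TestClassID": "1"}, {"TestClassID": "2"}]}}
motorcycles_in = {"$match": {"TestClassID": {"$in": ["1", "2"]}}}


def matches(doc, query):
    """Toy matcher: equality, $eq, $in and $or only (hypothetical helper)."""
    for key, cond in query.items():
        if key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            if "$eq" in cond and doc.get(key) != cond["$eq"]:
                return False
            if "$in" in cond and doc.get(key) not in cond["$in"]:
                return False
        elif doc.get(key) != cond:
            return False
    return True


# Made-up sample documents: two motorcycle classes and one car class.
docs = [{"TestClassID": "1"}, {"TestClassID": "2"}, {"TestClassID": "4"}]
by_or = [d for d in docs if matches(d, motorcycles_or["$match"])]
by_in = [d for d in docs if matches(d, motorcycles_in["$match"])]
print(by_or == by_in)
```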
Let's make a collection of cars
removeNulls = { "$match" : { "FirstUseDate" : { "$ne" : "NULL" }}}
carsonly = { "$match" : { "TestClassID" : "4" }}
output = { "$out" : "cars_2013" }
db.results_2013.aggregate( [ removeNulls, carsonly, output ] )
Now let's create a summary
For each group of cars of a given make and age:
• The total number of cars
• The average mileage for this group of cars
• The total number of passes
We $project to get the data we want
ageinmsecs = { "$subtract" : [ "$TestDate", "$FirstUseDate" ]}
ageinyears = { "$divide" : [ ageinmsecs, (1000*3600*24*365) ]}
floorage = { "$floor" : ageinyears }
ispass = { "$cond" : [{"$eq": ["$TestResult","P"]},1,0]}
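As a worked check of these expressions, the same arithmetic can be done by hand in Python for the example document (TestDate 2013-04-23, FirstUseDate 2003-11-11, TestResult "P"). This is a sketch of the calculation only, not of how the server evaluates it; subtracting two dates with `$subtract` yields milliseconds, which is why the divisor is the number of milliseconds in a 365-day year.

```python
import math
from datetime import datetime

# Values taken from the example document in the deck.
test_date = datetime(2013, 4, 23)
first_use = datetime(2003, 11, 11)
test_result = "P"

# $subtract on two dates: difference in milliseconds.
age_ms = (test_date - first_use).total_seconds() * 1000

# $divide by milliseconds per year (365 days, so leap days are ignored).
ms_per_year = 1000 * 3600 * 24 * 365
age_years = age_ms / ms_per_year

# $floor gives the age in whole years.
age = math.floor(age_years)

# $cond: 1 for a pass, 0 otherwise.
is_pass = 1 if test_result == "P" else 0

print(age, is_pass)
```

The interval is 3451 days, about 9.45 years, so the car lands in the age-9 bucket.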
The actual $project
project =
{ "$project" : { "Make" : 1,
"VehicleID" : 1,
"TestResult" : 1,
"TestDate" : 1,
"TestMileage" : 1,
"FirstUseDate" : 1,
"Age" : floorage,
"pass" : ispass }}
We then $group to get the accumulated values
group = { "$group" :
{ "_id" : { "make" : "$Make", "age" : "$Age" },
"count" : { "$sum" : 1 },
"miles" : { "$avg" : "$TestMileage" },
"passes" : { "$sum" : "$pass" }}}
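To make the accumulators concrete, here is a plain-Python sketch of what this `$group` stage computes. The input documents are made up for illustration and are shaped like the output of the `$project` stage; the sketch assumes every `TestMileage` is numeric.

```python
from collections import defaultdict

# Made-up documents shaped like the $project output.
projected = [
    {"Make": "AUDI", "Age": 9, "TestMileage": 99284, "pass": 1},
    {"Make": "AUDI", "Age": 9, "TestMileage": 80000, "pass": 0},
    {"Make": "FORD", "Age": 3, "TestMileage": 30000, "pass": 1},
]

# Bucket documents by the compound key (make, age), like the $group _id.
buckets = defaultdict(list)
for doc in projected:
    buckets[(doc["Make"], doc["Age"])].append(doc)

# Apply the three accumulators to each bucket.
summary = [
    {
        "_id": {"make": make, "age": age},
        "count": len(docs),                                        # {"$sum": 1}
        "miles": sum(d["TestMileage"] for d in docs) / len(docs),  # {"$avg": ...}
        "passes": sum(d["pass"] for d in docs),                    # {"$sum": "$pass"}
    }
    for (make, age), docs in buckets.items()
]

for row in summary:
    print(row)
```

Because `pass` is 1 or 0, summing it counts the passes, and `passes / count` gives a pass rate per make and age.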
Now put it all in a pipeline
out = { "$out" : "cars_summary" }
db.results_2013.aggregate([project,group,out])
The $out operator creates a new collection
Restrictions
• 16MB result limit if not using cursors or $out
• 100MB in-memory limit per pipeline stage unless the allowDiskUse option is specified
• $geoNear must be the first stage in a pipeline
• $out must be the last stage in the pipeline
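The two placement rules are easy to check before a pipeline is sent to the server. The helper below is a hypothetical client-side check written for this example; it is not part of MongoDB or any driver, and the server enforces these rules itself.

```python
def check_pipeline(pipeline):
    """Return a list of placement problems for $geoNear and $out (hypothetical helper)."""
    names = [next(iter(stage)) for stage in pipeline]
    problems = []
    if "$geoNear" in names and names.index("$geoNear") != 0:
        problems.append("$geoNear must be the first stage")
    if "$out" in names and names.index("$out") != len(names) - 1:
        problems.append("$out must be the last stage")
    return problems


good = [{"$match": {"TestClassID": "4"}}, {"$out": "cars_2013"}]
bad = [{"$out": "cars_2013"}, {"$match": {"TestClassID": "4"}}]

print(check_pipeline(good))
print(check_pipeline(bad))
```

For the memory limit, `allowDiskUse` is passed as an option to `aggregate()` rather than as a stage, for example `db.results_2013.aggregate(pipeline, { allowDiskUse: true })` in the shell.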
• Market Size: $36 Billion
• Partners: 1,000+
• International Offices: 15
• Global Employees: 575+
• Downloads Worldwide: 15,000,000+
Make a GIANT Impact
www.mongodb.com/careers
MongoDB World 2016 : Advanced Aggregation

Editor's Notes

  • #7: 34 GB uncompressed
  • #11: Early matching can use indexes to reduce the input set efficiently. $match uses the same syntax as the find() function.