MongoDB’s New Aggregation Features
Chris Westin
© Copyright 2010 10gen Inc.

What problem are we solving?
  • Map/Reduce can be used for aggregation…
      • Currently being used for totaling, averaging, etc.
  • Map/Reduce is a big hammer
      • Simpler tasks should be easier
      • Shouldn’t need to write JavaScript
      • Avoid the overhead of JavaScript engine
  • We’re seeing requests for help in handling complex documents
      • Select only subdocuments or arrays

How will we solve the problem?
  • Our new aggregation framework
      • Declarative framework
      • No JavaScript required
      • Describe a chain of operations to apply
  • Expression evaluation
      • Return computed values
  • Framework: we can add new operations easily
  • C++ implementation
      • Higher performance than JavaScript

Aggregation - Pipelines
  • Aggregation requests specify a pipeline
  • A pipeline is a series of operations
  • Conceptually, the members of a collection are passed through a pipeline to produce a result
      • Similar to a command-line pipe
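
The command-line pipe analogy can be sketched in a few lines of plain JavaScript. This is only a toy model, assuming each stage is a function from an array of documents to an array of documents; the names `runPipeline`, `onlyBob`, `titlesOnly` and the sample data are hypothetical, and the server itself does not work this way internally.

```javascript
// Toy model: a pipeline is a list of stages, and the output of each
// stage becomes the input of the next, like `a | b | c` in a shell.
function runPipeline(docs, stages) {
  return stages.reduce((stream, stage) => stage(stream), docs);
}

// Hypothetical sample collection.
const articles = [
  { title: "a", author: "bob", pageViews: 5 },
  { title: "b", author: "dave", pageViews: 7 },
  { title: "c", author: "bob", pageViews: 1 },
];

// Two hypothetical stages: a filter and a reshaping step.
const onlyBob = (docs) => docs.filter((d) => d.author === "bob");
const titlesOnly = (docs) => docs.map((d) => ({ title: d.title }));

const result = runPipeline(articles, [onlyBob, titlesOnly]);
console.log(JSON.stringify(result));
```

Because every stage has the same input and output shape, stages can be reordered or added without changing the runner.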

Pipeline Operations
  • $match
      • Uses a query predicate (like .find({…})) as a filter
  • $project
      • Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
      • This can include computed values
  • $group
      • Aggregates items into buckets defined by a key
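
As a rough illustration, the three operations might be combined into one pipeline description like the one below. The field names (`author`, `pageViews`, `totalViews`) are made up for the example, and the pipeline is written as an array of stage documents, which is the form used by released MongoDB versions; the exact syntax at the time of this talk may have differed.

```javascript
// A pipeline description is plain data: one document per stage.
// Field names here are hypothetical.
const pipeline = [
  // Keep only documents whose author is "bob" (same predicate form as find()).
  { $match: { author: "bob" } },
  // Shape the result: keep just these two fields.
  { $project: { author: 1, pageViews: 1 } },
  // Bucket by author and total the page views in each bucket.
  { $group: { _id: "$author", totalViews: { $sum: "$pageViews" } } },
];

// List the stage names in order.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```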

Computed Expressions
  • Available in $project operations
  • Prefix expression language
      • Add two fields: $add:["$field1", "$field2"]
      • Provide a value for a missing field: $ifnull:["$field1", "$field2"]
      • Nesting: $add:["$field1", $ifnull:["$field2", "$field3"]]
      • Other functions…
      • And we can easily add more as required
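
To make the prefix notation concrete, here is a minimal evaluator sketch in plain JavaScript. It is an assumption-laden toy, not the framework's implementation: it handles only top-level field references, `$add`, and `$ifnull` as spelled on the slide (released MongoDB versions spell the operator `$ifNull`), and the function name `evaluate` is hypothetical.

```javascript
// Toy evaluator for prefix expressions of the form { $op: [arg, arg, ...] }.
function evaluate(expr, doc) {
  // A string starting with "$" refers to a field of the current document.
  if (typeof expr === "string" && expr.startsWith("$")) {
    const value = doc[expr.slice(1)];
    return value === undefined ? null : value;
  }
  // An object with a single "$op" key is an operator applied to its arguments.
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr);
    const args = expr[op].map((arg) => evaluate(arg, doc));
    switch (op) {
      case "$add":
        return args.reduce((sum, x) => sum + x, 0);
      case "$ifnull":
        // Use the second argument when the first is missing or null.
        return args[0] !== null ? args[0] : args[1];
      default:
        throw new Error("unsupported operator: " + op);
    }
  }
  // Anything else is treated as a literal value.
  return expr;
}

// field2 is missing, so $ifnull falls back to field3.
const sampleDoc = { field1: 2, field3: 40 };
const nested = { $add: ["$field1", { $ifnull: ["$field2", "$field3"] }] };
console.log(evaluate(nested, sampleDoc));
```

Nesting falls out of the recursion: each argument is evaluated before the operator is applied.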

Projections
  • $project can reshape results
      • $unwind expression doles out array values one at a time
      • Pull fields from nested documents to the top
      • Push fields from the top down into new virtual documents
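
The "doles out array values one at a time" behavior can be pictured with a small plain-JavaScript sketch. The helper `unwind` and the sample documents are hypothetical, and the sketch assumes the named field is either an array or absent.

```javascript
// Toy $unwind: emit one copy of each document per element of the array field.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((value) => ({ ...doc, [field]: value }))
  );
}

// Hypothetical sample documents with an array-valued "tags" field.
const posts = [
  { title: "a", tags: ["x", "y"] },
  { title: "b", tags: ["z"] },
];

const unwound = unwind(posts, "tags");
console.log(JSON.stringify(unwound));
```

After unwinding, each output document carries a single tag, so a later grouping step can bucket by tag.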

Grouping
  • $group aggregation expressions
      • Total of column values: $sum
      • Average of column values: $avg
      • Collect column values in an array: $push
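
The three grouping expressions can be imitated in plain JavaScript as follows. This is a sketch only: the helper `groupBy`, the output field names (`total`, `average`, `values`), and the sample data are assumptions made for illustration.

```javascript
// Toy $group: bucket documents by keyField, then compute the equivalent of
// $sum, $avg and $push over valueField for each bucket.
function groupBy(docs, keyField, valueField) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!buckets.has(key)) {
      buckets.set(key, { _id: key, sum: 0, count: 0, values: [] });
    }
    const bucket = buckets.get(key);
    bucket.sum += doc[valueField];
    bucket.count += 1;
    bucket.values.push(doc[valueField]);
  }
  return [...buckets.values()].map((b) => ({
    _id: b._id,
    total: b.sum, // like $sum
    average: b.sum / b.count, // like $avg
    values: b.values, // like $push
  }));
}

// Hypothetical sample documents.
const views = [
  { author: "bob", pageViews: 5 },
  { author: "dave", pageViews: 7 },
  { author: "bob", pageViews: 1 },
];

const grouped = groupBy(views, "author", "pageViews");
console.log(JSON.stringify(grouped));
```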

Demo
  • (See script at https://gist.github.com/993733)

Usage Tips
  • Use $match in a pipeline as early as possible
      • The query optimizer can then be used to choose an index and avoid scanning the entire collection
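
A small plain-JavaScript experiment shows one part of why filtering early helps: later stages see fewer documents. Index selection, which is the slide's main point, is not modeled here; the stage functions and sample data are hypothetical.

```javascript
// Hypothetical sample documents.
const docs = [
  { author: "bob", pageViews: 5, body: "..." },
  { author: "dave", pageViews: 7, body: "..." },
  { author: "bob", pageViews: 1, body: "..." },
];

// Filter stage, standing in for $match.
const match = (ds) => ds.filter((d) => d.author === "bob");

// Projection stage that counts how many documents it had to process.
function makeProject() {
  const stage = (ds) => {
    stage.seen += ds.length;
    return ds.map((d) => ({ author: d.author, pageViews: d.pageViews }));
  };
  stage.seen = 0;
  return stage;
}

// Filter first, then project.
const projectAfterMatch = makeProject();
const early = projectAfterMatch(match(docs));

// Project first, then filter.
const projectBeforeMatch = makeProject();
const late = match(projectBeforeMatch(docs));

console.log(projectAfterMatch.seen, projectBeforeMatch.seen);
```

Both orderings produce the same documents, but the projection handles only the matching documents when the filter runs first.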

Driver Support
  • Initial version is a command
      • For any language, build a JSON database object, and execute the command
      • { aggregate : <collection>, pipeline : {…} }
  • Beware of command result size limit
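
Building the command object might look like the sketch below. The helper name `buildAggregateCommand` and the collection name `articles` are hypothetical, and the pipeline is given as an array of stages as in released MongoDB versions, which may not match the pre-release form on the slide.

```javascript
// Build the command document described on the slide:
// { aggregate : <collection>, pipeline : ... }
function buildAggregateCommand(collection, pipeline) {
  return { aggregate: collection, pipeline: pipeline };
}

const command = buildAggregateCommand("articles", [
  { $match: { author: "bob" } },
]);

// Note: newer server versions also expect extra fields such as `cursor`
// in this command; that is outside what the slide describes.
console.log(JSON.stringify(command));
```

In the mongo shell, a document like this would typically be sent with `db.runCommand(command)`; drivers in other languages expose an equivalent way to run a command document.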

When is this being released?
  • In final development now
  • Expect to see this in the near future

Sharding support
  • Initial release will support sharding
  • Mongos analyzes pipeline, and forwards operations up to $group to shards; combines shard server results and continues
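
One way to picture "combines shard server results" is partial aggregation: each shard totals its own documents and the partial totals are merged afterwards. The sketch below is an assumption about the general idea, not a description of what mongos actually does; `partialSums`, `mergePartials`, and the shard data are hypothetical.

```javascript
// Runs on each shard in this toy model: total valueField per key.
function partialSums(docs, keyField, valueField) {
  const sums = {};
  for (const d of docs) {
    sums[d[keyField]] = (sums[d[keyField]] || 0) + d[valueField];
  }
  return sums;
}

// Runs on the router in this toy model: add up the per-shard totals.
function mergePartials(partials) {
  const merged = {};
  for (const partial of partials) {
    for (const [key, value] of Object.entries(partial)) {
      merged[key] = (merged[key] || 0) + value;
    }
  }
  return merged;
}

// Hypothetical documents living on two shards.
const shardA = [
  { author: "bob", pageViews: 5 },
  { author: "dave", pageViews: 7 },
];
const shardB = [{ author: "bob", pageViews: 1 }];

const totals = mergePartials([
  partialSums(shardA, "author", "pageViews"),
  partialSums(shardB, "author", "pageViews"),
]);
console.log(JSON.stringify(totals));
```

Sums merge cleanly this way; an average would need each shard to report a sum and a count rather than a finished average.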

Pipeline Operations – Future Plans
  • $sort
      • Sorts the document stream according to a key
  • $out
      • Saves the document stream to a collection
      • Similar to M/R $out, but with sharded output

Expressions – Future Plans
  • Date field extraction
      • Get year, month, day, hour, etc., from Date
  • Date arithmetic
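
For a sense of what date field extraction produces, here is the same idea in plain JavaScript using the built-in `Date` type. The helper `dateParts` is hypothetical and says nothing about what the planned operators would be called.

```javascript
// Extract calendar fields from a Date, using UTC so the result does not
// depend on the local time zone of the machine running the code.
function dateParts(date) {
  return {
    year: date.getUTCFullYear(),
    month: date.getUTCMonth() + 1, // getUTCMonth() is zero-based
    day: date.getUTCDate(),
    hour: date.getUTCHours(),
  };
}

// Hypothetical timestamp: 24 May 2011, 13:30 UTC.
const when = new Date(Date.UTC(2011, 4, 24, 13, 30));
const parts = dateParts(when);
console.log(JSON.stringify(parts));
```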
MongoDB Aggregation MongoSF May 2011
