What problem are we solving?
• Map/Reduce can be used for aggregation…
  • Currently being used for totaling, averaging, etc.
• Map/Reduce is a big hammer
  • Simpler tasks should be easier
    • Shouldn’t need to write JavaScript
    • Avoid the overhead of JavaScript engine
• We’re seeing requests for help in handling
  complex documents
  • Select only matching subdocuments or arrays
How will we solve the problem?
• Our new aggregation framework
  • Declarative framework
    • No JavaScript required
  • Describe a chain of operations to apply
  • Expression evaluation
    • Return computed values
  • Framework: we can add new operations easily
  • C++ implementation
    • Higher performance than JavaScript
Aggregation - Pipelines
• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Conceptually, the members of a collection
  are passed through a pipeline to produce a
  result
  • Similar to a command-line pipe
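
The pipe analogy can be sketched in plain JavaScript, outside the database. This is only a conceptual model, assuming each operation is a function from an array of documents to an array of documents. The names `runPipeline`, `keepAdults`, and `namesOnly` are hypothetical, and nothing here reflects how the server actually implements a pipeline.

```javascript
// Conceptual model only: each stage takes documents in and hands documents on,
// like programs joined by a shell pipe.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Two hypothetical stages.
const keepAdults = (docs) => docs.filter((d) => d.age >= 18);
const namesOnly = (docs) => docs.map((d) => ({ name: d.name }));

const people = [
  { name: "Ann", age: 34 },
  { name: "Bob", age: 12 },
  { name: "Cy", age: 51 },
];

// Documents flow through keepAdults first, then namesOnly.
const result = runPipeline(people, [keepAdults, namesOnly]);
console.log(result);
```

The order of the stage list is the order of execution, which is why stage placement matters later in the deck.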
Pipeline Operations
• $match
  • Uses a query predicate (like .find({…})) as a filter
• $project
  • Uses a sample document to determine the shape
    of the result (similar to .find()’s optional argument)
    • This can include computed values
• $unwind
  • Hands out array elements one at a time
• $group
  • Aggregates items into buckets defined by a key
Pipeline Operations (continued)
• $sort
  • Sort documents
• $limit
  • Only allow the specified number of documents to
    pass
• $skip
  • Skip over the specified number of documents
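
As a rough illustration, a pipeline that uses each of the operations above might be written as the following array of stage documents. The field names (`status`, `author`, `tags`) and the `pipeline` variable are made up for this sketch, and the stage shapes follow the released aggregation framework, which may differ in detail from the pre-release build described here.

```javascript
// Hypothetical pipeline: count published posts per tag, then page through the result.
const pipeline = [
  { $match: { status: "published" } },               // filter with a query predicate
  { $project: { author: 1, tags: 1 } },              // keep only the fields needed later
  { $unwind: "$tags" },                              // one document per array element
  { $group: { _id: "$tags", count: { $sum: 1 } } },  // bucket by tag and count
  { $sort: { count: -1 } },                          // most-used tags first
  { $skip: 5 },                                      // skip the first five buckets
  { $limit: 10 },                                    // pass at most ten buckets
];

// List the operator used by each stage, in order.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" | "));
```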
Projections
• $project can reshape results
  • Include or exclude fields
  • Computed fields
    • Arithmetic expressions, including built-in functions
    • Pull fields from nested documents to the top
    • Push fields from the top down into new virtual
      documents
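
A minimal sketch of these reshaping options, assuming a hypothetical document with `title`, `price`, `tax`, `address.city`, and `pageViews` fields. The `reshape` function is a hand-written stand-in that shows the intended output shape for one document; it is not how the server evaluates `$project`.

```javascript
const projectStage = {
  $project: {
    _id: 0,                               // exclude a field
    title: 1,                             // include a field
    total: { $add: ["$price", "$tax"] },  // computed field
    city: "$address.city",                // pull a nested field to the top
    stats: { views: "$pageViews" },       // push a top-level field into a new subdocument
  },
};

// Hand-written equivalent for one document, to show the intended output shape.
function reshape(doc) {
  return {
    title: doc.title,
    total: doc.price + doc.tax,
    city: doc.address.city,
    stats: { views: doc.pageViews },
  };
}

const sample = {
  _id: 1,
  title: "Intro",
  price: 10,
  tax: 2,
  address: { city: "Oslo" },
  pageViews: 7,
};
const reshaped = reshape(sample);
console.log(JSON.stringify(reshaped));
```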
Unwinding
• $unwind can “stream” arrays
  • Array values are doled out one at a time in the
    context of their surrounding documents
  • Makes it possible to filter out elements before
    returning
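
The idea can be modeled in plain JavaScript. This sketch assumes a hypothetical `unwind` helper and a `tags` array field; documents without an array in that field simply produce no output here, which is a simplification and not a statement about server behavior.

```javascript
// Plain-JavaScript model of unwinding one array field.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    for (const value of values) {
      // Copy the surrounding document, replacing the array with a single element.
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

const posts = [
  { title: "A", tags: ["db", "nosql"] },
  { title: "B", tags: ["db"] },
];

// Three documents come out: two for post A, one for post B.
const unwound = unwind(posts, "tags");
console.log(unwound);

// Once unwound, individual elements can be filtered like ordinary fields.
const onlyNosql = unwound.filter((d) => d.tags === "nosql");
console.log(onlyNosql);
```

In a pipeline, the same effect would come from a `$unwind` stage followed by a `$match` stage.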
Grouping
• $group aggregation expressions
  • Define a grouping key as the _id of the result
  • Total grouped column values: $sum
  • Average grouped column values: $avg
  • Collect grouped column values in an array or set:
    $push, $addToSet
  • Other functions
    • $min, $max, $first, $last
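
A sketch of what grouping computes, assuming hypothetical `customer` and `amount` fields. `groupStage` shows how the stage might be written; `groupBy` is an illustrative in-memory model of `$sum`, `$avg`, and `$addToSet`, not the server's implementation.

```javascript
// How the stage might be written in a pipeline.
const groupStage = {
  $group: {
    _id: "$customer",                    // grouping key
    total: { $sum: "$amount" },
    average: { $avg: "$amount" },
    distinct: { $addToSet: "$amount" },
  },
};

// In-memory model of the same computation.
function groupBy(docs, keyField, valueField) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!buckets.has(key)) {
      buckets.set(key, { _id: key, total: 0, count: 0, values: new Set() });
    }
    const bucket = buckets.get(key);
    bucket.total += doc[valueField];
    bucket.count += 1;
    bucket.values.add(doc[valueField]);
  }
  return [...buckets.values()].map((b) => ({
    _id: b._id,
    total: b.total,
    average: b.total / b.count,
    distinct: [...b.values],
  }));
}

const orders = [
  { customer: "ann", amount: 10 },
  { customer: "bob", amount: 5 },
  { customer: "ann", amount: 30 },
  { customer: "ann", amount: 10 },
  { customer: "ann", amount: 30 },
];
const grouped = groupBy(orders, "customer", "amount");
console.log(JSON.stringify(grouped));
```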
Sorting
• $sort can sort documents
  • Sort specifications are the same as today, e.g.,
    $sort:{ key1: 1, key2: -1, …}
Computed Expressions
• Available in $project operations
• Prefix expression language
  • Add two fields: $add:["$field1", "$field2"]
  • Provide a value for a missing field:
    $ifNull:["$field1", "$field2"]
  • Nesting: $add:["$field1", {$ifNull:["$field2",
    "$field3"]}]
  • Other functions….
    • And we can easily add more as required
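
To make the prefix notation concrete, here is a toy evaluator for `$add` and `$ifNull` only. The `evaluate` function is hypothetical and handles just top-level field references; it is meant to show how a nested expression is evaluated from the inside out, not to reproduce the server's expression engine.

```javascript
// Toy evaluator for two prefix operators, to show how nesting works.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // "$field" reads a field of the document
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr);
    // Evaluate the arguments first, so nested expressions resolve inside out.
    const args = expr[op].map((arg) => evaluate(arg, doc));
    if (op === "$add") return args.reduce((sum, n) => sum + n, 0);
    if (op === "$ifNull") return args[0] == null ? args[1] : args[0];
    throw new Error("unsupported operator: " + op);
  }
  return expr; // literal value
}

const expr = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };

// field2 is missing, so field3 is substituted: 1 + 10.
const withoutField2 = evaluate(expr, { field1: 1, field3: 10 });
// field2 is present, so it is used: 1 + 5.
const withField2 = evaluate(expr, { field1: 1, field2: 5, field3: 10 });
console.log(withoutField2, withField2);
```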
Computed Expressions (continued)
• String functions
  • $toUpper, $toLower, $substr
• Date field extraction
  • Get year, month, day, hour, etc., from ISODate
• Date arithmetic
• Null value substitution (like MySQL ifnull(),
  Oracle nvl())
• Ternary conditional
  • Return one of two values based on a predicate
Demo
Demo files are at https://gist.github.com/1401585
Usage Tips
• Use $match in a pipeline as early as possible
  • The query optimizer can then choose to scan an
    index and avoid scanning the entire collection
• Use $sort in a pipeline as early as possible
  • The query optimizer can then be used to choose
    an index to scan instead of sorting the result
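
The sketch below does not model indexes at all; it only illustrates a related effect of filtering early, namely that later stages see fewer documents. The `match` and `unwind` helpers and the `status` and `items` fields are hypothetical stand-ins for `$match` and `$unwind`.

```javascript
const docs = [
  { status: "A", items: [1, 2] },
  { status: "B", items: [3, 4, 5] },
  { status: "B", items: [6] },
];

// Stand-ins for a $match on status and a $unwind on items.
const match = (ds) => ds.filter((d) => d.status === "A");
const unwind = (ds) =>
  ds.flatMap((d) => d.items.map((item) => ({ ...d, items: item })));

// Filter first: the unwind step only has to handle one document.
const earlyIntermediate = match(docs);
const earlyResult = unwind(earlyIntermediate);

// Filter last: the unwind step expands all three documents into six.
const lateIntermediate = unwind(docs);
const lateResult = match(lateIntermediate);

console.log(earlyIntermediate.length, lateIntermediate.length);
console.log(JSON.stringify(earlyResult) === JSON.stringify(lateResult));
```

Both orderings return the same two documents here, but the early filter passes far less data to the next stage.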
Driver Support
• Initial version is a command
  • For any language, build a JSON database object,
    and execute the command
    • In the shell: db.runCommand({ aggregate :
      "<collection-name>", pipeline : […] });
  • Beware of command result size limit
    • Document size limit is 16 MB
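
A command document along these lines could be built as an ordinary object in any driver language. The collection name `articles` and the field names are hypothetical; the shape follows the `aggregate` command as eventually released, and later server versions also expect a `cursor` option.

```javascript
// Build the command document; "articles" is a hypothetical collection name.
const command = {
  aggregate: "articles",
  pipeline: [
    { $match: { status: "published" } },
    { $group: { _id: "$author", posts: { $sum: 1 } } },
  ],
};

// In the mongo shell this object would be passed to db.runCommand(command).
// Here it is only serialized, so the sketch runs without a database.
const serialized = JSON.stringify(command);
console.log(serialized);
```

Because the whole result comes back as a single command reply in this initial version, the result has to fit within the document size limit noted above.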
Sharding support
• Initial release will support sharding
• mongos analyzes the pipeline and forwards
  operations up to $group or $sort to the shards,
  then combines the shard server results and
  returns them
When is this being released?
• In final development now
  • Adding an explain facility
• Expect to see this in the near future
Future Plans
• More optimizations
• $out pipeline operation
  • Saves the document stream to a collection
  • Similar to M/R $out, but with sharded output
  • Functions like a tee, so that intermediate results
    can be saved
mongodb-aggregation-may-2012
