Time Series Data in MongoDB
Senior Solutions Architect, MongoDB Inc.
Massimo Brignoli
#mongodb
Agenda
• What is time series data?
• Schema design considerations
• Broader use case: operational intelligence
• MMS Monitoring schema design
• Thinking ahead
• Questions
What is time series data?
Time Series Data is Everywhere
• Financial markets pricing (stock ticks)
• Sensors (temperature, pressure, proximity)
• Industrial fleets (location, velocity, operational)
• Social networks (status updates)
• Mobile devices (calls, texts)
• Systems (server logs, application logs)
Time Series Data at a Higher Level
• Widely applicable data model
• Applies to several different “data use cases”
• Various schema and modeling options
• Application requirements drive schema design
Time Series Data Considerations
• Resolution of raw events
• Resolution needed to support
– Applications
– Analysis
– Reporting
• Data retention policies
– Data ages out
– Retention
Schema Design Considerations
Designing For Writing and Reading
• Document per event
• Document per minute (average)
• Document per minute (second)
• Document per hour
Document Per Event
{
server: "server1",
load: 92,
ts: ISODate("2013-10-16T22:07:38.000-0500")
}
• Relational-centric approach
• Insert-driven workload
• Aggregations computed at application-level
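With one document per event, any average has to be computed by the application after reading the raw events. A minimal sketch of that step follows, assuming events shaped like the document above; the helper name averageLoadPerMinute is hypothetical and minutes are truncated in UTC.

```javascript
// Hypothetical helper: group per-event documents by server and minute,
// then compute the average load for each group in application code.
function averageLoadPerMinute(events) {
  const buckets = new Map();
  for (const { server, load, ts } of events) {
    const minute = new Date(ts);
    minute.setUTCSeconds(0, 0); // truncate the timestamp to the minute
    const key = `${server}|${minute.toISOString()}`;
    const bucket = buckets.get(key) || { server, ts: minute, count: 0, sum: 0 };
    bucket.count += 1;
    bucket.sum += load;
    buckets.set(key, bucket);
  }
  return [...buckets.values()].map((b) => ({
    server: b.server,
    ts: b.ts,
    avg: b.sum / b.count,
  }));
}
```

Every event in the time range has to be read to produce these averages, which is the cost the pre-aggregated layouts try to avoid.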
Document Per Minute (Average)
{
server: "server1",
load_num: 92,
load_sum: 4500,
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Pre-aggregate to compute average per minute more easily
• Update-driven workload
• Resolution at the minute-level
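The update-driven workload for this layout can be sketched as a single $inc per sample. The sketch assumes load_num counts the samples and load_sum accumulates their values, which is a reading of the field names rather than something the slide states; buildMinuteAverageUpdate is a hypothetical helper that only builds the arguments for an upsert.

```javascript
// Hypothetical helper: build the filter and update for one new load sample.
// Assumption: load_num is the sample count and load_sum the running total.
function buildMinuteAverageUpdate(server, ts, load) {
  const minute = new Date(ts);
  minute.setUTCSeconds(0, 0); // one document per server per minute
  return {
    filter: { server, ts: minute },
    update: { $inc: { load_num: 1, load_sum: load } },
    options: { upsert: true }, // create the minute document on first sample
  };
}
```

The average for a minute is then load_sum divided by load_num, read from a single document.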
Document Per Minute (By Second)
{
server: "server1",
load: { 0: 15, 1: 20, …, 58: 45, 59: 40 },
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Store per-second data at the minute level
• Update-driven workload
• Pre-allocate structure to avoid document moves
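Pre-allocation means inserting the full per-minute structure before any samples arrive, so later updates overwrite existing fields instead of growing the document. A minimal sketch, assuming 0 as the placeholder value (the slide does not say what placeholder to use) and a hypothetical helper name:

```javascript
// Hypothetical helper: build a per-minute document with all 60 second
// slots already present. The placeholder value 0 is an assumption.
function preallocateMinuteDocument(server, minuteStart) {
  const load = {};
  for (let second = 0; second < 60; second++) {
    load[second] = 0;
  }
  return { server, load, ts: minuteStart };
}
```

Inserting this document at the start of each minute keeps every later $set on load.<second> the same size as the value it replaces.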
Document Per Hour (By Second)
{
server: "server1",
load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 },
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 3599 steps
Document Per Hour (By Second)
{
server: "server1",
load: {
0: {0: 15, …, 59: 45},
…
59: {0: 25, …, 59: 75}
},
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level with nesting
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 59+59 steps
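In the nested layout, a sample is addressed by minute and then by second, so the update targets a dotted path such as load.7.38. The sketch below builds that path from a timestamp; buildNestedSecondUpdate is a hypothetical helper, and it derives minute and second in UTC, which matches local time only for whole-hour offsets.

```javascript
// Hypothetical helper: address one second inside the nested per-hour
// document as load.<minute>.<second>.
function buildNestedSecondUpdate(server, ts, load) {
  const t = new Date(ts);
  const minute = t.getUTCMinutes();
  const second = t.getUTCSeconds();
  const hour = new Date(t);
  hour.setUTCMinutes(0, 0, 0); // one document per server per hour
  return {
    filter: { server, ts: hour },
    update: { $set: { [`load.${minute}.${second}`]: load } },
  };
}
```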
Characterizing Write Differences
• Example: data generated every second
• Capturing data per minute requires:
– Document per event: 60 writes
– Document per minute: 1 write, 59 updates
• Transition from insert driven to update driven
– Individual writes are smaller
– Performance and concurrency benefits
Characterizing Read Differences
• Example: data generated every second
• Reading data for a single hour requires:
– Document per event: 3600 reads
– Document per minute: 60 reads
• Read performance is greatly improved
– Optimal with tuned block sizes and read ahead
– Fewer disk seeks
MMS Monitoring Schema Design
MMS Monitoring
• MongoDB Management Service (MMS) Monitoring
• Available in two flavors
– Free cloud-hosted monitoring
– On-premise with MongoDB Enterprise
• Monitor single node, replica set, or sharded cluster deployments
• Metric dashboards and custom alert triggers
MMS Application Requirements
• Resolution defines the granularity of stored data
• Range controls the retention policy, e.g. after 24 hours only 5-minute resolution
• Display dictates the stored pre-aggregations, e.g. total and count
Monitoring Schema Design
• Per-minute document model
• Documents store individual metrics and counts
• Supports "total" and "avg/sec" display
{
timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
num_samples: 58,
total_samples: 108000000,
type: "memory_used",
values: {
0: 999999,
…
59: 1800000
}
}
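Because the counts are stored next to the raw values, a dashboard can derive its display figures from one document without scanning the values map. A minimal sketch, assuming the average is total_samples divided by num_samples; the helper name and the returned field names are hypothetical.

```javascript
// Hypothetical helper: derive display figures from the stored
// pre-aggregations of a per-minute metric document.
function summarizeMinute(doc) {
  return {
    total: doc.total_samples,
    // Guard against an empty, pre-allocated document with no samples yet.
    avgPerSample: doc.num_samples > 0 ? doc.total_samples / doc.num_samples : null,
  };
}
```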
Monitoring Data Updates
• Single update required to add new data and increment associated counts
db.metrics.update(
{
timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
type: "memory_used"
},
{
$set: { "values.59": 2000000 },
$inc: { num_samples: 1, total_samples: 2000000 }
}
)
Monitoring Data Management
• Data stored at different granularity levels for read performance
• Collections are organized into specific intervals
• Retention is managed by simply dropping collections as they age out
• Document structure is pre-created to maximize write performance
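Retention by dropping collections can be sketched as follows. The daily interval and the metrics_YYYYMMDD naming scheme are assumptions made for illustration; the slides do not describe how MMS actually names or partitions its collections, and both helper names are hypothetical.

```javascript
// Hypothetical naming scheme: one collection per UTC day, metrics_YYYYMMDD.
function collectionNameFor(date) {
  const d = new Date(date);
  const year = d.getUTCFullYear();
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  const day = String(d.getUTCDate()).padStart(2, "0");
  return `metrics_${year}${month}${day}`;
}

// Return the collection names that are older than the retention window.
function collectionsToDrop(names, now, retentionDays) {
  const cutoff = new Date(now);
  cutoff.setUTCDate(cutoff.getUTCDate() - retentionDays);
  const cutoffName = collectionNameFor(cutoff);
  // Fixed-width date suffixes sort lexicographically in date order.
  return names.filter((n) => /^metrics_\d{8}$/.test(n) && n < cutoffName);
}
```

Each returned name could then be passed to db.getCollection(name).drop() in the shell, which removes the whole interval at once instead of deleting documents one by one.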
Use Case: Operational Intelligence
What is Operational Intelligence
• Storing log data
– Capturing application and/or server generated events
• Hierarchical aggregation
– Rolling approach to generate rollups
– e.g. hourly > daily > weekly > monthly
• Pre-aggregated reports
– Processing data to generate reporting from raw events
Storing Log Data
{
_id: ObjectId('4f442120eb03305789000000'),
host: "127.0.0.1",
user: 'frank',
time: ISODate("2000-10-10T20:55:36Z"),
path: "/apache_pb.gif",
request: "GET /apache_pb.gif HTTP/1.0",
status: 200,
response_size: 2326,
referrer: "http://www.example.com/start.html",
user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
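Turning a combined-format access log line into a document like the one above can be sketched with a regular expression. This is an illustrative parser under simple assumptions (no escaped quotes inside quoted fields, English month abbreviations), and parseCombinedLogLine is a hypothetical name; the _id field is left for the database to generate.

```javascript
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// host ident user [dd/Mon/yyyy:hh:mm:ss +zzzz] "request" status size "referrer" "agent"
const COMBINED_LOG = /^(\S+) \S+ (\S+) \[(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\] "([^"]*)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

// Hypothetical helper: returns a log document, or null if the line does not match.
function parseCombinedLogLine(line) {
  const m = COMBINED_LOG.exec(line);
  if (!m || !(m[4] in MONTHS)) return null;
  // Interpret the wall-clock fields as UTC, then subtract the zone offset.
  const wallClock = Date.UTC(+m[5], MONTHS[m[4]], +m[3], +m[6], +m[7], +m[8]);
  const offsetMinutes = (m[9] === "-" ? -1 : 1) * (+m[10] * 60 + +m[11]);
  const request = m[12];
  return {
    host: m[1],
    user: m[2],
    time: new Date(wallClock - offsetMinutes * 60000),
    path: request.split(" ")[1],
    request,
    status: Number(m[13]),
    response_size: m[14] === "-" ? null : Number(m[14]),
    referrer: m[15],
    user_agent: m[16],
  };
}
```

Storing time as a UTC date rather than the original string is what makes the later per-minute and per-hour bucketing straightforward.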
Pre-Aggregation
• Analytics across raw events can involve many reads
• Alternative schemas can improve read and write performance
• Data can be organized into coarser buckets
• Transition from insert-driven to update-driven workloads
Pre-Aggregated Log Data
{
timestamp_minute: ISODate("2000-10-10T20:55:00Z"),
resource: "/index.html",
page_views: {
0: 50,
…
59: 250
}
}
• Leverage time-series-style bucketing
• Track individual metrics (e.g. page views)
• Improve performance for reads/writes
• Minimal processing overhead
Hierarchical Aggregation
• Analytical approach as opposed to schema approach
– Leverage built-in Aggregation Framework or MapReduce
• Execute multiple tasks sequentially to aggregate at varying levels
• Raw events > Hourly > Weekly > Monthly
• Rolling approach distributes the aggregation workload
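One step of such a rollup, from per-minute page view documents to hourly totals, can be sketched in plain JavaScript. This stands in for the Aggregation Framework or MapReduce job named on the slide and is not either of them; the helper name and the timestamp_hour field are hypothetical, and hours are truncated in UTC.

```javascript
// Hypothetical helper: sum per-minute page_views into one total per
// resource per hour. Input documents follow the pre-aggregated layout.
function rollUpToHourly(minuteDocs) {
  const hourly = new Map();
  for (const doc of minuteDocs) {
    const hour = new Date(doc.timestamp_minute);
    hour.setUTCMinutes(0, 0, 0); // truncate to the hour
    const key = `${doc.resource}|${hour.toISOString()}`;
    const views = Object.values(doc.page_views).reduce((a, b) => a + b, 0);
    const entry = hourly.get(key) ||
      { timestamp_hour: hour, resource: doc.resource, page_views: 0 };
    entry.page_views += views;
    hourly.set(key, entry);
  }
  return [...hourly.values()];
}
```

The same function shape applies at each later level, with the output of one level feeding the next, so no single job has to scan all raw events.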
Thinking Ahead
Before You Start
• What are the application requirements?
• Is pre-aggregation useful for your application?
• What are your retention and age-out policies?
• What are the gotchas?
– Pre-create document structure to avoid fragmentation and performance problems
– Organize your data for growth: time series data grows fast!
Down The Road
• Scale-out considerations
– Vertical vs. horizontal (with sharding)
• Understanding the data
– Aggregation
– Analytics
– Reporting
• Deeper data analysis
– Patterns
– Predictions
Scaling Time Series Data in MongoDB
• Vertical growth
– Larger instances with more CPU and memory
– Increased storage capacity
• Horizontal growth
– Partitioning data across many machines
– Dividing and distributing the workload
Time Series Sharding Considerations
• What are the application requirements?
– Primarily collecting data
– Primarily reporting data
– Both
• Map those back to
– Write performance needs
– Read/write query distribution
– Collection organization (see MMS Monitoring)
• Example: {metric name, coarse timestamp}
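A compound shard key of that shape can be sketched as a shardCollection command document. The namespace and field names passed in are assumptions for illustration; the slide names the key components only as "metric name" and "coarse timestamp", and the helper name is hypothetical.

```javascript
// Hypothetical helper: build a shardCollection command with a compound key
// of metric field first, coarse time field second.
function buildShardCollectionCommand(namespace, metricField, coarseTimeField) {
  return {
    shardCollection: namespace,
    key: { [metricField]: 1, [coarseTimeField]: 1 },
  };
}
```

For example, buildShardCollectionCommand("monitoring.metrics", "type", "timestamp_minute") yields a document that could be passed to db.adminCommand() on a cluster where sharding is already enabled for that database. Leading with the metric name spreads writes for different metrics across shards, while the timestamp keeps each metric's data in time order.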
Aggregates, Analytics, Reporting
• Aggregation Framework can be used for analysis
– Does it work with the chosen schema design?
– What sorts of aggregations are needed?
• Reporting can be done on predictable, rolling basis
– See "Hierarchical Aggregation"
• Consider secondary reads for analytical operations
– Minimize load on production primaries
Deeper Data Analysis
• Leverage MongoDB-Hadoop connector
– Bi-directional support for reading/writing
– Works with online and offline data (e.g. backup files)
• Compute using MapReduce
– Patterns
– Recommendations
– Etc.
• Explore data
– Pig
– Hive
Questions?
Resources
• Schema Design for Time Series Data in MongoDB
http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
• Operational Intelligence Use Case
http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence
• Data Modeling in MongoDB
http://docs.mongodb.org/manual/data-modeling/
• Schema Design (webinar)
http://www.mongodb.com/events/webinar/schema-design-oct2013
