Solutions Architect, MongoDB
Jay Runkel
@jayrunkel
Time Series Data – Part 1
Schema Design
Our Mission Today
We need to prepare for this
Develop a nationwide traffic monitoring
system
MongoDB for Time Series Data: Schema Design
Traffic sensors to monitor interstate
conditions
• 16,000 sensors
• Measure
• Speed
• Travel time
• Weather, pavement, and traffic conditions
• Support desktop, mobile, and car navigation
systems
Model After NY State Solution
Other requirements
• Need to keep a 3-year history
• Three data centers
• NJ, Chicago, LA
• Need to support 5M simultaneous users
• Peak volume (rush hour)
• Every minute, each user requests the 10-minute average
speed for 50 sensors
Master Agenda
• Successfully deploy a MongoDB application at
scale
• Use case: traffic data
• Presentation Components
1. Schema Design
2. Aggregation
3. Cluster Architecture
Time Series Data Schema
Design
Agenda
• Similarities between MongoDB and Olympic
weight lifting
• What is time series data?
• Schema design considerations
• Analysis of alternative schemas
• Questions
Before we get started…
Lifting heavy things requires
• Technique
• Planning
• Practice
• Analysis
• Tuning
Without planning…
Tailor your schema to your
application workload
Time Series
A time series is a sequence of data points, measured
typically at successive points in time spaced at
uniform time intervals.
– Wikipedia
[Figure: data points plotted against time]
Time Series Data is Everywhere
Example: MongoDB Monitoring Service
• Free hosted service for monitoring MongoDB systems
– 100+ system metrics visualized and alerted
• 25,000+ MongoDB systems submitting data every 60 seconds
• 90% updates, 10% reads
• ~75,000 updates/second
• ~5.4B operations/day
• 8 commodity servers
Time Series Data is Everywhere
Application Requirements
Event Resolution
Analysis
– Dashboards
– Analytics
– Reporting
Data Retention Policies
Event and Query Volumes
Schema Design
Aggregation Queries
Cluster Architecture
Schema Design
Considerations
Schema Design Goal
Store Event Data
Support Analytical Queries
Find best compromise of:
– Memory utilization
– Write performance
– Read/Analytical Query Performance
Accomplish with realistic amount of hardware
Designing For Reading, Writing, …
• Document per event
• Document per minute (average)
• Document per minute (by second)
• Document per hour
Document Per Event
{
  segId: "I80_mile23",
  speed: 63,
  ts: ISODate("2013-10-16T22:07:38.000-0500")
}
• Relational-centric approach
• Insert-driven workload
Document Per Minute (Average)
{
  segId: "I80_mile23",
  speed_num: 18,
  speed_sum: 1134,
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Pre-aggregate to compute average per minute more easily
• Update-driven workload
• Resolution at the minute-level
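A minimal sketch of the pre-aggregation idea in Python, assuming each reading is applied as an `$inc` on the counter and the running sum of the document for its minute. The helper names `minute_bucket_update` and `average_speed` are hypothetical; the returned pair is only shaped the way an upsert-style update call would expect.

```python
from datetime import datetime

def minute_bucket_update(seg_id, speed, ts):
    """Return (filter, update) for a document-per-minute running average.

    Hypothetical helper: the pair is shaped for an upsert-style update call.
    """
    bucket = ts.replace(second=0, microsecond=0)  # truncate to the minute
    query = {"segId": seg_id, "ts": bucket}
    update = {"$inc": {"speed_num": 1, "speed_sum": speed}}
    return query, update

def average_speed(doc):
    """Average for one bucket: sum of readings divided by their count."""
    return doc["speed_sum"] / doc["speed_num"]

query, update = minute_bucket_update(
    "I80_mile23", 63, datetime(2013, 10, 16, 22, 7, 38)
)

# The values from the slide's example document.
slide_doc = {"segId": "I80_mile23", "speed_num": 18, "speed_sum": 1134}
print(average_speed(slide_doc))  # 1134 / 18 = 63.0
```

Because only the count and sum are kept, the individual readings cannot be recovered, which is the minute-level resolution limit noted above.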
Document Per Minute (By Second)
{
  segId: "I80_mile23",
  speed: { 0: 63, 1: 58, …, 58: 66, 59: 64 },
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Store per-second data at the minute level
• Update-driven workload
• Pre-allocate structure to avoid document moves
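A sketch of the pre-allocate-then-update pattern in Python, assuming the document for a minute is created once with all 60 slots present and each reading then fills one slot with `$set` and dot notation. `preallocated_minute_doc` and `set_second_update` are hypothetical helpers, and `None` as the placeholder value is an assumption.

```python
from datetime import datetime

def preallocated_minute_doc(seg_id, minute_ts):
    """Build a per-minute document with all 60 per-second slots present.

    Keys are strings because document field names are strings.
    """
    return {
        "segId": seg_id,
        "ts": minute_ts,
        "speed": {str(sec): None for sec in range(60)},
    }

def set_second_update(second, speed):
    """Return a $set update that fills one per-second slot via dot notation."""
    return {"$set": {f"speed.{second}": speed}}

doc = preallocated_minute_doc("I80_mile23", datetime(2013, 10, 16, 22, 7))
update = set_second_update(38, 63)
print(update)  # {'$set': {'speed.38': 63}}
```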
Document Per Hour (By Second)
{
  segId: "I80_mile23",
  speed: { 0: 63, 1: 58, …, 3598: 45, 3599: 55 },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 3599 steps
Document Per Hour (By Second)
{
  segId: "I80_mile23",
  speed: {
    0: {0: 47, …, 59: 45},
    …,
    59: {0: 65, …, 59: 66}
  },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level with nesting
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 59+59 steps
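The step counts on the two hourly slides can be reproduced with a small Python sketch. It assumes, as the slides do, that fields within a document are scanned in order, so reaching a field costs one step per field skipped at each level. All function names here are hypothetical.

```python
def flat_path(second_of_hour):
    """Dot-notation path in the flat layout: speed.<0..3599>."""
    return f"speed.{second_of_hour}"

def nested_path(second_of_hour):
    """Dot-notation path in the nested layout: speed.<minute>.<second>."""
    minute, second = divmod(second_of_hour, 60)
    return f"speed.{minute}.{second}"

def flat_steps(second_of_hour):
    """Fields skipped before reaching the target in the flat layout."""
    return second_of_hour

def nested_steps(second_of_hour):
    """Fields skipped at the minute level plus those at the second level."""
    minute, second = divmod(second_of_hour, 60)
    return minute + second

last = 3599  # final second of the hour
print(flat_path(last), flat_steps(last))      # speed.3599 3599
print(nested_path(last), nested_steps(last))  # speed.59.59 118
```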
Characterizing Write Differences
• Example: data generated every second
• For 1 minute:
– Document Per Event: 60 writes
– Document Per Minute: 1 write, 59 updates
• Transition from insert driven to update driven
– Individual writes are smaller
– Performance and concurrency benefits
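The insert and update counts above can be checked with a short Python simulation. This is a sketch only: `count_writes` is a hypothetical function that tracks which time buckets already have a document, and no database is involved.

```python
from datetime import datetime, timedelta, timezone

def count_writes(events, bucket_seconds):
    """Count inserts and updates when events are grouped into time buckets.

    bucket_seconds=None models one document per event (every write inserts).
    """
    seen = set()
    inserts = updates = 0
    for seg_id, ts in events:
        if bucket_seconds is None:
            inserts += 1
            continue
        epoch = int(ts.timestamp())
        key = (seg_id, epoch - epoch % bucket_seconds)  # start of the bucket
        if key in seen:
            updates += 1  # the bucket document already exists
        else:
            seen.add(key)
            inserts += 1  # first reading creates the bucket document
    return inserts, updates

start = datetime(2013, 10, 16, 22, 7, 0, tzinfo=timezone.utc)
events = [("I80_mile23", start + timedelta(seconds=s)) for s in range(60)]
print(count_writes(events, None))  # (60, 0)
print(count_writes(events, 60))    # (1, 59)
```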
Characterizing Read Differences
• Example: data generated every second
• Reading data for a single hour requires:
– Document Per Event: 3600 reads
– Document Per Minute: 60 reads
• Read performance is greatly improved
– Optimal with tuned block sizes and read ahead
– Fewer disk seeks
Characterizing Memory Differences
• _id index for 1 billion events:
– Document Per Event: ~32 GB
– Document Per Minute: ~0.5 GB
• _id index plus segId and ts index:
– Document Per Event: ~100 GB
– Document Per Minute: ~2 GB
• Memory requirements significantly reduced
– Fewer shards
– Lower capacity servers
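A rough Python check of these index sizes. The per-entry costs (32 and 100 bytes) are not stated on the slide; they are inferred by dividing the per-event figures by 1 billion, so treat them as assumptions. The per-minute case assumes one reading per second, which gives 60 events per document.

```python
EVENTS = 1_000_000_000
EVENTS_PER_MINUTE_DOC = 60  # one reading per second

def index_gb(doc_count, bytes_per_entry):
    """Index size in decimal gigabytes for a given per-entry cost."""
    return doc_count * bytes_per_entry / 1e9

# Per-entry costs inferred from the per-event figures (assumption).
ID_ONLY_BYTES = 32             # ~32 GB / 1 billion entries
ID_PLUS_COMPOUND_BYTES = 100   # ~100 GB / 1 billion entries

minute_docs = EVENTS / EVENTS_PER_MINUTE_DOC

for label, cost in (
    ("_id only", ID_ONLY_BYTES),
    ("_id + segId/ts", ID_PLUS_COMPOUND_BYTES),
):
    per_event = index_gb(EVENTS, cost)
    per_minute = index_gb(minute_docs, cost)
    print(f"{label}: per event {per_event:.1f} GB, per minute {per_minute:.2f} GB")
```

Under these assumptions the per-minute sizes come out near 0.53 GB and 1.67 GB, in line with the rounded ~0.5 GB and ~2 GB above.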
Traffic Monitoring System
Schema
Quick Analysis
Writes
– 16,000 sensors, 1 update per minute
– 16,000 / 60 ≈ 267 updates per second
Reads
– 5M simultaneous users
– Each requests data for 50 sensors per minute
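The same arithmetic as a Python sketch, extended to the read side. The per-second sensor lookup rate is not stated on the slide; it is derived here from the stated 5M users and 50 sensors per request, assuming requests are spread evenly over each minute.

```python
SENSORS = 16_000
USERS = 5_000_000
SENSORS_PER_REQUEST = 50
SECONDS_PER_MINUTE = 60

# Each sensor reports once per minute.
writes_per_second = SENSORS / SECONDS_PER_MINUTE

# Each user issues one request per minute, covering 50 sensors.
requests_per_second = USERS / SECONDS_PER_MINUTE
sensor_lookups_per_second = requests_per_second * SENSORS_PER_REQUEST

print(round(writes_per_second))          # 267
print(round(sensor_lookups_per_second))  # 4166667
```

Reads outnumber writes by several orders of magnitude in this workload, which is why the read cost per query matters so much in the schema comparison.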
Tailor your schema to your
application workload
Reads: Impact of Alternative Schemas
Query: Find the average speed over the last ten minutes
10 minute average query (documents read per query)
Schema             1 sensor   50 sensors
1 doc per event    10         500
1 doc per 10 min   1.9        95
1 doc per hour     1.3        65
10 minute average query with 5M users
Schema             ops/sec
1 doc per event    42M
1 doc per 10 min   8M
1 doc per hour     5.4M
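The ops/sec column follows from the per-query figures. The Python sketch below assumes each of the 5M users issues one 50-sensor query per minute and that the first table counts documents read; the per-sensor averages (10, 1.9, 1.3) are taken from the table as given, not derived.

```python
USERS = 5_000_000
SENSORS_PER_QUERY = 50
QUERY_INTERVAL_SECONDS = 60

# Average documents read per sensor for a 10-minute window, from the table.
docs_per_sensor = {
    "1 doc per event": 10,
    "1 doc per 10 min": 1.9,
    "1 doc per hour": 1.3,
}

def reads_per_second(docs_read_per_sensor):
    """Document reads per second across all users at peak."""
    per_query = docs_read_per_sensor * SENSORS_PER_QUERY
    return per_query * USERS / QUERY_INTERVAL_SECONDS

for schema, docs in docs_per_sensor.items():
    print(schema, f"{reads_per_second(docs) / 1e6:.1f}M")
```

This yields about 41.7M, 7.9M, and 5.4M reads per second, which round to the 42M, 8M, and 5.4M shown.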
Writes: Impact of alternative schemas
1 Sensor - 1 Hour
Schema       Inserts   Updates
doc/event    60        0
doc/10 min   6         54
doc/hour     1         59
16000 Sensors – 1 Day
Schema       Inserts   Updates
doc/event    23M       0
doc/10 min   2.3M      21M
doc/hour     0.38M     22.7M
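Both write tables can be reproduced with a few lines of Python, assuming one reading per sensor per minute and that the first reading in each document is an insert while the rest are updates. The function names are hypothetical.

```python
SENSORS = 16_000
READINGS_PER_HOUR = 60  # one reading per sensor per minute
HOURS_PER_DAY = 24

def writes_per_hour(docs_per_hour):
    """Inserts and updates for one sensor over one hour.

    The first reading in each document is an insert; the rest are updates.
    docs_per_hour=60 corresponds to one document per event.
    """
    inserts = docs_per_hour
    updates = READINGS_PER_HOUR - docs_per_hour
    return inserts, updates

def writes_per_day(docs_per_hour):
    """Scale the per-sensor hourly counts to all sensors for one day."""
    inserts, updates = writes_per_hour(docs_per_hour)
    scale = SENSORS * HOURS_PER_DAY
    return inserts * scale, updates * scale

for schema, docs in (("doc/event", 60), ("doc/10 min", 6), ("doc/hour", 1)):
    print(schema, writes_per_hour(docs), writes_per_day(docs))
```

The total number of writes is the same for every schema (about 23M per day); only the split between inserts and updates changes.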
Queries will require two indexes
{
  "segId" : "20484097",
  "ts" : ISODate("2013-10-10T23:06:37.000Z"),
  "time" : "237",
  "speed" : "52",
  "pavement" : "Wet Spots",
  "status" : "Wet Conditions",
  "weather" : "Light Rain"
}
~70 bytes per document
Memory: Impact of alternative schemas
1 Sensor - 1 Hour
Schema       # of Documents   Index Size (bytes)
doc/event    60               4200
doc/10 min   6                420
doc/hour     1                70
16000 Sensors – 1 Day
Schema       # of Documents   Index Size
doc/event    23M              1.3 GB
doc/10 min   2.3M             131 MB
doc/hour     0.38M            1.4 MB
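For the one-sensor, one-hour table, the index size is the document count multiplied by the ~70 bytes of index per document quoted earlier. A small Python sketch of that calculation, with `index_bytes` as a hypothetical helper:

```python
INDEX_BYTES_PER_DOC = 70  # the ~70 bytes per document quoted earlier

def index_bytes(doc_count):
    """Index size for a number of documents at a fixed per-document cost."""
    return doc_count * INDEX_BYTES_PER_DOC

for schema, docs in (("doc/event", 60), ("doc/10 min", 6), ("doc/hour", 1)):
    print(schema, docs, index_bytes(docs))
```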
Tailor your schema to your
application workload
Summary
• Tailor your schema to your application workload
• Aggregating events will
– Improve write performance: inserts → updates
– Improve analytics performance: fewer document reads
– Reduce index size → reduce memory requirements
Questions?
@jayrunkel
jay.runkel@mongodb.com
Part 2 – July 9th, 2:00 PM EST
Part 3 - July 16th, 2:00 PM EST
