MongoDB for Time Series Data: Schema Design

Jay Runkel
Solutions Architect, MongoDB
@jayrunkel
Time Series Data – Part 1
Schema Design

Develop a nationwide traffic monitoring system

Traffic sensors to monitor interstate conditions
• 16,000 sensors
• Measure
  – Speed
  – Travel time
  – Weather, pavement, and traffic conditions
• Support desktop, mobile, and car navigation systems

Other requirements
• Need to keep a 3-year history
• Three data centers
  – NJ, Chicago, LA
• Need to support 5M simultaneous users
• Peak volume (rush hour)
  – Every minute, each user requests the 10-minute average speed for 50 sensors

Master Agenda
• Successfully deploy a MongoDB application at scale
• Use case: traffic data
• Presentation Components
  1. Schema Design
  2. Aggregation
  3. Cluster Architecture

Time Series Data Schema Design

Agenda
• Similarities between MongoDB and Olympic weight lifting
• What is time series data?
• Schema design considerations
• Analysis of alternative schemas
• Questions

Lifting heavy things requires
• Technique
• Planning
• Practice
• Analysis
• Tuning

Tailor your schema to your application workload

Time Series
A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time intervals.
– Wikipedia
[Figure: example time series plotted against time]

Time Series Data is Everywhere

Example: MongoDB Monitoring Service
• Free hosted service for monitoring MongoDB systems
  – 100+ system metrics visualized and alerted
• 25,000+ MongoDB systems submitting data every 60 seconds
• 90% updates, 10% reads
• ~75,000 updates/second
• ~5.4B operations/day
• 8 commodity servers

Application Requirements
Event Resolution
Analysis
– Dashboards
– Analytics
– Reporting
Data Retention Policies
Event and Query Volumes
Schema Design
Aggregation Queries
Cluster Architecture

Schema Design Goal
Store Event Data
Support Analytical Queries
Find best compromise of:
– Memory utilization
– Write performance
– Read/Analytical Query Performance
Accomplish with realistic amount of hardware

Designing For Reading, Writing, …
• Document per event
• Document per minute (average)
• Document per minute (second)
• Document per hour

Document Per Event
{
  segId: "I80_mile23",
  speed: 63,
  ts: ISODate("2013-10-16T22:07:38.000-0500")
}
• Relational-centric approach
• Insert-driven workload

Document Per Minute (Average)
{
  speed_num: 18,
  speed_sum: 1134,
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Pre-aggregate to compute average per minute more easily
• Update-driven workload
• Resolution at the minute-level
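A minimal sketch of the update-driven write for this schema, in plain JavaScript. It only builds the filter and update documents and does not talk to a database. The helper names `minuteAverageUpsert` and `averageSpeed`, and the `segId` field on the per-minute document, are assumptions for illustration and do not appear on the slide.

```javascript
// Hypothetical helper: build the upsert for one speed reading.
// Assumes the per-minute document also carries a segId field.
function minuteAverageUpsert(segId, speed, readingTime) {
  const minuteStart = new Date(readingTime.getTime());
  minuteStart.setUTCSeconds(0, 0); // truncate to the start of the minute
  return {
    filter: { segId: segId, ts: minuteStart },
    // $inc keeps a running count and sum, so no raw readings are stored
    update: { $inc: { speed_num: 1, speed_sum: speed } },
    options: { upsert: true },
  };
}

// The average is derived at read time from the two counters.
function averageSpeed(minuteDoc) {
  return minuteDoc.speed_sum / minuteDoc.speed_num;
}

const op = minuteAverageUpsert(
  "I80_mile23",
  63,
  new Date("2013-10-17T03:07:38.000Z")
);
console.log(op.filter.ts.toISOString());
console.log(averageSpeed({ speed_num: 18, speed_sum: 1134 })); // 63
```

With a MongoDB driver, the three parts would typically be passed to `updateOne(filter, update, options)`: the first reading of a minute creates the document, and later readings increment it.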

Document Per Minute (By Second)
{
  speed: { 0: 63, 1: 58, …, 58: 66, 59: 64 },
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Store per-second data at the minute level
• Pre-allocate structure to avoid document moves
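One way to picture pre-allocation, sketched in plain JavaScript: create the minute document with all 60 slots filled by a placeholder, then overwrite one slot per reading. The helper names, the `segId` field, and the placeholder value `0` are assumptions for illustration.

```javascript
// Hypothetical helper: a minute document with every second slot present,
// so later writes replace values instead of growing the document.
function preallocatedMinuteDoc(segId, minuteStart) {
  const speed = {};
  for (let s = 0; s < 60; s++) {
    speed[s] = 0; // placeholder, overwritten when the reading arrives
  }
  return { segId: segId, ts: minuteStart, speed: speed };
}

// Hypothetical helper: update document that sets one second's slot
// using dot notation, e.g. { $set: { "speed.38": 63 } }.
function setSecondUpdate(readingTime, speed) {
  const second = readingTime.getUTCSeconds();
  return { $set: { ["speed." + second]: speed } };
}

const doc = preallocatedMinuteDoc(
  "I80_mile23",
  new Date("2013-10-17T03:07:00.000Z")
);
const update = setSecondUpdate(new Date("2013-10-17T03:07:38.000Z"), 63);
console.log(Object.keys(doc.speed).length); // 60
console.log(JSON.stringify(update));
```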

Document Per Hour (By Second)
{
  speed: { 0: 63, 1: 58, …, 3598: 45, 3599: 55 },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level
• Updating last second requires 3599 steps

Document Per Hour (By Second)
{
  speed: {
    0: {0: 47, …, 59: 45},
    …
    59: {0: 65, …, 59: 66}
  },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level with nesting
• Updating last second requires 59+59 steps
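The step counts for the flat and nested hourly layouts can be reproduced with a small sketch. It assumes, as the slides do, that reaching a field means stepping past every field stored before it at each level. The function names are hypothetical.

```javascript
// Dot-notation path for a reading in the flat hourly layout.
function flatPath(secondOfHour) {
  return "speed." + secondOfHour;
}

// Dot-notation path in the nested layout: minute first, then second.
function nestedPath(secondOfHour) {
  const minute = Math.floor(secondOfHour / 60);
  const second = secondOfHour % 60;
  return "speed." + minute + "." + second;
}

// Fields stepped over before reaching the target (assumed linear scan).
function flatSteps(secondOfHour) {
  return secondOfHour;
}

function nestedSteps(secondOfHour) {
  return Math.floor(secondOfHour / 60) + (secondOfHour % 60);
}

const last = 3599; // last second of the hour
console.log(flatPath(last), flatSteps(last));     // speed.3599 3599
console.log(nestedPath(last), nestedSteps(last)); // speed.59.59 118
```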

Characterizing Write Differences
• Example: data generated every second
• For 1 minute:
  – Document Per Event: 60 writes
  – Document Per Minute: 1 write, 59 updates
• Transition from insert driven to update driven
  – Individual writes are smaller
  – Performance and concurrency benefits

Characterizing Read Differences
• Example: data generated every second
• Reading data for a single hour requires:
  – Document Per Event: 3600 reads
  – Document Per Minute: 60 reads
• Read performance is greatly improved
  – Optimal with tuned block sizes and read ahead
  – Fewer disk seeks

Characterizing Memory Differences
• _id index for 1 billion events:
  – Document Per Event: ~32 GB
  – Document Per Minute: ~0.5 GB
• _id index plus segId and ts index:
  – Document Per Event: ~100 GB
  – Document Per Minute: ~2 GB
• Memory requirements significantly reduced
  – Fewer shards
  – Lower capacity servers

Traffic Monitoring System Schema

Quick Analysis
Writes
– 16,000 sensors, 1 update per minute
– 16,000 / 60 = 267 updates per second
Reads
– 5M simultaneous users
– Each requests data for 50 sensors per minute
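The arithmetic behind this quick analysis, as a short sketch. The constant names are made up for illustration; the inputs come from the requirements above.

```javascript
const SENSORS = 16000;           // each reports once per minute
const USERS = 5000000;           // simultaneous users at peak
const SENSORS_PER_REQUEST = 50;  // each user asks for 50 sensors per minute

// Writes: one update per sensor per minute, spread over 60 seconds.
const writesPerSecond = SENSORS / 60;

// Reads: sensor lookups per second across all users.
const sensorReadsPerSecond = (USERS * SENSORS_PER_REQUEST) / 60;

console.log(Math.round(writesPerSecond));      // 267
console.log(Math.round(sensorReadsPerSecond)); // 4166667
```

The read side dominates by several orders of magnitude, which is why the schema comparison that follows focuses on documents read per query.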

Reads: Impact of Alternative Schemas
Query: Find the average speed over the last ten minutes
10 minute average query (documents read per query)
Schema              1 sensor   50 sensors
1 doc per event     10         500
1 doc per 10 min    1.9        95
1 doc per hour      1.3        65
10 minute average query with 5M users
Schema              ops/sec
1 doc per event     42M
1 doc per 10 min    8M
1 doc per hour      5.4M
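The ops/sec column appears to follow from the 50-sensor column: 5M users each issue one query per minute. A sketch of that calculation, with hypothetical names and the documents-per-query figures taken from the table:

```javascript
const USERS = 5000000; // each issues one 50-sensor query per minute

// Documents read per 50-sensor query, from the table above.
const docsPerQuery = {
  "1 doc per event": 500,
  "1 doc per 10 min": 95,
  "1 doc per hour": 65,
};

// Document reads per second across all users.
function docReadsPerSecond(docs) {
  return (USERS * docs) / 60;
}

// Round to one decimal place, in millions.
function inMillions(n) {
  return Math.round(n / 1e5) / 10;
}

for (const schema of Object.keys(docsPerQuery)) {
  console.log(schema, inMillions(docReadsPerSecond(docsPerQuery[schema])) + "M");
}
// 1 doc per event 41.7M
// 1 doc per 10 min 7.9M
// 1 doc per hour 5.4M
```

These match the slide's 42M, 8M, and 5.4M after rounding.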

Writes: Impact of Alternative Schemas
1 Sensor – 1 Hour
Schema        Inserts   Updates
doc/event     60        0
doc/10 min    6         54
doc/hour      1         59
16,000 Sensors – 1 Day
Schema        Inserts   Updates
doc/event     23M       0
doc/10 min    2.3M      21M
doc/hour      0.38M     22.7M
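Both write tables can be derived from one number per schema: how many documents a sensor creates per hour. The sketch below assumes one reading per sensor per minute, as stated in the quick analysis; the function names are hypothetical.

```javascript
const READINGS_PER_HOUR = 60; // one reading per sensor per minute

// The first reading that lands in a document is an insert;
// every other reading is an update.
function writesPerSensorHour(docsPerHour) {
  return { inserts: docsPerHour, updates: READINGS_PER_HOUR - docsPerHour };
}

// Scale one sensor-hour up to all sensors for a full day.
function perDay(perHour, sensors) {
  return {
    inserts: perHour.inserts * sensors * 24,
    updates: perHour.updates * sensors * 24,
  };
}

const schemas = { "doc/event": 60, "doc/10 min": 6, "doc/hour": 1 };
for (const name of Object.keys(schemas)) {
  const hourly = writesPerSensorHour(schemas[name]);
  console.log(name, hourly, perDay(hourly, 16000));
}
```

The total number of writes per day is the same for every schema (about 23M); only the split between inserts and updates changes.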

Queries will require two indexes
{
  "segId" : "20484097",
  "ts" : ISODate("2013-10-10T23:06:37.000Z"),
  "time" : "237",
  "speed" : "52",
  "pavement" : "Wet Spots",
  "status" : "Wet Conditions",
  "weather" : "Light Rain"
}
~70 bytes per document
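One reading of "two indexes" is the default `_id` index plus an index on `segId` and `ts`. The sketch below builds an index key and a matching range filter for the 10-minute query as plain objects. The compound key order and the helper name `lastTenMinutesFilter` are assumptions, not taken from the slides.

```javascript
// Assumed key for the second index (the first is the default _id index).
const segTimeIndexKey = { segId: 1, ts: 1 };

// Hypothetical helper: filter for the last ten minutes of a set of sensors.
function lastTenMinutesFilter(segIds, now) {
  const tenMinutesAgo = new Date(now.getTime() - 10 * 60 * 1000);
  return {
    segId: { $in: segIds },
    ts: { $gte: tenMinutesAgo, $lte: now },
  };
}

const filter = lastTenMinutesFilter(
  ["20484097"],
  new Date("2013-10-10T23:06:37.000Z")
);
console.log(JSON.stringify(segTimeIndexKey));
console.log(filter.ts.$gte.toISOString());
```

With a MongoDB driver, the key would be passed to `createIndex` and the filter to `find`.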

Memory: Impact of Alternative Schemas
1 Sensor – 1 Hour
Schema        # of Documents   Index Size (bytes)
doc/event     60               4200
doc/10 min    6                420
doc/hour      1                70
16,000 Sensors – 1 Day
Schema        # of Documents   Index Size
doc/event     23M              1.3 GB
doc/10 min    2.3M             131 MB
doc/hour      0.38M            1.4 MB

Summary
• Tailor your schema to your application workload
• Aggregating events will
– Improve write performance: inserts → updates
– Improve analytics performance: fewer document reads
– Reduce index size → reduce memory requirements

Questions?
@jayrunkel
jay.runkel@mongodb.com
Part 2 – July 9th, 2:00 PM EST
Part 3 - July 16th, 2:00 PM EST
