OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Time flows, my friend
Managing event sequences and time series with a
Document-Graph Database
Codemotion Milan 2014
Luigi Dell’Aquila
Orient Technologies LTD
Twitter: @ldellaquila

Time What…?
Time series:
A time series is a sequence of data points, typically
consisting of successive measurements made over a
time interval (Wikipedia)

Time What…?
Event sequences:
• A set of events with a timestamp
• A set of relationships “happened
before/after”
• Cause and effect relationships

Time What…?
Time as a dimension:
• Direct:
– E.g. the beginning and end of a relationship (I’ve been a
friend of John since…)
• Calculated
– E.g. speed (distance/time)

Time What…?
Time as a constraint:
• Query execution time!

The problem:
Fast and Effective

Fast and Effective
Fast write: Time doesn’t wait! Writes just arrive
Fast read: a lot of data to be read in a short time
Effective manipulation: complex operations like
- Aggregation
- Prediction
- Analysis

Current approaches
0. Relational approach: table
Timestamp Value
2014:11:21 14:35:00 1321
2014:11:21 14:35:01 2444
2014:11:21 14:35:02 2135
2014:11:21 14:35:03 1833

Current approaches
0. Relational approach: table
HH MM SS Value
14 35 0 1321
14 35 1 2444
14 35 2 2135
14 35 3 1833

Current approaches
0. Relational – Advantages
• Simple
• It can be used together with your application data
(operational)

Current approaches
0. Relational – Disadvantages
• Slow read (relies on an index)
• Slow insert (update the index…)

Current approaches
1. Document Database
• Collections of Documents instead of tables
• Schemaless
• Complex data structures

Current approaches
1. Document approach: Minute Based
{
timestamp: “2014-11-21 12.05”,
load: [10, 15, 3, … 30] // array of 60, one per second
}

Current approaches
1. Document approach: Hour Based
{
timestamp: “2014-11-21 12.00”,
load: {
0: [10, 15, 3, … 30], //array of 60, one per second
1: [0, 12, 31, … 24],
…
59: [10, 10, 1, … 16]
}
}
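A minimal sketch of how a per-second sample could be routed into the hour-based document shown above. The names `record_sample` and `store` are hypothetical, and a plain Python dict stands in for the document collection; this is an illustration of the bucketing idea, not the API of any particular document database.

```python
from datetime import datetime


def record_sample(store, ts, value):
    """Write one per-second sample into the hour-based document for `ts`.

    `store` is a dict standing in for a document collection (an assumption).
    Returns "insert" when a new hour document is created, "update" otherwise.
    """
    key = ts.strftime("%Y-%m-%d %H.00")  # one document per hour
    is_insert = key not in store
    doc = store.setdefault(key, {"timestamp": key, "load": {}})
    # load[minute] is a list of 60 slots, one per second (None = not written yet)
    seconds = doc["load"].setdefault(ts.minute, [None] * 60)
    seconds[ts.second] = value
    return "insert" if is_insert else "update"


store = {}
ops = [
    record_sample(store, datetime(2014, 11, 21, 12, 5, 0), 10),
    record_sample(store, datetime(2014, 11, 21, 12, 5, 1), 15),
    record_sample(store, datetime(2014, 11, 21, 12, 6, 0), 3),
]
print(ops)  # ['insert', 'update', 'update']
```

Only the first sample of the hour creates a document; every later sample is an in-place update of an existing one.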

Current approaches
1. Document approach – Advantages
• Fast write: one insert, then in-place updates (minute-based: 1 insert + 59 updates per document)
• Fast fetch

Current approaches
1. Document approach – Disadvantages
• Fixed time windows
• Single point per unit
• How to pre-aggregate?
• Relationships with the rest of the world?
• Relationships between events?

Current approaches
2. Graph Database
• Nodes/Edges instead of tables
• Index free adjacency
• Fast traversal
• Dynamic structure

Current approaches
2. Graph approach: linked sequence
[Diagram: events e1 -next-> e2 -next-> e3 -next-> e4 -next-> e5]
(timestamp on vertex)

Current approaches
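2. Graph approach: linked sequence (tag based)
[Diagram: five tagged events, with one chain of edges per tag]
Events: e1 [Tag1, Tag2], e2 [Tag1], e3 [Tag2], e4 [Tag1], e5 [Tag1, Tag2]
e1 -nextTag1-> e2 -nextTag1-> e4 -nextTag1-> e5
e1 -nextTag2-> e3 -nextTag2-> e5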

Current approaches
2. Graph approach: Hierarchy
[Diagram: a tree of time units, Days → Hours → Minutes → Seconds, with the events (e1, e2, e3, … e60) attached to the vertices of the lowest level]

Current approaches
2. Graph approach: mixed
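[Diagram: the same Days → Hours → Minutes → Seconds hierarchy, with the events (e1, e2, e3, … e60) attached to the lowest level and also linked to each other in sequence]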

Current approaches
2. Graph approach – Advantages
• Flexible
• Events can be connected together in different ways
• You can connect events to other entities
• Fast traversal of dynamic time windows
• Fast aggregation (based on hierarchy)

Current approaches
2. Graph approach – Disadvantages
• Slow writes (vertex + edge + maintenance)
• Not so fast reads

Can we mix different models and get
all the advantages?

Can we mix all this with the rest of
application logic?

• Document database (schema-free, complex
properties)
• Graph database (index-free adjacency, fast
traversal)
• SQL (extended)
• Operational (schema - ACID)
• OO concepts (Classes, inheritance, polymorphism)
• REST/JSON interface
• Native Javascript (extend query language, expose
services, event hooks)
• Distributed (Multi-master replica/sharding)
architecture

OrientDB
First step: put them together
[Diagram: a hierarchy of vertices, Days → Hours → Minutes; each minute vertex holds a document with one value per second]
{
0: 1000,
1: 1500,
…
59: 96
}

OrientDB
[Diagram: the Days → Hours → Minutes hierarchy is the Graph part; the per-second data on each minute vertex is the Document part]
{
0: 1000,
1: 1500,
…
59: 96
}
Graph
Document <- IT’S A VERTEX TOO!!!

OrientDB
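[Diagram: the graph stops at the Days → Hours level; each hour vertex holds a document with one entry per minute, each containing one value per second]
{
0: {
0: 1000,
1: 1500,
…
59: 210
},
1: { … },
…
59: { … }
}
Graph
Document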

Where should I stop?
It depends on my domain and
requirements.

OrientDB
Result:
• Same insert speed as the Document approach
• But with the flexibility of a Graph
• (as a side effect of mixing models,
documents can also contain “pointers” to
other elements of app domain)

OrientDB
Second step: Pre-aggregate
[Diagram: the Days → Hours → Minutes graph; each minute vertex holds a document with one value per second]
{
0: 1000,
1: 1500,
…
59: 96
}
Graph
Graph

OrientDB
[Diagram: the same Days → Hours → Minutes graph; sum() of the per-second values is stored one level up]
{
0: 1000,
1: 1500,
…
59: 96
}
Graph
sum()

OrientDB
[Diagram: the same Days → Hours → Minutes graph; sum() is stored at each level, each one aggregating the level below]
{
0: 1000,
1: 1500,
…
59: 96
}
Graph
sum()

OrientDB
How to aggregate
Hooks: server-side triggers (Java or JavaScript),
executed when DB operations happen (e.g. insert
or update)
Java interface:
public RESULT onBeforeInsert(…);
public void onAfterInsert(…);
public RESULT onBeforeUpdate(…);
public void onAfterUpdate(…);

OrientDB
Aggregation logic
• Second 0 -> insert
• Second 1 -> update
• …
• Second 59 -> update + aggregate
– Write aggregate value on minute vertex
• Minute == 59? Calculate aggregate on hour vertex

OrientDB
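[Diagram: the Days → Hours → Minutes hierarchy with pre-aggregated values. Complete vertices carry a sum (sum = 1000, sum = 15000, sum = 300); incomplete vertices have sum = null. The current minute vertex holds the document below.]
{
0: 1,
1: 12,
…
59: 3
}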

OrientDB
Query logic:
• Traverse from root node to specified level
(filtering based on vertex data)
• Is there aggregate value?
– Yes: return it
– No: go one level down and do the same
Aggregation on a level will be VERY fast if you
have horizontal edges!
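As a rough illustration of the query logic above, the following sketch returns a stored aggregate when one exists and descends one level only where it is missing. The `total` function and the dict layout (`sum`, `children`, `values`) are assumptions made for the example, not OrientDB structures.

```python
def total(vertex):
    """Return the aggregate for a vertex, descending only where it is missing."""
    if vertex.get("sum") is not None:
        return vertex["sum"]  # aggregate present: return it
    if "values" in vertex:
        return sum(vertex["values"].values())  # lowest level: sum the raw samples
    # No aggregate here: go one level down and do the same.
    return sum(total(child) for child in vertex.get("children", []))


day = {"sum": None, "children": [
    {"sum": 15000},                                    # complete hour
    {"sum": None, "children": [                        # incomplete hour
        {"sum": 1000},                                 # complete minute
        {"sum": None, "values": {0: 1, 1: 12, 2: 3}},  # incomplete minute
    ]},
]}
print(total(day))  # 16016
```

Only the incomplete branch is visited in depth; every complete vertex answers with a single read.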

OrientDB
How to calculate aggregate values with a query
Input params:
- Root node (suppose it is #11:11)
select sum(aggregateVal) from (
traverse out() from #11:11
while in().aggregateVal is null
)
With the same logic you can query based on time
windows

OrientDB
Third step: Complex domains
[Diagram: the Hours → Minutes graph; each minute vertex holds a document whose per-second entries are themselves documents]
{
0: {val: 1000},
1: {val: 1500},
…
59: {
val: 96,
eventTags: [tag1, tag2]
…
}
}
Graph
Document <- Enrich the domain

OrientDB
Another use case: Event Categories and OO
[Diagram: the tag-based event sequence (e1 … e5 linked by nextTag1 and nextTag2 edges), extended with a third tag: e1 is tagged [Tag1, Tag2, Tag3] and a nextTag3 edge leads to an event tagged [Tag3]]

OrientDB
Another use case: Event Categories and OO
Suppose tags are hierarchical categories
(Classes for vertices and/or edges)
[Diagram: edge class hierarchy]
nextTAG
– nextTagX
  – nextTag1
  – nextTag2
– nextTag3

OrientDB
Subset of events
TRAVERSE out('nextTag1') FROM <e1>
[Diagram: the traversal returns e1, e2, e4 and e5, linked by nextTag1 edges]

OrientDB
Subset of events
TRAVERSE out('nextTag2') FROM <e1>
[Diagram: the traversal returns e1, e3 and e5, linked by nextTag2 edges]

OrientDB
Subset of events (Polymorphic!!!)
TRAVERSE out('nextTagX') FROM <e1>
[Diagram: the traversal returns all of e1 … e5, following both nextTag1 and nextTag2 edges, since both are subclasses of nextTagX]
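To make the polymorphic behaviour concrete, here is a small in-memory sketch, assuming the class hierarchy nextTAG > nextTagX > nextTag1/nextTag2 and the edges of the tag-based example. `PARENT`, `EDGES`, `is_a` and `traverse_out` are hypothetical names; OrientDB resolves edge subclasses itself, so this only mimics the effect.

```python
# Assumed edge-class hierarchy, child -> parent
PARENT = {
    "nextTag1": "nextTagX",
    "nextTag2": "nextTagX",
    "nextTagX": "nextTAG",
    "nextTag3": "nextTAG",
}

# Assumed edges of the example: (from, edge class, to)
EDGES = [
    ("e1", "nextTag1", "e2"), ("e2", "nextTag1", "e4"), ("e4", "nextTag1", "e5"),
    ("e1", "nextTag2", "e3"), ("e3", "nextTag2", "e5"),
]


def is_a(edge_class, wanted):
    """True if edge_class is `wanted` or one of its subclasses."""
    while edge_class is not None:
        if edge_class == wanted:
            return True
        edge_class = PARENT.get(edge_class)
    return False


def traverse_out(start, edge_class):
    """Follow outgoing edges of the given class (or its subclasses) from start."""
    seen, stack = set(), [start]
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        for src, cls, dst in EDGES:
            if src == vertex and is_a(cls, edge_class):
                stack.append(dst)
    return sorted(seen)


print(traverse_out("e1", "nextTag1"))  # ['e1', 'e2', 'e4', 'e5']
print(traverse_out("e1", "nextTag2"))  # ['e1', 'e3', 'e5']
print(traverse_out("e1", "nextTagX"))  # ['e1', 'e2', 'e3', 'e4', 'e5']
```

Asking for the parent class returns the union of the two sub-sequences without naming either tag explicitly.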

Connect all this with the rest of your
application domain

You’ll see, everything will get more
complex: you will discover new time-related
dimensions (speed,
position…) and new needs (complex
forecasting)

Chase
• Your target is running away
• You have informers that track his moves
(coordinates at a point in time) and give
you additional (unstructured) information
• You have a street map
• You want to:
– Catch him ASAP
– Predict his moves
– Be sure that he is inside an area

Chase
• Map is made of points and distances
• You also have speed limits for streets
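[Diagram: map points (point1 … pointN) connected by streets; each street carries a distance and a speed limit]
Street 1: Distance: 1Km, Max speed: 70Km/h
Street 2: Distance: 2Km, Max speed: 120Km/h
Street 3: Distance: 8Km, Max speed: 90Km/h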

Chase
• Map is made of points and distances
• You also have speed limits for streets
• Distance / Speed = TIME!!!
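A worked example of the formula, using the three streets from the map slide. `travel_minutes` is a hypothetical helper, and the result is the minimum travel time under the assumption that the target drives exactly at the speed limit.

```python
def travel_minutes(distance_km, max_speed_kmh):
    """Minimum time to cover a street at its speed limit, in minutes."""
    return distance_km / max_speed_kmh * 60


streets = [(1, 70), (2, 120), (8, 90)]  # (distance in km, max speed in km/h)
for distance, speed in streets:
    print(f"{distance} km at {speed} km/h -> {travel_minutes(distance, speed):.1f} min")
# 1 km at 70 km/h -> 0.9 min
# 2 km at 120 km/h -> 1.0 min
# 8 km at 90 km/h -> 5.3 min
```

With this weight on every street edge, the map becomes a graph whose edge costs are expressed in time rather than distance.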

Chase
You have a time series of your target’s moves
{
Timestamp: 29/11/2014 17:15:00
LAT: 19,12223
LON: 42,134
}
{
Timestamp: 29/11/2014 17:55:00
LAT: 19,12223
LON: 42,134
}
{
Timestamp: 29/11/2014 17:55:00
LAT: 19,12223
LON: 42,134
}
[Diagram labels: each document is an Event; the linked events form the Event sequence]

Chase
[Diagram: a street map (map points connected by streets) with two sightings placed on it: 20/11/2014 1:20:00 PM and 21/11/2014 2:35:00 PM]

Chase
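[Diagram: events (20/11/2014 13:20:00, 21/11/2014 14:35:00, 29/11/2014 17:55:00) linked into an event sequence; each event is connected by a “Where” edge to a map point; map points are connected by streets]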

Chase
Vertices and edges are also documents
So you can store complex information inside them
{
timestamp: 22213989487987,
lat: xxxx,
lon: yyy,
informer: 15,
additional: {
speed: 120,
description: “the target was in a car”,
car: {
model: “Fiat 500”,
licensePlate: “AA 123 BB”
}
}
}

Chase
Now you can:
• Predict his moves (eg. statistical methods,
interpolation on lat/lon + time)
• Calculate how far he can be (based on last
position, avg speed and street data)
• Reach him quickly (shortest path, Dijkstra)
• … intelligence?
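The "reach him quickly" item can be sketched with Dijkstra's algorithm over travel time instead of distance. The street list, the point names A, B, C and the function `fastest_times` are made up for illustration, and the sketch assumes two-way streets driven at the speed limit; inside OrientDB you would more likely rely on the database's own path functions.

```python
import heapq


def fastest_times(streets, source):
    """Dijkstra over travel time (hours) from `source` to every reachable point.

    `streets` is a list of (point_a, point_b, distance_km, max_speed_kmh).
    """
    graph = {}
    for a, b, km, kmh in streets:
        hours = km / kmh  # distance / speed = time
        graph.setdefault(a, []).append((b, hours))
        graph.setdefault(b, []).append((a, hours))  # assumption: two-way streets
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > best.get(node, float("inf")):
            continue  # stale queue entry, a faster route was already found
        for neighbour, cost in graph.get(node, []):
            candidate = elapsed + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best


streets = [("A", "B", 1, 70), ("B", "C", 2, 120), ("A", "C", 8, 90)]
best = fastest_times(streets, "A")
print(round(best["C"] * 60, 2))  # 1.86 (minutes, via B rather than the direct 8 km street)
```

The direct street from A to C takes about 5.3 minutes, so the route through B wins even though it has more hops.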

Chase
But to have all this you need:
• An easy way for your informers to send
time series events
Hint: REST interface
With OrientDB you can expose Javascript
functions as REST services!

Chase
And you need:
• An extended query language
E.g.
TRAVERSE out("street") FROM (
SELECT out("point") FROM #11:11
// my last event
) WHILE canBeReached($current, #11:11)
(where he could be)

Chase
With OrientDB you can write
function canBeReached(node, event)
in JavaScript and use it in your queries
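The slides do not show the body of canBeReached, so the following is only one plausible reading, written in Python rather than JavaScript to keep it easy to run: a point can be reached if the minimum travel time from the last sighting does not exceed the time elapsed since then. The simplified signature `can_be_reached(travel_hours, elapsed_hours)`, the helper `reachable_points` and the sample streets are all assumptions.

```python
def can_be_reached(travel_hours, elapsed_hours):
    """Assumed predicate: the point is reachable if there was enough time to get there."""
    return travel_hours <= elapsed_hours


def reachable_points(streets, last_point, elapsed_hours):
    """Expand outward along streets while the predicate holds (like TRAVERSE ... WHILE).

    `streets` is a list of (point_a, point_b, distance_km, max_speed_kmh).
    """
    graph = {}
    for a, b, km, kmh in streets:
        hours = km / kmh
        graph.setdefault(a, []).append((b, hours))
        graph.setdefault(b, []).append((a, hours))  # assumption: two-way streets
    best = {last_point: 0.0}  # fastest known arrival time per point
    frontier = [last_point]
    while frontier:
        node = frontier.pop()
        for neighbour, hours in graph.get(node, []):
            arrival = best[node] + hours
            if can_be_reached(arrival, elapsed_hours) and arrival < best.get(neighbour, float("inf")):
                best[neighbour] = arrival
                frontier.append(neighbour)
    return sorted(best)


streets = [("A", "B", 1, 70), ("B", "C", 2, 120), ("C", "D", 8, 90)]
print(reachable_points(streets, "A", 3 / 60))   # ['A', 'B', 'C']
print(reachable_points(streets, "A", 10 / 60))  # ['A', 'B', 'C', 'D']
```

Three minutes after a sighting at A the target cannot have passed C; after ten minutes D is within range as well.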

Chase
It’s just a game, but think about:
• Fraud detection
• Traffic routing
• Multi-dimensional analytics
• Forecasting
• …

One model is not enough
One of the most common issues my customers face
is:
“I have a zoo of technologies in my application
stack, and it’s getting worse every day”
My answer is: Multi-Model DB

One model is not enough
One of the most common issues my customers face
is:
“I have a zoo of technologies in my application
stack, and it’s getting worse every day”
My answer is: Multi-Model DB
of course ;-)

From:
“choose the right data model for your
use case”
To:
“Your application has multiple data
models, you need all of them!”

Thank you!
@ldellaquila
l.dellaquila@orientechnologies.com
