Time flows, my friend 
Managing event sequences and time series with a 
Document-Graph Database 
Codemotion Milan 2014 
Luigi Dell’Aquila 
Orient Technologies LTD 
Twitter: @ldellaquila
Time What…? 
Time series: 
A time series is a sequence of data points, typically 
consisting of successive measurements made over a 
time interval (Wikipedia)
Time What…? 
Event sequences: 
• A set of events with a timestamp 
• A set of relationships “happened 
before/after” 
• Cause and effect relationships
Time What…? 
Time as a dimension: 
• Direct: 
– E.g. the beginning and end of a relationship (I’ve been a friend of John since…) 
• Calculated: 
– E.g. speed (distance/time)
Time What…? 
Time as a constraint: 
• Query execution time!
The problem: 
Fast and Effective
Fast write: Time doesn’t wait! Writes just arrive 
Fast read: a lot of data to be read in a short time 
Effective manipulation: complex operations like 
- Aggregation 
- Prediction 
- Analysis
Current approaches 
0. Relational approach: table 
Timestamp            Value 
2014-11-21 14:35:00  1321 
2014-11-21 14:35:01  2444 
2014-11-21 14:35:02  2135 
2014-11-21 14:35:03  1833
Current approaches 
0. Relational approach: table 
HH  MM  SS  Value 
14  35  0   1321 
14  35  1   2444 
14  35  2   2135 
14  35  3   1833
Current approaches 
0. Relational – Advantages 
• Simple 
• It can be used together with your application data 
(operational)
Current approaches 
0. Relational – Disadvantages 
• Slow reads (they rely on an index) 
• Slow inserts (every insert also updates the index)
Current approaches 
1. Document Database 
• Collections of Documents instead of tables 
• Schemaless 
• Complex data structures
Current approaches 
1. Document approach: Minute Based 
{ 
timestamp: "2014-11-21 12:05", 
load: [10, 15, 3, … 30] //array of 60, one per second 
}
Current approaches 
1. Document approach: Hour Based 
{ 
timestamp: "2014-11-21 12:00", 
load: { 
0: [10, 15, 3, … 30], //array of 60, one per second 
1: [0, 12, 31, … 24], 
… 
59: [10, 10, 1, … 16] 
} 
}
Current approaches 
1. Document approach – Advantages 
• Fast write: one insert every 60 samples; the other 59 are updates
• Fast fetch
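A minimal in-memory sketch of the minute-based write pattern, assuming a Java map stands in for the document collection (MinuteStore, write and read are hypothetical names, not an OrientDB API). Only the first sample of a minute creates a document; every other sample fills a slot in an existing one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a document collection (not an OrientDB API): one document
// per minute, keyed by a "yyyy-MM-dd HH:mm" string, holding a 60-slot load array.
final class MinuteStore {
    private final Map<String, int[]> docs = new HashMap<>();
    int inserts = 0;
    int updates = 0;

    // Stores the value for one second. The first write of a minute creates that
    // minute's document (an insert); every later write fills a slot (an update).
    void write(String minuteKey, int second, int value) {
        if (second < 0 || second > 59) {
            throw new IllegalArgumentException("second must be in 0..59");
        }
        int[] load = docs.get(minuteKey);
        if (load == null) {
            load = new int[60];
            docs.put(minuteKey, load);
            inserts++;
        } else {
            updates++;
        }
        load[second] = value;
    }

    // Reads a single second back with one map lookup plus an array access.
    int read(String minuteKey, int second) {
        return docs.get(minuteKey)[second];
    }
}
```

Writing seconds 0 to 59 of one minute results in 1 insert and 59 updates.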
Current approaches 
1. Document approach – Disadvantages 
• Fixed time windows 
• Single point per unit 
• How to pre-aggregate? 
• Relationships with the rest of the world? 
• Relationships between events?
Current approaches 
2. Graph Database 
• Nodes/Edges instead of tables 
• Index-free adjacency 
• Fast traversal 
• Dynamic structure
Current approaches 
2. Graph approach: linked sequence 
[Diagram: events e1 → e2 → e3 → e4 → e5, each connected to the following one by a "next" edge; the timestamp is stored on each event vertex]
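A minimal sketch of the linked-sequence idea, assuming each event vertex keeps a direct reference to the following one in place of a "next" edge (SeqEvent and LinkedSequence are hypothetical names). It shows how a time window of any length is read by walking from a starting event instead of probing an index; finding that starting event is outside this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event vertex: a timestamp (epoch seconds) and a reference that stands
// in for its outgoing "next" edge to the following event.
final class SeqEvent {
    final String id;
    final long timestamp;
    SeqEvent next;

    SeqEvent(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }
}

final class LinkedSequence {
    // Follows "next" links from a known event and collects the ids of events whose
    // timestamp lies in [from, to). Since the chain is in time order, the walk stops
    // at the first event at or after the end of the window.
    static List<String> window(SeqEvent start, long from, long to) {
        List<String> ids = new ArrayList<>();
        for (SeqEvent e = start; e != null && e.timestamp < to; e = e.next) {
            if (e.timestamp >= from) {
                ids.add(e.id);
            }
        }
        return ids;
    }
}
```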
Current approaches 
2. Graph approach: linked sequence (tag-based) 
[Diagram: events e1 [Tag1, Tag2], e2 [Tag1], e3 [Tag2], e4 [Tag1], e5 [Tag1, Tag2]; each tag has its own chain of edges: nextTag1 links e1 → e2 → e4 → e5, nextTag2 links e1 → e3 → e5]
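A minimal sketch of per-tag chains, assuming each event keeps one outgoing link per tag in place of nextTag1, nextTag2, … edges (TaggedEvent and TagChains are hypothetical names). The wiring used to exercise it, a Tag1 chain e1 → e2 → e4 → e5 and a Tag2 chain e1 → e3 → e5, is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tagged event: one outgoing "next" link per tag, standing in for
// nextTag1, nextTag2, ... edges.
final class TaggedEvent {
    final String id;
    final Map<String, TaggedEvent> nextByTag = new HashMap<>();

    TaggedEvent(String id) {
        this.id = id;
    }

    // Links this event to the next event that carries the same tag.
    void link(String tag, TaggedEvent next) {
        nextByTag.put(tag, next);
    }
}

final class TagChains {
    // Walks only the chain of one tag (the start event is included), so events
    // without that tag are never visited.
    static List<String> follow(TaggedEvent start, String tag) {
        List<String> ids = new ArrayList<>();
        for (TaggedEvent e = start; e != null; e = e.nextByTag.get(tag)) {
            ids.add(e.id);
        }
        return ids;
    }
}
```

With that wiring, following Tag1 from e1 visits e1, e2, e4, e5, and following Tag2 visits e1, e3, e5.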
Current approaches 
2. Graph approach: Hierarchy 
[Diagram: a time tree Days → Hours (1 … 24) → Minutes (1 … 60) → Seconds; events (e1, e2, e3) are attached to the vertices of the tree]
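A minimal sketch of the hierarchy, assuming nested sorted maps stand in for the Days → Hours → Minutes → Seconds vertices and that timestamps are epoch seconds in UTC (TimeTree, attach and countInHour are hypothetical names). It shows what the tree buys you: a question about one hour only visits that hour's subtree.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory time tree: day -> hour -> minute -> second -> event ids.
// Each nested map level plays the role of one level of vertices in the hierarchy.
final class TimeTree {
    private final Map<LocalDate, Map<Integer, Map<Integer, Map<Integer, List<String>>>>> days =
            new TreeMap<>();

    // Attaches an event to the leaf for its second (UTC), creating missing levels.
    void attach(String eventId, long epochSeconds) {
        LocalDateTime t = LocalDateTime.ofEpochSecond(epochSeconds, 0, ZoneOffset.UTC);
        days.computeIfAbsent(t.toLocalDate(), d -> new TreeMap<>())
                .computeIfAbsent(t.getHour(), h -> new TreeMap<>())
                .computeIfAbsent(t.getMinute(), m -> new TreeMap<>())
                .computeIfAbsent(t.getSecond(), s -> new ArrayList<>())
                .add(eventId);
    }

    // Counts the events under one hour vertex by visiting only that subtree.
    int countInHour(LocalDate day, int hour) {
        Map<Integer, Map<Integer, List<String>>> minutes = days
                .getOrDefault(day, Collections.emptyMap())
                .getOrDefault(hour, Collections.emptyMap());
        int count = 0;
        for (Map<Integer, List<String>> seconds : minutes.values()) {
            for (List<String> ids : seconds.values()) {
                count += ids.size();
            }
        }
        return count;
    }
}
```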
Current approaches 
2. Graph approach: mixed 
[Diagram: the Days → Hours → Minutes → Seconds tree from the hierarchy approach, mixed with the events e1, e2, e3 attached to it]
Current approaches 
2. Graph approach – Advantages 
• Flexible 
• Events can be connected together in different ways 
• You can connect events to other entities 
• Fast traversal of dynamic time windows 
• Fast aggregation (based on hierarchy)
Current approaches 
2. Graph approach – Disadvantages 
• Slow writes (vertex + edge + maintenance) 
• Not so fast reads
Can we mix different models and get 
all the advantages?
Can we mix all this with the rest of the 
application logic?
Multi-Model!
• Document database (schema-free, complex 
properties) 
• Graph database (index-free adjacency, fast 
traversal) 
• SQL (extended) 
• Operational (schema - ACID) 
• OO concepts (Classes, inheritance, polymorphism) 
• REST/JSON interface 
• Native JavaScript (extend the query language, expose 
services, event hooks) 
• Distributed architecture (multi-master replication/sharding)
OrientDB 
First step: put them together 
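[Diagram: a time tree Days → Hours (1 … 24) → Minutes (1 … 60); there is no Seconds level of vertices: each minute vertex embeds a document with one value per second]
{ 
0: 1000, 
1: 1500, 
… 
59: 96 
} 
Graph on top, Document below <- IT’S A VERTEX TOO!!!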
OrientDB 
First step: put them together 
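[Diagram: the tree stops at the Hours level (Days → Hours 1 … 24); each hour vertex embeds a nested document, minute → second → value]
{ 
0: { 
0: 1000, 
1: 1500, 
… 
59: 210 
}, 
1: { … }, 
… 
59: { … } 
} 
Graph on top, Document below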
Where should I stop? 
It depends on my domain and 
requirements.
OrientDB 
Result: 
• The same insert speed as the document approach 
• But with the flexibility of a graph 
• (As a side effect of mixing models, documents can also contain “pointers” to 
other elements of the application domain)
OrientDB 
Second step: Pre-aggregate 
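[Diagram: the Days → Hours → Minutes tree with a per-second document embedded in each minute vertex; a sum() over the document is stored on the minute vertex, and a sum() over the minutes is stored on the hour vertex]
{ 
0: 1000, 
1: 1500, 
… 
59: 96 
} 
Graph on top, Document below <- IT’S A VERTEX TOO!!!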
OrientDB 
How to aggregate 
Hooks: server-side triggers (Java or JavaScript), 
executed when database operations happen (e.g. insert 
or update) 
Java interface: 
public RESULT onBeforeInsert(…); 
public void onAfterInsert(…); 
public RESULT onBeforeUpdate(…); 
public void onAfterUpdate(…);
OrientDB 
Aggregation logic 
• Second 0 -> insert 
• Second 1 -> update 
• … 
• Second 57 -> update 
• Second 58 -> update 
• Second 59 -> update + aggregate 
– Write aggregate value on minute vertex 
• Minute == 59? Calculate aggregate on hour vertex
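The steps above can be sketched as a write callback, assuming plain Java objects stand in for the hour and minute vertices (HourVertex, MinuteVertex and Aggregator are hypothetical names). The callback only mimics where an after-insert/after-update hook would run; it is not OrientDB's hook API. Missing seconds are counted as 0, an assumption the slides do not settle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minute vertex: an embedded per-second document (here an array) plus a
// pre-aggregated sum that stays null until the minute is complete.
final class MinuteVertex {
    final Integer[] seconds = new Integer[60];
    Long sum; // null while the minute is incomplete
}

// Hypothetical hour vertex holding its minute vertices and its own aggregate.
final class HourVertex {
    final List<MinuteVertex> minutes = new ArrayList<>();
    Long sum; // null while the hour is incomplete
}

final class Aggregator {
    // Called once per stored sample, like an after-insert/after-update hook.
    // Second 59 closes the minute, so its sum is written on the minute vertex;
    // if the minute is also 59, the hour sum is computed from the minute sums.
    static void onAfterWrite(HourVertex hour, int minute, int second, int value) {
        while (hour.minutes.size() <= minute) {
            hour.minutes.add(new MinuteVertex());
        }
        MinuteVertex m = hour.minutes.get(minute);
        m.seconds[second] = value;
        if (second == 59) {
            long total = 0;
            for (Integer v : m.seconds) {
                total += (v == null ? 0 : v); // assumption: a missing second counts as 0
            }
            m.sum = total;
            if (minute == 59) {
                long hourTotal = 0;
                for (MinuteVertex mv : hour.minutes) {
                    hourTotal += (mv.sum == null ? 0 : mv.sum);
                }
                hour.sum = hourTotal;
            }
        }
    }
}
```

Until second 59 arrives, the minute's sum stays null, which is exactly the "incomplete" state that queries must handle.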
OrientDB 
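[Diagram: the Days → Hours → Minutes tree annotated with pre-aggregated values. Complete periods carry their sum (e.g. sum = 1000, sum = 15000, sum = 300); periods still in progress (incomplete) have sum = null. Each minute vertex holds its per-second document, e.g.:]
{ 
0: 1, 
1: 12, 
… 
59: 3 
}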
OrientDB 
Query logic: 
• Traverse from root node to specified level 
(filtering based on vertex data) 
• Is there aggregate value? 
– Yes: return it 
– No: go one level down and do the same 
Aggregation on a level will be VERY fast if you 
have horizontal edges (edges linking consecutive vertices on the same level)!
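A recursive sketch of this query logic, assuming a plain in-memory tree where each vertex may carry a pre-computed aggregate (TreeNode and AggregateQuery are hypothetical names). It follows the rule on the slide: return the stored aggregate when there is one, otherwise go one level down.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vertex of the time tree: a leaf holds a raw value; an inner vertex
// may hold a pre-aggregated sum, or null if its period is not complete yet.
final class TreeNode {
    final List<TreeNode> children = new ArrayList<>();
    Long aggregate; // pre-computed sum, or null
    long rawValue;  // used only by leaves

    static TreeNode leaf(long value) {
        TreeNode n = new TreeNode();
        n.rawValue = value;
        return n;
    }
}

final class AggregateQuery {
    // Sum under a vertex: use the stored aggregate if present, otherwise recurse
    // one level down; leaves contribute their raw value.
    static long sum(TreeNode node) {
        if (node.aggregate != null) {
            return node.aggregate;
        }
        if (node.children.isEmpty()) {
            return node.rawValue;
        }
        long total = 0;
        for (TreeNode child : node.children) {
            total += sum(child);
        }
        return total;
    }
}
```

A complete minute is answered from its stored aggregate without touching its seconds; only the incomplete minute is summed second by second.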
OrientDB 
How to calculate aggregate values with a query 
Input params: 
- Root node (suppose it is #11:11) 
select sum(aggregateVal) from ( 
traverse out() from #11:11 
while in().aggregateVal is null 
) 
With the same logic you can query based on time 
windows
OrientDB 
Third step: Complex domains 
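[Diagram: Hours → Minutes (1, 2, …, 60); each minute vertex embeds a document whose per-second entries are now objects that can carry extra data]
{ 
0: {val: 1000}, 
1: {val: 1500}, 
… 
59: { 
val: 96, 
eventTags: [tag1, tag2] 
… 
} 
} 
Graph on top, Document below <- Enrich the domain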
OrientDB 
Another use case: Event Categories and OO 
[Diagram: the tag-based event sequence, with e1 now tagged [Tag1, Tag2, Tag3]: nextTag1 links e1 → e2 → e4 → e5, nextTag2 links e1 → e3 → e5, and a nextTag3 edge leads from e1 to a further event tagged [Tag3]]
OrientDB 
Another use case: Event Categories and OO 
Suppose tags are hierarchical categories 
(classes for vertices and/or edges): 
nextTAG 
  nextTagX (subclass of nextTAG) 
    nextTag1, nextTag2 (subclasses of nextTagX) 
  nextTag3 (subclass of nextTAG)
OrientDB 
Subset of events 
TRAVERSE out('nextTag1') FROM <e1> 
[Diagram: only the nextTag1 chain is followed: e1 → e2 → e4 → e5]
OrientDB 
Subset of events 
TRAVERSE out('nextTag2') FROM <e1> 
[Diagram: only the nextTag2 chain is followed: e1 → e3 → e5]
OrientDB 
Subset of events (Polymorphic!!!) 
TRAVERSE out('nextTagX') FROM <e1> 
[Diagram: nextTag1 and nextTag2 are both subclasses of nextTagX, so both chains are followed and e1, e2, e3, e4, e5 are all reached; the nextTag3 edge is not followed]
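A minimal sketch of the polymorphic traversal, assuming Java classes stand in for the edge classes (NextTag, NextTagX, NextTag1, NextTag2 and NextTag3 mirror nextTAG, nextTagX, nextTag1, nextTag2 and nextTag3; PEvent and PolymorphicTraverse are hypothetical names). Class.isInstance matches an edge of the requested class or of any subclass, which is what lets a nextTagX traversal follow both nextTag1 and nextTag2 edges.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical edge classes mirroring the category hierarchy: NextTag is the root,
// NextTagX and NextTag3 extend it, NextTag1 and NextTag2 extend NextTagX.
class NextTag {
    final PEvent to;
    NextTag(PEvent to) { this.to = to; }
}
class NextTagX extends NextTag { NextTagX(PEvent to) { super(to); } }
class NextTag1 extends NextTagX { NextTag1(PEvent to) { super(to); } }
class NextTag2 extends NextTagX { NextTag2(PEvent to) { super(to); } }
class NextTag3 extends NextTag { NextTag3(PEvent to) { super(to); } }

// Hypothetical event vertex with its outgoing edges.
final class PEvent {
    final String id;
    final List<NextTag> out = new ArrayList<>();
    PEvent(String id) { this.id = id; }
}

final class PolymorphicTraverse {
    // Collects the ids of all events reachable from start through edges that are
    // instances of edgeClass or of any subclass; the start event is included.
    static Set<String> traverse(PEvent start, Class<? extends NextTag> edgeClass) {
        Set<String> visited = new TreeSet<>();
        Set<PEvent> seen = new HashSet<>();
        Deque<PEvent> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            PEvent e = stack.pop();
            if (!seen.add(e)) {
                continue; // already visited through another path
            }
            visited.add(e.id);
            for (NextTag edge : e.out) {
                if (edgeClass.isInstance(edge)) {
                    stack.push(edge.to);
                }
            }
        }
        return visited;
    }
}
```

Asking for NextTag1 follows only nextTag1 edges, NextTagX follows nextTag1 and nextTag2 edges but not nextTag3, and NextTag follows every edge.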
Connect all this with the rest of your 
application domain
You’ll see, everything will get more 
complex: you will discover new time-related 
dimensions (speed, 
position…) and new needs (complex 
forecasting)
CHASE!
Chase 
• Your target is running away 
• You have informers who track his moves 
(coordinates at a point in time) and give 
you additional (unstructured) information 
• You have a street map 
• You want to: 
– Catch him ASAP 
– Predict his moves 
– Be sure that he is inside an area
Chase 
• Map is made of points and distances 
• You also have speed limits for streets 
[Diagram: map points (point1 … pointN) connected by streets, e.g. a street of 1 km with max speed 70 km/h, one of 2 km with max speed 120 km/h and one of 8 km with max speed 90 km/h]
Chase 
• Map is made of points and distances 
• You also have speed limits for streets 
• Distance / Speed = TIME!!!
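As a worked example, using the three streets listed on the map (1 km at 70 km/h, 2 km at 120 km/h, 8 km at 90 km/h), the minimum time to drive each one at its speed limit is:

```latex
t = \frac{d}{v}:\qquad
\frac{1\,\text{km}}{70\,\text{km/h}} \approx 0.857\,\text{min}\ (\approx 51\,\text{s}),\quad
\frac{2\,\text{km}}{120\,\text{km/h}} = 1\,\text{min},\quad
\frac{8\,\text{km}}{90\,\text{km/h}} \approx 5.33\,\text{min}\ (5\,\text{min}\ 20\,\text{s})
```

These times are natural edge weights for the shortest-path and reachability questions that follow.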
Chase 
You have a time series of your target’s moves: an event sequence in which each event is a document like 
{ 
Timestamp: 29/11/2014 17:15:00, 
LAT: 19.12223, 
LON: 42.134 
} 
followed by 
{ 
Timestamp: 29/11/2014 17:55:00, 
LAT: 19.12223, 
LON: 42.134 
}
Chase 
You have a time series of your target’s moves 
[Diagram: the sightings of 20/11/2014 1:20:00 PM and 21/11/2014 2:35:00 PM placed on the street map (map points connected by streets)]
Chase 
You have a time series of your target’s moves 
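[Diagram: the event vertices (20/11/2014 13:20:00, 21/11/2014 14:35:00, 29/11/2014 17:55:00) form an event sequence; each event is linked by a “Where” edge to the map point where it was observed, and map points are connected by streets]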
Chase 
Vertices and edges are also documents, 
so you can store complex information inside them: 
{ 
timestamp: 22213989487987, 
lat: xxxx, 
lon: yyy, 
informer: 15, 
additional: { 
speed: 120, 
description: "the target was in a car", 
car: { 
model: "Fiat 500", 
licensePlate: "AA 123 BB" 
} 
} 
}
Chase 
Now you can: 
• Predict his moves (e.g. statistical methods, 
interpolation on lat/lon + time) 
• Calculate how far away he can be (based on his last 
position, average speed and street data) 
• Reach him quickly (shortest path, Dijkstra) 
• … intelligence?
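For the first point, a minimal sketch of interpolation on lat/lon + time, assuming straight-line motion at constant speed between two sightings (Sighting and Interpolation are hypothetical names). It is the simplest of the statistical methods mentioned, not a recommended predictor.

```java
// Hypothetical sighting: epoch seconds plus latitude/longitude in degrees.
final class Sighting {
    final long t;
    final double lat;
    final double lon;

    Sighting(long t, double lat, double lon) {
        this.t = t;
        this.lat = lat;
        this.lon = lon;
    }
}

final class Interpolation {
    // Linear interpolation (or extrapolation, for t outside [a.t, b.t]) of the
    // position at time t from two sightings. Lat/lon are treated as planar
    // coordinates, which is only a rough approximation over short distances.
    static double[] positionAt(Sighting a, Sighting b, long t) {
        if (a.t == b.t) {
            throw new IllegalArgumentException("sightings need different timestamps");
        }
        double f = (double) (t - a.t) / (b.t - a.t);
        return new double[] {a.lat + f * (b.lat - a.lat), a.lon + f * (b.lon - a.lon)};
    }
}
```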
Chase 
But to have all this you need: 
• An easy way for your informers to send 
time series events 
Hint: REST interface 
With OrientDB you can expose Javascript 
functions as REST services!
Chase 
And you need: 
• An extended query language 
E.g. 
TRAVERSE out("street") FROM ( 
SELECT out("point") FROM #11:11 
) WHILE canBeReached($current, #11:11) 
(where #11:11 is my last event, and the result is where he could be)
Chase 
With OrientDB you can write the function 
canBeReached(node, event) 
in JavaScript and use it in your queries
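The slide's canBeReached is a JavaScript function running inside OrientDB, and its body is not shown. The Java sketch below illustrates one possible logic under stated assumptions: streets are weighted by distance / speed limit, and a map point counts as reachable if the shortest travel time from the last known point (Dijkstra) fits within the minutes elapsed since that sighting. StreetMap, reachableWithin and this three-argument canBeReached are hypothetical.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class StreetMap {
    // map point -> (neighbouring map point -> minimum travel time in minutes)
    private final Map<String, Map<String, Double>> adj = new HashMap<>();

    // Adds a two-way street; its weight is distance / speed limit, in minutes.
    void addStreet(String a, String b, double distanceKm, double maxSpeedKmh) {
        double minutes = distanceKm / maxSpeedKmh * 60.0;
        adj.computeIfAbsent(a, k -> new HashMap<>()).put(b, minutes);
        adj.computeIfAbsent(b, k -> new HashMap<>()).put(a, minutes);
    }

    // Dijkstra from the last known point: earliest arrival time (minutes) for every
    // map point that can be reached within budgetMinutes.
    Map<String, Double> reachableWithin(String from, double budgetMinutes) {
        Map<String, Double> best = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue =
                new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        queue.add(new AbstractMap.SimpleEntry<String, Double>(from, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Double> current = queue.poll();
            String point = current.getKey();
            double time = current.getValue();
            if (time > budgetMinutes) {
                break; // entries leave the queue in time order, so nothing later fits
            }
            if (best.containsKey(point)) {
                continue; // already settled with an earlier or equal time
            }
            best.put(point, time);
            for (Map.Entry<String, Double> street
                    : adj.getOrDefault(point, Collections.<String, Double>emptyMap()).entrySet()) {
                if (!best.containsKey(street.getKey())) {
                    queue.add(new AbstractMap.SimpleEntry<String, Double>(
                            street.getKey(), time + street.getValue()));
                }
            }
        }
        return best;
    }

    // Could the target be at `point`, given his last known point and the minutes
    // elapsed since that sighting?
    boolean canBeReached(String point, String lastPoint, double elapsedMinutes) {
        return reachableWithin(lastPoint, elapsedMinutes).containsKey(point);
    }
}
```

With streets A–B (1 km, 70 km/h), B–C (2 km, 120 km/h) and A–C (8 km, 90 km/h), C is reachable from A in about 1.86 minutes via B, so it counts as reachable after 2 minutes but not after 1.5.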
Chase 
It’s just a game, but think about: 
• Fraud detection 
• Traffic routing 
• Multi-dimensional analytics 
• Forecasting 
• …
Summary
One model is not enough 
One of the most common issues my customers have 
is: 
“I have a zoo of technologies in my application 
stack, and it’s getting worse every day” 
My answer is: Multi-Model DB 
of course ;-)
From: 
“choose the right data model for your 
use case” 
To: 
“Your application has multiple data 
models, you need all of them!”
This is NoSQL 2.0!!!
Thank you! 
@ldellaquila 
l.dellaquila@orientechnologies.com

OrientDB - Time Series and Event Sequences - Codemotion Milan 2014
