50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, Event, CQRS
On the many types of data, data stores and data usages
Lucas Jellema, CTO of AMIS
Oracle Groundbreakers APAC Tour
Lucas Jellema
Architect / Developer
1994 started in IT at Oracle
2002 joined AMIS
Currently CTO & Solution Architect
Implementing Microservices on Oracle Cloud: Open, Manageable, Polyglot, and Scalable 2
こんばんは (Good evening)
Overview
• Multiple types of data
• Stored and processed in different ways
• Same data sometimes used in multiple, different ways
• Stored and processed multiple times – optimized for each use case
• The meaning of some terms cannot be taken too literally
• Real Time and Fresh
• Integrity and Truth
• Consistency and transactions
• Understand your data
• Meta: What does it mean?
• Master: Where is the source?
Tweet!
#codeone
Select from <stream of tweet events>
select text, author, timestamp
from tweets
where tag = 'codeone'
<--- streaming data
Select Running Count from <stream of tweet events>
select tag, count(*) tweet_count
from tweets
group by tag
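The two queries above differ from ordinary SQL in that the input never ends, so the result has to be updated per incoming event. A minimal Python sketch of that idea, assuming a hypothetical event shape (a dict with `text`, `author`, `timestamp` and `tag`) and using generators in place of a real streaming engine:

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

# Hypothetical event shape: {"text": ..., "author": ..., "timestamp": ..., "tag": ...}


def filter_by_tag(tweets: Iterable[dict], tag: str) -> Iterator[dict]:
    """Continuous counterpart of: select ... from tweets where tag = :tag."""
    for tweet in tweets:
        if tweet["tag"] == tag:
            yield tweet


def running_count(tweets: Iterable[dict]) -> Iterator[Dict[str, int]]:
    """Continuous counterpart of: select tag, count(*) from tweets group by tag.

    Yields a fresh snapshot of the counts after every incoming event.
    """
    counts: Counter = Counter()
    for tweet in tweets:
        counts[tweet["tag"]] += 1
        yield dict(counts)


# A finite list stands in for the unbounded stream in this sketch.
sample_stream = [
    {"text": "Keynote starting", "author": "ann", "timestamp": 1, "tag": "codeone"},
    {"text": "Streams everywhere", "author": "bob", "timestamp": 2, "tag": "java"},
    {"text": "Great demo", "author": "cho", "timestamp": 3, "tag": "codeone"},
]

codeone_tweets = list(filter_by_tag(sample_stream, "codeone"))
snapshots = list(running_count(sample_stream))
```

Each element of `snapshots` is the running count as it stood after one more event, which is what a client subscribed to the aggregated topic would see over time.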
[Diagram, shown for tweets on #CodeOne, #java, #oraclecode and again for #JEEConf, #java, #oraclecode: tweets are published to a Tweets topic on Oracle Cloud Event Hub; a Running Tweets Aggregation in an Application Container produces a TWEET_COUNT topic that is consumed by several clients. Other event sources: IoT metrics from hundreds of devices, user actions & click events from the webshop, live traffic events, microservices chatter, social media events (Facebook, WhatsApp, …), IT operations monitoring metrics.]
Real Time
live | fresh | instantaneous | on line | synchronous
50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, Event, CQRS (Tokyo, Japan, November 13th, Oracle Groundbreakers JAPAC Tour)
[Figure: response-time scale with the bands < 10 ms, < 100 ms, < 500 ms, < 3 secs, > 3 secs; labels: Machine Response, Human Reaction]
Integrity
• Madelon’s pass (pasje)
• Real world vs World of Databases
• Relax!
• Anomaly detection
Data Constraints to protect integrity
• Allowable values
• Mandatory attributes
• (Foreign Key) References
• NULL
• Constraints on
• type
• length
• format
• Spelling
• Character encoding
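Several of the constraint types listed above can be declared directly in a relational schema. A minimal sketch using Python's built-in `sqlite3` module; the `customers` and `countries` tables and their columns are assumptions made up for illustration, not taken from the talk:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE countries (
        code TEXT PRIMARY KEY CHECK (length(code) = 2)                  -- length
    );
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,                                          -- mandatory attribute
        status  TEXT NOT NULL CHECK (status IN ('active', 'closed')),   -- allowable values
        email   TEXT CHECK (email LIKE '%_@_%'),                        -- format; NULL is allowed
        country TEXT REFERENCES countries (code)                        -- foreign key reference
    );
    INSERT INTO countries (code) VALUES ('NL');
""")


def try_insert(row: tuple) -> str:
    """Return 'ok', or the constraint error reported by the database."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customers (name, status, email, country) VALUES (?, ?, ?, ?)",
                row,
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = [
    try_insert(("Jane Doe", "active", "jane@example.com", "NL")),  # valid row
    try_insert((None, "active", None, "NL")),                      # mandatory name missing
    try_insert(("John", "unknown", None, "NL")),                   # status not allowed
    try_insert(("Kim", "active", "not-an-email", "NL")),           # wrong format
    try_insert(("Lee", "active", None, "XX")),                     # dangling reference
]
customer_count = conn.execute("SELECT count(*) FROM customers").fetchone()[0]
```

Only the first row is accepted. Whether rejecting the other four is helpful depends on the question the next slides raise: the database is protected, but the real-world fact behind a rejected row still exists.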
Data is a representation of the known real world
• How useful is it to enforce data integrity?
Data Integrity
• Why?
• Is it about truth?
• About regulations and by-the-book?
• Allow IT systems to run smoothly and not get confused?
• About auditability and non-repudiation?
• What about the real world?
• Data in IT is just a representation; if the world is not by the book – what should IT do?
Anomaly Detection
• Find fishy values and derive business integrity rules by scanning data
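One way to read this bullet: instead of declaring rules up front, scan the data that is already there, flag outliers, and propose a rule from what remains. The sketch below is an assumption about how that could look, not the method used in the talk; it flags values by their modified z-score (based on the median absolute deviation) with the commonly used cut-off of 3.5, and the `ages` sample is made up.

```python
from statistics import median
from typing import List, Tuple


def fishy_values(values: List[float], threshold: float = 3.5) -> List[float]:
    """Flag values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a few extreme values do not distort the yardstick itself.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the values equal the median; anything else stands out.
        return [v for v in values if v != med]
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def derive_range_rule(values: List[float]) -> Tuple[float, float]:
    """Propose a candidate integrity rule: the range spanned by the non-fishy values."""
    fishy = set(fishy_values(values))
    normal = [v for v in values if v not in fishy]
    return (min(normal), max(normal))


ages = [34, 29, 41, 38, 27, 45, 31, 36, 430, -2]  # hypothetical column of ages
suspects = fishy_values(ages)
rule = derive_range_rule(ages)
```

Here `suspects` contains 430 and -2, and the derived candidate rule is the range 27 to 45. A rule found this way is a suggestion for a domain expert to confirm, not a fact about the business.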
BOL - CQRS
Books Online - WebShop
[Diagram: one Products store on both sides of a firewall. Product updates side: data manipulation, data quality (enforcement), < 10K transactions, batch jobs next to online, speed is nice. Webshop visits side (searches, product details, orders): read only, on line, speed is crucial, XHTML & JSON, > 5M visits.]
[Diagram: the Products store behind the firewall keeps handling product updates (data manipulation, data quality enforcement, < 10K transactions, batch jobs next to online, speed is nice). Webshop visits (searches, product details, orders; read only, on line, speed is crucial, XHTML & JSON, > 1M visits) are served from Products copies in the DMZ: read only, JSON documents, images, text search, scale horizontally, stale but consistent, filled by nightly generation.]
[Diagram: a single Products store used for both Data Manipulation and Data Retrieval.]
[Diagram: Data Manipulation stays on the Products store; Data Retrieval is served by derived stores: Special Products, Product Clusters (Food, Stuff, Toys), a Quick Product Search Index, and a Product Store in a SaaS app.]
Command Query Responsibility Segregation = CQRS
[Diagram: Data Manipulation on the Products store; Data Retrieval from Special Products, Product Clusters (Food, Stuff, Toys), the Quick Product Search Index and the Product Store in a SaaS app.]
• Detect changes
• Extract Data
• Transport Data
• Convert Data
• Apply Data
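The five steps named on this slide (detect changes, extract, transport, convert, apply) can be sketched in a few lines. This is a toy, in-process illustration under stated assumptions: `CommandStore`, `QueryStore` and `synchronize` are hypothetical names, the change log is a plain list, and "transport" is just a function call where a real system would use a queue, a replication tool or a batch job.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CommandStore:
    """Write side: product rows plus a log of changed ids."""
    products: Dict[int, dict] = field(default_factory=dict)
    change_log: List[int] = field(default_factory=list)

    def upsert(self, product_id: int, name: str, price_cents: int) -> None:
        self.products[product_id] = {"name": name, "price_cents": price_cents}
        self.change_log.append(product_id)  # recorded so changes can be detected later


@dataclass
class QueryStore:
    """Read side: display-ready documents plus a keyword search index."""
    documents: Dict[int, dict] = field(default_factory=dict)
    search_index: Dict[str, Set[int]] = field(default_factory=dict)

    def apply(self, doc: dict) -> None:
        for ids in self.search_index.values():  # drop stale index entries for this product
            ids.discard(doc["id"])
        self.documents[doc["id"]] = doc
        for word in doc["title"].lower().split():
            self.search_index.setdefault(word, set()).add(doc["id"])

    def search(self, word: str) -> List[dict]:
        ids = self.search_index.get(word.lower(), set())
        return [self.documents[i] for i in sorted(ids)]


def synchronize(source: CommandStore, target: QueryStore) -> int:
    """Propagate pending changes from the command side to the query side."""
    changed = list(dict.fromkeys(source.change_log))  # detect changes (deduplicated)
    source.change_log.clear()
    for product_id in changed:
        row = source.products[product_id]             # extract
        doc = {                                       # convert to the read model's shape
            "id": product_id,
            "title": row["name"],
            "price": f"{row['price_cents'] / 100:.2f}",
        }
        target.apply(doc)                             # transport + apply (in-process here)
    return len(changed)


command_side = CommandStore()
query_side = QueryStore()
command_side.upsert(1, "Wooden Train", 1999)
command_side.upsert(2, "Toy Robot", 2550)
command_side.upsert(1, "Wooden Toy Train", 1899)

before_sync = query_side.search("toy")  # the query side is stale until the sync runs
synced = synchronize(command_side, query_side)
after_sync = query_side.search("toy")
```

Between `upsert` and `synchronize` the query side is stale but internally consistent. How often `synchronize` runs, and what happens when it fails halfway, are exactly the design decisions a CQRS setup has to make explicit.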
From C to Q
• How quickly?
• How frequently?
• How reliably?
• How atomically?
• Data Authorization Considerations
• Locations & Connectivity
• Full resynch | restore of Query Store
[Diagram: Products store feeding the Quick Product Search Index]
CQRS is not new
Event Sourcing Driving CQRS
[Diagram: Events, Event Store, Current State. Example record: accountId: 123, amount: 10, Owner: Jane Doe]
[Diagram: Events, Event Store, Current State, plus an Other State Aggregate]
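With event sourcing, the event store is the source of truth and every state is derived by replaying events. The sketch below rebuilds the account record shown on the slide (accountId 123, amount 10, owner Jane Doe) from a list of events and derives a second aggregate from the same list. The event kinds (`AccountOpened`, `Deposited`, `Withdrawn`), the second account and the amounts are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    kind: str          # hypothetical kinds: "AccountOpened", "Deposited", "Withdrawn"
    account_id: int
    amount: int = 0
    owner: str = ""


def current_state(events: Iterable[Event]) -> Dict[int, dict]:
    """Replay all events, in order, into the current state per account."""
    accounts: Dict[int, dict] = {}
    for event in events:
        if event.kind == "AccountOpened":
            accounts[event.account_id] = {
                "accountId": event.account_id, "amount": 0, "owner": event.owner,
            }
        elif event.kind == "Deposited":
            accounts[event.account_id]["amount"] += event.amount
        elif event.kind == "Withdrawn":
            accounts[event.account_id]["amount"] -= event.amount
    return accounts


def total_by_owner(events: Iterable[Event]) -> Dict[str, int]:
    """A second aggregate derived from the same events: total amount per owner."""
    totals: Dict[str, int] = {}
    for account in current_state(events).values():
        totals[account["owner"]] = totals.get(account["owner"], 0) + account["amount"]
    return totals


event_store: List[Event] = [
    Event("AccountOpened", 123, owner="Jane Doe"),
    Event("Deposited", 123, amount=25),
    Event("Withdrawn", 123, amount=15),
    Event("AccountOpened", 456, owner="Jane Doe"),
    Event("Deposited", 456, amount=40),
]

state = current_state(event_store)
totals = total_by_owner(event_store)
```

Because both views are pure functions of the event list, either one can be thrown away and rebuilt, which is what makes event sourcing a natural driver for the query side of CQRS.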
Distributed Database with Event Sourcing & Current State
World State
SQL is not good at anything
• But it sucks at nothing
Session Recommendation Engine for CodeOne
• Recommend sessions to me
• That are Presented by Speakers
• Who are Liked by People
• Who Attended the same Sessions that I Attended
• Start from me and the sessions I attended
• Locate other attendees in these sessions
• Find the speakers they like
• Retrieve the sessions presented by those speakers
The Relational Approach
[Data model: tables PEOPLE, SESSIONS, ATTENDANCE, SPEAKERS, SPEAKER_LIKING]
SQL Query to find the Recommendations
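The query shown on the original slide did not survive extraction. As a stand-in, here is a sketch of what such a query could look like over the five tables named earlier, run through Python's `sqlite3` module. The column names, the sample rows and the extra filter that excludes sessions I already attended are all assumptions, not the author's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people         (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sessions       (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE attendance     (person_id INTEGER, session_id INTEGER);
    CREATE TABLE speakers       (person_id INTEGER, session_id INTEGER);
    CREATE TABLE speaker_liking (person_id INTEGER, speaker_id INTEGER);

    INSERT INTO people VALUES (1, 'Me'), (2, 'Ann'), (3, 'Bob'),
                              (4, 'Speaker S'), (5, 'Speaker T');
    INSERT INTO sessions VALUES (10, 'Intro to Kafka'), (11, 'Graph Databases'),
                                (12, 'CQRS in Practice');
    INSERT INTO attendance VALUES (1, 10), (2, 10), (3, 11);
    INSERT INTO speakers VALUES (4, 10), (4, 11), (5, 12);
    INSERT INTO speaker_liking VALUES (2, 5), (3, 4);
""")

RECOMMENDATION_SQL = """
    SELECT DISTINCT s.title
    FROM   attendance mine                               -- sessions I attended
    JOIN   attendance theirs                             -- other attendees of those sessions
           ON  theirs.session_id = mine.session_id
           AND theirs.person_id <> mine.person_id
    JOIN   speaker_liking l ON l.person_id = theirs.person_id    -- speakers they like
    JOIN   speakers sp      ON sp.person_id = l.speaker_id       -- sessions by those speakers
    JOIN   sessions s       ON s.id = sp.session_id
    WHERE  mine.person_id = ?
    AND    s.id NOT IN (SELECT session_id FROM attendance WHERE person_id = ?)
    ORDER  BY s.title
"""


def recommend(person_id: int) -> list:
    rows = conn.execute(RECOMMENDATION_SQL, (person_id, person_id)).fetchall()
    return [title for (title,) in rows]


my_recommendations = recommend(1)
```

Each hop in the recommendation path costs one join, so the query grows with the length of the path; that is the contrast the graph approach on the following slides is meant to show.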
The Graph DB Approach (Neo4J using Cypher)
• No tables are created
• As data is created, meta-data is derived
Performing the Recommendations Query
SQL vs NoSQL
Graph Database
• Natural fit during development
• Easier to write and maintain
• Superior (10-1000 times better) performance
[Example query: find people liked by anyone liked by Bob]
SQL vs NoSQL
ACID vs BASE
Relational vs …
Relational Databases
• Based on relational model of data (E.F. Codd), a mathematical foundation
• Uses SQL for query, DML and DDL
• Transactions are ACID (Atomicity, Consistency, Isolation, Durability)
• All or nothing (Atomicity)
• Constraint compliant (Consistency)
• Individual experience [in a multi-session environment] (aka concurrency) (Isolation)
• Down does not hurt (Durability)
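"All or nothing" and "constraint compliant" are easy to demonstrate with a small transaction. A minimal sketch using Python's built-in `sqlite3` module; the `accounts` table, the balances and the `transfer` helper are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)   -- no overdrafts allowed
    );
    INSERT INTO accounts VALUES (1, 100), (2, 50);
""")


def transfer(amount: int, source: int, target: int) -> bool:
    """Move money between accounts; either both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, target)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances() -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


first = transfer(30, source=1, target=2)    # fits within the balance
second = transfer(500, source=1, target=2)  # would overdraw account 1
final_balances = balances()
```

In the second call the credit to account 2 has already been executed when the debit violates the `CHECK` constraint; the rollback undoes the credit as well, so the balances stay at 70 and 80.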
ACID comes at a cost – performance & scalability
• Transaction results have to be persisted [before the transaction completes] in order to guarantee D
• Concurrency requires some degree of locking (and multi-versioning) in order to have I
• Constraint compliance (unique key, foreign key) means all data hangs together (as do all transactions) in order to have C
• Two-phase commit (across multiple participants) introduces complexity, dependencies and delays, yet is required for A
Types of NoSQL
NoSQL n’est pas No SQL (NoSQL is not "No SQL")
When things were simple
[Diagram: a single RDBMS (SQL, ACID) with data files, log files and backups on a SAN]
And then stuff happened
[Diagram: the database is now surrounded by a middle tier (Java EE stateful application), browser clients, offline-capable mobile apps, APIs, microservices (µ) and functions (λ), a data warehouse, OO, XML and JSON data, content management, big data and fast data]
Oracle Database
[Diagram: the Oracle Database core (SQL, RDBMS, ACID) extended with HTTP access, IoT fast data ingestion, sharding, machine learning, NoSQL, Big Data SQL, Multitenant (Pluggable Database) Architecture and Flashback]
Oracle Database XE – eXpress Edition
• Current version: XE 11gR2
• Available since October 2018: XE 18c, with yearly releases (19c, 20c, …)
• All functionality of single instance Oracle Database Enterprise Edition plus Extra Options (including R, Machine Learning, Spatial, Compression, Multi Tenant – for 3 PDBs, Partitioning)
• Code and Data Compatible with other editions – including plug/unplug
• Resource Limitations for 18c:
• 2 CPUs
• 2 GB of memory
• 12 GB of disk space (using Compression effectively 40 GB of data)
• No patches or support
Review of Oracle OpenWorld & CodeOne 2018 - #oowamis 65
Wrap Up
[Word cloud: usage, Total Cost of Data Ownership, authorization, distribution, format, volatility, volume, ACID demands, availability, freshness requirements (staleness allowance), location, speed, ownership, required consistency, integrity, query patterns]
Summary
• Multiple types of data
• Stored and processed in different ways
• Same data sometimes used in multiple, different ways
• Stored and processed multiple times – optimized for each use case
• The meaning of some terms cannot be taken too literally
• Real Time and Fresh
• Integrity and Truth
• Consistency and transactions
• Understand your data
• Meta: What does it mean?
• Master: Where is the source?
Thank you!
ありがとうございました
• Blog: technology.amis.nl
• Email: lucas.jellema@amis.nl
• Twitter: @lucasjellema
• LinkedIn: lucas-jellema
• Web: www.amis.nl, info@amis.nl

50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, Event, CQRS (Tokyo, Japan, November 13th, Oracle Groundbreakers JAPAC Tour)

  • 1. 50 Shades of Data how, when and why Big, Fast, Relational, NoSQL, Elastic, Event, CQRS On the many types of data, data stores and data usages 50 Shades of Data 1 µ µ Lucas Jellema, CTO of AMIS Oracle Groundbreakers APAC Tour
  • 2. Lucas Jellema Architect / Developer 1994 started in IT at Oracle 2002 joined AMIS Currently CTO & Solution Architect Implementing Microservices on Oracle Cloud: Open, Manageable, Polyglot, and Scalable 2 こんばんは
  • 3. Overview • Multiple types of data • Stored and processed in different ways • Same data sometimes used in multiple, different ways • Stored and processed multiple times – optimized for each use case • The meaning of some terms cannot be taken too literally • Real Time and Fresh • Integrity and Truth • Consistency and transactions • Understand your data • Meta: What does it mean? • Master: Where is the source? Implementing Microservices on Oracle Cloud: Open, Manageable, Polyglot, and Scalable 3
  • 5. Select from <stream of tweet events> select text , author , timestamp from tweets Where tag = 'codeone' <--- streaming data
  • 6. Select Running Count from <stream of tweet events> select tag , count(*) tweet_count from tweets group by tag
  • 7. Tweets on #CodeOne #java #oraclecode Tweets Topic Oracle Cloud Event HubApplication Container TWEET_COUNT Topic Running Tweets Aggregation Client Client Client Client IoT metrics from hundreds of devices User actions & click events from webshop Live Traffic EventsMicroservices chatter Social Media events (Facebook, Whatsapp, …) IT Operations – monitoring metrics µ µ µ µ
  • 8. Tweets on #JEEConf #java #oraclecode Tweets Topic Oracle Cloud Event HubApplication Container TWEET_COUNT Topic Running Tweets Aggregation Client Client Client Client IoT metrics from hundreds of devices User actions & click events from webshop Live Traffic EventsMicroservices chatter Social Media events (Facebook, Whatsapp, …) IT Operations – monitoring metrics µ µ µ µ
  • 9. Real Time live | fresh | instantaneous | on line | synchronous
  • 11. 50 Shades of Data 11
  • 12. 50 Shades of Data 12
  • 13. 50 Shades of Data 13
  • 14. < 10 ms < 100 ms < 500 ms <3 secs > 3 secs 50 Shades of Data 14 Machine Response Human Reaction 14
  • 15. < 10 ms < 100 ms < 500 ms <3 secs > 3 secs 50 Shades of Data 15 Machine Response Human Reaction 15
  • 16. Integrity • Madelon’s pasje • Real world vs World of Databases • Relax! • Anomaly detection 50 Shades of Data 16
  • 17. Data Constraints to protect integrity • Allowable values • Mandatory attributes • (Foreign Key) References • NULL • Constraints on • type • length • format • Spelling • Character encoding
  • 18. Data is representation of the known real world • How useful is it to enforce data integrity?
  • 19. Data Integrity • Why? • Is it about truth? • About regulations and by-the-book? • Allow IT systems to run smoothly and not get confused? • About auditability and non-repudiation? • What about the real world? • Data in IT is just a representation; if the world is not by the book – what should IT do?
  • 20. 50 Shades of Data 20
  • 21. Anomaly Detection • Find fishy values and derive business integrity rules by scanning data 50 Shades of Data 21
  • 22. BOL - CQRS 50 Shades of Data 22
  • 23. Books Online - WebShop 50 Shades of Data 23 Products Product updates firewall Data manipulation Data Quality (enforcement) <10K transactions Batch jobs next to online Speed is nice Read only On line Speed is crucial XHTML & JSON > 5M visits Webshop visits - searches - product details - Orders
  • 24. 50 Shades of Data 24 Products Products Products Webshop visits - searches - product details - Orders firewall Data manipulation Data Quality (enforcement) <10K transactions Batch jobs next to online Speed is nice Read only On line Speed is crucial XHTML & JSON > 1M visits DMZ Read only JSON documents Images Text Search Scale Horizontally Stale but consistent Products Nightly generation Product updates
  • 25. Hoe integreer je applicaties en data? 25 Products Data Manipulation Data Retrieval
  • 26. Hoe integreer je applicaties en data? 26 Special Products Product Clusters ProductsData Manipulation Data Retrieval Food Stuff Toys Quick Product Search Index Product Store in SaaS app
  • 27. Comand Query Responsbility Segregation = CQRS 50 Shades of Data 27 Special Products Product Clusters ProductsData Manipulation Data Retrieval Food Stuff Toys Quick Product Search Index Product Store in SaaS app Detect changes Extract Data Transport Data Convert Data Apply Data
  • 28. From C to Q • How quickly? • How frequently? • How reliably? • How atomically? • 50 Shades of Data 28 Products Quick Product Search Index
  • 29. 50 Shades of Data 29
  • 30. From C to Q • How quickly? • How frequently? • How reliably? • How atomic? • • Data Authorization Considerations • Locations & Connectivity • Full resynch | restore of Query Store 50 Shades of Data 30 Products Quick Product Search Index
  • 31. CQRS is not new 50 Shades of Data 31
  • 32. Event Sourcing Driving CQRS 50 Shades of Data 32 Events Event Store Current State accountId: 123 amount: 10 Owner: Jane Doe
  • 33. Event Sourcing Driving CQRS 50 Shades of Data 33 Events Event Store Current State Other State Aggregate
  • 34. Distributed Database with Event Sourcing & Current State Implementing Microservices on Oracle Cloud: Open, Manageable, Polyglot, and Scalable34 World State
  • 35. SQL is not good at anything • But it sucks at nothing
  • 36. Session Recommendation Engine for CodeOne • Recommend sessions to me • That are Presented by Speakers • Who are Liked by People • Who Attended the same Sessions that I Attended • Start from me and the sessions I attended • Locate other attendees in these sessions • Find the speakers they like • Retrieve the sessions presented by those speakers 36
  • 37. The Relational Approach 37 PEOPLE SESSIONS ATTENDANCE SPEAKERS SPEAKER_LIKING
  • 38. SQL Query to find the Recommendations 38
  • 39. The Graph DB Approach (Neo4J using Cypher) • No tables are created • As data is created, meta-data is derived 39
  • 40. The Graph DB Approach 40
  • 43. Graph Database • Natural fit during development • Easier to write and maintain • Superior (10-1000 times better) performance Person liked by anyone liked by Bob Find People liked by anyone liked by Bob Find People liked by anyone liked by Bob
  • 45. SQL vs NoSQL ACID vs BASE Relational vs …
  • 46. Relational Databases • Based on relational model of data (E.F. Codd), a mathematical foundation • Uses SQL for query, DML and DDL • Transactions are ACID (Atomicity, Consistency, Isolation, Durability) • All or nothing • Constraint Compliant • Individual experience [in a multi-session environment] (aka concurrency) • Down does not hurt
  • 47. ACID comes at a cost – performance & scalability • Transaction results have to be persisted [before the transaction completes] in order to guarantee D • Concurrency requires some degree of locking (and multi-versioning) in order to have I • Constraint compliance (unique key, foreign key) means all data hangs together (as do all transactions) in order to have C • Two-phase commit (across multiple participants) introduces complexity, dependencies and delays, yet required for A
  • 48. 50 Shades of Data 49
  • 50. 50 Shades of Data 51
  • 51. NoSQL n’est pas No SQL 50 Shades of Data 52
  • 52. 50 Shades of Data 53
  • 53. When things were simple RDBMS SQL ACID Data files Log Files Backup Backup Backup SAN
  • 54. And then stuff happened Middle Tier: Java EE (Stateful) application Client Tier: Browser Client Tier: Browser Client Tier: Browser Mobile App (offline) Mobile App (offline) Mobile App (offline) Data Warehouse OO, XML, JSON Content Management Big Data Fast Data API API API µ λ
  • 56. Oracle Database • SQL • RDBMS • ACID
  • 58. (Diagram labels) HTTP • IoT Fast Data Ingestion • Sharding • Machine Learning • NoSQL • Big Data SQL • Multitenant (Pluggable Database) Architecture • Flashback
  • 60. Oracle Database XE – eXpress Edition • Current version: XE 11gR2 • Available since October 2018: XE 18c, with yearly releases (19c, 20c, …) • All functionality of single instance Oracle Database Enterprise Edition plus Extra Options • (including R, Machine Learning, Spatial, Compression, Multi Tenant – for 3 PDBs, Partitioning) • Code and Data Compatible with other editions – including plug/unplug • Resource Limitations for 18c: • 2 CPUs • 2 GB of memory • 12 GB of disk space (using Compression effectively 40 GB of data) • No patches or support Review of Oracle OpenWorld & CodeOne 2018 - #oowamis 65
  • 64. (Dimensions of data) usage • Total Cost of Data Ownership • authorization • distribution • format • volatility • volume • ACID demands • availability • freshness requirements (staleness allowance) • location • speed • ownership • required consistency • integrity • query patterns
  • 66. Summary • Multiple types of data • Stored and processed in different ways • Same data sometimes used in multiple, different ways • Stored and processed multiple times – optimized for each use case • The meaning of some terms cannot be taken too literally • Real Time and Fresh • Integrity and Truth • Consistency and transactions • Understand your data • Meta: What does it mean? • Master: Where is the source?
  • 68. Thank you! ありがとうございました • Blog: technology.amis.nl • Email: [email protected] • : @lucasjellema • : lucas-jellema • : www.amis.nl, [email protected]

Editor's Notes

  • #2: Fast data arrives in real time and potentially in high volume. Rapid processing, filtering and aggregation are required to ensure timely reaction and up-to-date information in user interfaces. Doing so is a challenge; making it happen in a scalable and reliable fashion is even more interesting. This session introduces Apache Kafka as the scalable event bus that takes care of the events as they flow in, and Kafka Streams and KSQL for the streaming analytics. Both Java and Node applications are demonstrated that interact with Kafka and leverage Server-Sent Events and WebSocket channels to update the web UI in real time. User activity performed by the audience in the web UI is processed by the Kafka-powered back end and results in live updates on all clients.
Session outline: introducing the challenge (fast data, scalable and decoupled event handling, streaming analytics); introduction of Kafka; demo of producing to and consuming from Kafka in Java and Node.js clients; intro to the Kafka Streams API for streaming analytics; demo of streaming analytics from a Java client; intro of the web UI (HTML5, WebSocket channel and SSE listener); demo of push from server to web UI in general.
End-to-end flow: IFTTT picks up tweets and pushes them to an API that hands them to a Kafka topic. The Java application consumes these events, performs streaming analytics (grouped by hashtag, author and time window) and counts them; the aggregation results are produced to Kafka. The Node.js application consumes these aggregation results and pushes them to the web UI. The web UI displays the selected tweets along with the aggregation results. In the web UI, users can like and rate the tweets; each like or rating is sent to the server and produced to Kafka. These events are also processed through streaming analytics and result in updated like counts and average ratings, which are then pushed to all clients. This means the audience can tweet, see the tweet appear in the web UI on their own device, rate and like it, and see the ratings and like counts update in real time.
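
The running count per hashtag described in this note can be pictured with a few lines of plain Python. This is only a conceptual sketch of a streaming aggregation; it does not use Kafka, Kafka Streams or KSQL, and the event shape (`author`, `tags`) and the function name are assumptions made for the example.

```python
from collections import Counter

def running_tag_counts(tweets):
    """Yield (tag, count so far) each time a tweet event carrying that tag arrives."""
    counts = Counter()
    for tweet in tweets:            # in a real system this would be an unbounded stream
        for tag in tweet["tags"]:
            counts[tag] += 1
            yield tag, counts[tag]  # each update would be published to a results topic

# hypothetical tweet events
stream = [
    {"author": "a", "tags": ["codeone", "java"]},
    {"author": "b", "tags": ["java"]},
    {"author": "c", "tags": ["codeone"]},
]

updates = list(running_tag_counts(stream))
print(updates)  # [('codeone', 1), ('java', 1), ('java', 2), ('codeone', 2)]
```

Unlike a one-off `GROUP BY` query, the aggregation never finishes: every incoming event produces a fresh result that downstream clients can pick up.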
  • #3: こんばんは (Konbanwa, "Good evening")
  • #18: https://specify.io/concepts/microservices
  • #19: https://specify.io/concepts/microservices
  • #20: https://specify.io/concepts/microservices
  • #21: 3d anomaly detection
  • #27: Data manipulation and retrieval happen in separate places (physical data proliferation). The query store is optimized for consumers: level of detail, format, filters applied. Motivations: performance and scalability, independence, productivity, lower license fees and lower TCO, security.
  • #28: Without event sourcing: no events (?), no green field, packaged applications/SaaS, databases (RDBMS, NoSQL) getting changes from applications directly. Challenges, at scale, with enough speed and consistently: do not let the query store get into an exposed state that could not exist or be right. Steps: detect relevant changes; extract relevant changes; transport; convert; apply in the correct order and reliably (no lost events). Note: after detect and extract, an event can be published.
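
One way to picture the "apply in correct order and reliably" step is a query store that buffers out-of-order changes and ignores duplicates. The sketch below assumes every extracted change carries a gap-free sequence number `seq`; that field, the change format and the `QueryStore` class are hypothetical and not taken from the slides.

```python
class QueryStore:
    """Read-side copy that only ever exposes states the source actually had."""

    def __init__(self):
        self.rows = {}
        self.last_applied = 0   # sequence number of the last applied change
        self.pending = {}       # changes that arrived ahead of their turn

    def receive(self, change):
        seq = change["seq"]
        if seq <= self.last_applied:
            return              # duplicate delivery: already applied
        self.pending[seq] = change
        # apply every change whose predecessors have all been applied
        while self.last_applied + 1 in self.pending:
            self._apply(self.pending.pop(self.last_applied + 1))

    def _apply(self, change):
        if change["op"] == "delete":
            self.rows.pop(change["key"], None)
        else:                   # insert or update
            self.rows[change["key"]] = change["value"]
        self.last_applied = change["seq"]

store = QueryStore()
store.receive({"seq": 2, "op": "upsert", "key": "p1", "value": "price 12"})
store.receive({"seq": 1, "op": "upsert", "key": "p1", "value": "price 10"})
store.receive({"seq": 1, "op": "upsert", "key": "p1", "value": "price 10"})
print(store.rows, store.last_applied)  # {'p1': 'price 12'} 2
```

Holding back change 2 until change 1 has arrived is what keeps the query store from showing a state that never existed in the source.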
  • #32: https://www.slideshare.net/LorenzoNicora/from-c-to-q-one-event-at-the-time-event-sourcing-illustrated
  • #33: Events are immutable facts. Current state (active record) is derived from the sum of events. Read-optimized aggregates are created for specific use cases, based on events and rebuildable at any time.
  • #34: Events are immutable facts. Current state (active record) is derived from the sum of events. Read-optimized aggregates are created for specific use cases, based on events and rebuildable at any time.
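
The three statements in these notes can be sketched in a few lines: an append-only log of immutable events, and read models that are pure functions of that log and can therefore be rebuilt at any time. The `Event` type and the like/rating example (borrowed from the tweet demo described in note #2) are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)   # frozen: an event is an immutable fact
class Event:
    kind: str             # "liked" or "rated"
    tweet_id: str
    value: int = 0        # rating value; unused for likes

def like_counts(events):
    """Read model 1: number of likes per tweet, derived from the full log."""
    counts = {}
    for e in events:
        if e.kind == "liked":
            counts[e.tweet_id] = counts.get(e.tweet_id, 0) + 1
    return counts

def average_ratings(events):
    """Read model 2: average rating per tweet, derived from the same log."""
    totals = {}
    for e in events:
        if e.kind == "rated":
            total, n = totals.get(e.tweet_id, (0, 0))
            totals[e.tweet_id] = (total + e.value, n + 1)
    return {tweet: total / n for tweet, (total, n) in totals.items()}

log = (
    Event("liked", "t1"),
    Event("rated", "t1", 4),
    Event("rated", "t1", 2),
    Event("liked", "t1"),
    Event("liked", "t2"),
)

print(like_counts(log))      # {'t1': 2, 't2': 1}
print(average_ratings(log))  # {'t1': 3.0}
```

Because both aggregates are computed from the same events, a new read model for a new use case can be added later and filled by replaying the log.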
  • #35: Blockchain!
  • #36: https://specify.io/concepts/microservices
  • #44: https://specify.io/concepts/microservices
  • #45: https://specify.io/concepts/microservices
  • #48: https://specify.io/concepts/microservices
  • #49: https://specify.io/concepts/microservices
  • #50: Web scale: no ACID; BASE; speed of reads; redundancy; read-optimized format. Not all use cases require ACID (or can afford it): read-only data (product catalog for web shops); inserts only with no (inter-record) constraints; Big Data collected and "dumped" in a data lake (Hadoop) for subsequent processing; high performance demands. Not all data needs structured formats or structured querying and JOINs: entire documents are stored and retrieved based on a single key. Sometimes scalable availability and developer productivity are more important than consistency, and ACID is sacrificed. The CAP theorem states that Consistency [across nodes], Availability and Partition tolerance cannot all three be satisfied at the same time.
  • #51: https://specify.io/concepts/microservices
  • #69: All data stores are distributed, or at least available in a distributed way. They can be local or in the cloud (latency is important). Data in a generic data store is still owned by only one microservice; no one else can touch it. Only in the DWH and Big Data do we deliberately take copies of data and disown them.
  • #74: Data used to be like the Ford Model T: one model, one color. And then:
  • #75: Data comes in many shades (at least 50) – variations along many dimensions
  • #76: technologies
  • #80: Arigatō gozaimashita ("Thank you very much")