Optiq
A dynamic data management framework
@julianhyde
Friday, October 4, 13
3 things
1. Databases are good.
2. It is hard to build analytics if your data is “all over the place.”
3. Optiq makes heterogeneous data look and behave more like a database.
Mondrian on Optiq on
hybrid data + memory
Big Data
“Data all over the place”
• Different locations (HDFS, memory, DBMS)
• Different formats
• Different workloads
• Transactional: Mainly write, targeted read
• Analytic: Mainly read (bulk write)
• Query latency: Interactive vs. batch
• Data latency: Can we show out-of-date data?
Databases are good
• Central point to manage data
• Simple, standard API for apps
• Powerful modeling techniques (e.g. star schemas)
• Data independence (i.e. tune your data after you write your application)
• Query optimization
Optiq
Conventional DB architecture
Optiq architecture
Examples
Example #1: CSV
• Uses CSV adapter (optiq-csv)
• Demo using sqlline
• Easy to run this for yourself:
$ git clone https://github.com/julianhyde/optiq-csv
$ cd optiq-csv
$ mvn install
$ ./sqlline
More adapters
Adapters   Embedded              Planned
CSV        Cascading (Lingual)   HBase (Phoenix)
JDBC       Apache Drill          Spark
MongoDB                          Cassandra
Splunk                           Mondrian
linq4j
Example #2: Splunk + MySQL
SELECT p.product_name,
       COUNT(*) AS c
FROM splunk.splunk AS s
  JOIN mysql.products AS p
  ON s.product_id = p.product_id
WHERE s.action = 'purchase'
GROUP BY p.product_name
ORDER BY c DESC
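To see what this query returns, here is a minimal sketch that runs the same SQL against stand-in data. It does not use Optiq: two in-memory SQLite databases are attached under the schema names `splunk` and `mysql` purely so the query text can run unchanged. The column types and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Stand-ins for the two back ends: separate in-memory SQLite databases
# attached under the schema names that appear in the slide's query.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS splunk")
conn.execute("ATTACH DATABASE ':memory:' AS mysql")

conn.execute("CREATE TABLE splunk.splunk (product_id INTEGER, action TEXT)")
conn.execute("CREATE TABLE mysql.products (product_id INTEGER, product_name TEXT)")

# Hypothetical sample rows: an event log on one side, a product catalog on the other.
conn.executemany(
    "INSERT INTO splunk.splunk VALUES (?, ?)",
    [(1, "purchase"), (1, "purchase"), (2, "purchase"), (2, "view"), (1, "view")],
)
conn.executemany(
    "INSERT INTO mysql.products VALUES (?, ?)",
    [(1, "widget"), (2, "gadget")],
)

QUERY = """
SELECT p.product_name,
       COUNT(*) AS c
FROM splunk.splunk AS s
  JOIN mysql.products AS p
  ON s.product_id = p.product_id
WHERE s.action = 'purchase'
GROUP BY p.product_name
ORDER BY c DESC
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('widget', 2), ('gadget', 1)]
```

The application only sees one connection and one SQL dialect, which is the experience the slide describes; in the real setup the two schemas would be backed by a Splunk adapter and a JDBC adapter rather than by SQLite.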
Expression tree
Optimized tree
Analytics on
heterogeneous data
Simple analytics
problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
• Not possible?! It takes about 80s (100GB ÷ 1.2GB/s) just to read the data.
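The arithmetic behind that figure can be checked in a few lines. This is a back-of-the-envelope sketch using only the numbers on the slide; decimal units (1KB = 1,000 bytes, 1GB = 10^9 bytes) are an assumption.

```python
# Figures from the slide; decimal units are assumed.
records = 100_000_000          # 100M census records
bytes_per_record = 1_000       # 1KB each
disk_throughput = 1.2e9        # bytes per second, all four disks combined
target_seconds = 5

total_bytes = records * bytes_per_record
scan_seconds = total_bytes / disk_throughput
required_speedup = scan_seconds / target_seconds

print(f"total data: {total_bytes / 1e9:.0f} GB")
print(f"full scan:  {scan_seconds:.1f} s")
print(f"speed-up needed for a {target_seconds}s answer: {required_speedup:.1f}x")
```

A full scan takes roughly 83 seconds, so hitting the 5-second target needs the query to touch about one seventeenth of the raw bytes, or less.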
Solution: Cheat!
• Compress data
• Column-oriented storage
• Store data in sorted order
• Put data in memory
• Cache previous query results
• Pre-compute (materialize) aggregates
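As one illustration of why the first three tricks combine well, here is a small sketch of run-length encoding a sorted column. The `state` column and its values are hypothetical; the point is only that a count can be answered from the compressed form without reading individual records.

```python
from itertools import groupby


def rle_encode(sorted_column):
    """Run-length encode a column: one (value, run_length) pair per run."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(sorted_column)]


# Hypothetical 'state' column of the census table, stored in sorted order.
state_column = ["CA"] * 5 + ["NY"] * 3 + ["TX"] * 4
encoded = rle_encode(state_column)

# COUNT(*) needs only the run lengths, not the 12 underlying values.
total = sum(length for _, length in encoded)

# Because the column is sorted, each value forms exactly one run, so the
# encoded form already is the answer to COUNT(*) ... GROUP BY state.
per_state = dict(encoded)

print(encoded)    # [('CA', 5), ('NY', 3), ('TX', 4)]
print(total)      # 12
```

Three pairs stand in for twelve values here; on a low-cardinality column of 100M rows the ratio would be far larger, which is how a count can avoid the 100GB scan.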
How Optiq helps you
to cheat
• Materialized views
• Pre-defined aggregate tables
• Cached query results = In-memory tables
• Smart cache maintenance
• Quickly bring materializations online & offline
• Materializations over a subset of the data
• Spark distributed, in-memory processing & cache
• Application thinks it is talking to a single SQL database
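The idea of answering from a pre-defined aggregate table, and of bringing a materialization online or offline without the application noticing, can be sketched as follows. This is not Optiq's API: Optiq would make this choice inside its query planner, whereas the sketch uses a hand-written function. The `census` and `census_by_state` tables, the `online` set, and `count_records` are all hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (state TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO census VALUES (?, ?)",
    [("CA", 30), ("CA", 41), ("NY", 25), ("TX", 52), ("TX", 19)],
)

# Pre-computed aggregate table: one row per state instead of one per person.
conn.execute(
    "CREATE TABLE census_by_state AS "
    "SELECT state, COUNT(*) AS c FROM census GROUP BY state"
)

# Materializations that are currently safe to use.
online = {"census_by_state"}


def count_records(conn):
    """Return (count, source), preferring an online materialization."""
    if "census_by_state" in online:
        (n,) = conn.execute("SELECT SUM(c) FROM census_by_state").fetchone()
        return n, "census_by_state"
    (n,) = conn.execute("SELECT COUNT(*) FROM census").fetchone()
    return n, "census"


fast = count_records(conn)          # answered from the aggregate table
online.discard("census_by_state")   # take it offline, e.g. while it is stale
slow = count_records(conn)          # falls back to scanning the base table

print(fast, slow)  # (5, 'census_by_state') (5, 'census')
```

Both calls return the same count; only the source changes. The caller asks one question through one interface, which is the "single SQL database" illusion the last bullet describes.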
Mondrian on Optiq on
hybrid data + memory
Summary
3 things (reprise)
1. Databases are good. Especially the flexibility that SQL gives us.
2. It is hard to build analytics if your data is all over the place. Different workloads (operational vs. analytic, small write vs. bulk read) require different data structures.
3. Optiq is not a database. But Optiq creates a federated data architecture that performs well, and looks like a database to your tools.
@julianhyde
optiq https://github.com/julianhyde/optiq
mondrian http://mondrian.pentaho.com
blog http://julianhyde.blogspot.com
