Analytics with
MariaDB ColumnStore
The Whys, Whats and
Hows
Agenda
• The Task - Analytics – Why and what
• The Requirements – What do we need for analytics
• The Solution – Column Based Storage
• The Product – MariaDB AX and MariaDB ColumnStore
• The Uses – MariaDB ColumnStore in action
Why Analytics and what do you get
A high level view on analytics
Why Analytics?
• Get the most value out of your data assets
• Faster, better decision making
• Cost reduction
• New products and services
Types of analytics
• What is happening?
• Why is it happening?
• What is likely to happen?
• What should I do about it?
Descriptive: What happened?
● Reports
○ Sales Report
○ Expense summary
● Ad-hoc requests to analyst
Diagnostics: Why did it happen?
• Aggregates: aggregate a measure over one or more dimensions
– Find total sales
– Top five products ranked by sales
• Roll-ups: aggregate at different levels of a dimension hierarchy
– Given total sales by city, roll up to get sales by state
• Drill-down: inverse of roll-ups
– Given total sales by state, drill down to get totals by city
• Slicing and dicing:
– Equality and range selections on one or more dimensions
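The four operations above can be sketched with ordinary SQL. The example below is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in database. The sales table, its columns and its figures are hypothetical and are not taken from the presentation.

```python
import sqlite3

# Hypothetical sales fact table; names and figures are illustrative only.
rows = [
    ("CA", "San Francisco", "widget", 120.0),
    ("CA", "San Francisco", "gadget", 80.0),
    ("CA", "Los Angeles", "widget", 200.0),
    ("NY", "New York", "widget", 150.0),
    ("NY", "Albany", "gadget", 50.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (state TEXT, city TEXT, product TEXT, amount REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


def query(sql, params=()):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


# Aggregate: one measure summed over every dimension.
total = query("SELECT SUM(amount) FROM sales")[0][0]

# Aggregate with ranking: top five products by sales.
top_products = query(
    "SELECT product, SUM(amount) AS s FROM sales "
    "GROUP BY product ORDER BY s DESC LIMIT 5"
)

# Drill-down level: totals by city within each state.
by_city = query(
    "SELECT state, city, SUM(amount) FROM sales "
    "GROUP BY state, city ORDER BY state, city"
)

# Roll-up: the same measure one level higher in the hierarchy.
by_state = query(
    "SELECT state, SUM(amount) FROM sales GROUP BY state ORDER BY state"
)

# Slice and dice: equality on one dimension, range on the measure.
sliced = query(
    "SELECT city, amount FROM sales "
    "WHERE product = ? AND amount BETWEEN ? AND ? ORDER BY city",
    ("widget", 100, 180),
)

print(total, top_products, by_state, by_city, sliced, sep="\n")
```

Every one of these queries touches only a few columns but many rows, which is the access pattern the rest of the presentation is concerned with.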
Predictive: What is likely to happen?
• Sales prediction
– Analyze data to identify trends, spot weaknesses or determine conditions among broader data sets for making decisions about the future
• Targeted marketing
– What is the likelihood of a customer buying a particular product based on past buying behavior?
Real World Example - Visualization
Prescriptive: What is the best course of action?
Paradox of choices
With too many choices, which one is the best?
Big Data Analytics Use Cases
By industry
• Finance
– Identify trade patterns
– Detect fraud and anomalies
– Predict trading outcomes
• Manufacturing
– Simulations to improve design/yield
– Detect production anomalies
– Predict machine failures (sensor data)
• Telecom
– Behavioral analysis of customer calls
– Network analysis (performance and reliability)
• Healthcare
– Find genetic profiles/matches
– Analyze health vs spending
– Predict viral outbreaks
Analytics Database requirements
Why this is different from OLTP
and why indexes are not helpful
What is an OLTP workload?
• OLTP applications represent the most common database workload
• OLTP applications have a read/write ratio of maybe 50/50
– Web apps and e-commerce have more reads, ending up at maybe 90/10
• OLTP applications deal with data at a row-by-row level
– Customer data, product data, order items etc.
– Single rows are selected, inserted, updated and deleted, one by one or in small groups
• OLTP data structures are somewhat of a representation of the business or the applications that manage the data
– An order references a customer, and an order item is linked to an order
– Typically 3rd normal form or higher
– Sometimes individual aspects break the normal form, for performance reasons
• Transactions and ACID properties are required
The analytics workload
• Deals with data from a high level perspective
• Handles data in large groups of rows
– SELECTs data by date, customer location, product id etc.
– Data is loaded in batch or streamed in
– Data is mostly just INSERTed
• Dealing with individual data items is usually inefficient
• Data structures are optimized for analytics use and performance
• Data is sometimes purged, but just as often not
• Contains structured, semi-structured and sometimes unstructured data
• Data often comes from many different sources, internal and external
• Queries are largely ad hoc
• Transactions and ACID requirements are relaxed
Analytics database requirements
• Fast access to large amounts of data
• Scalable as data grows over time
– Analytics requirements increase
– Regulatory requirements
– New data sources are added
• Load performance must be fast, scalable and predictable
• Data loading should be very flexible due to the different sources of data
– Some data is loaded in batch, other data is streamed
• Query performance also needs to be scalable
• Data compression is a requirement
– Data size constraints, as well as read performance from disk
B-tree indexes: The good
• Well known technology
• Works with most types of data
• Scales reasonably well
• Really good for OLTP transactional data
B-tree indexes: The bad
• Really bad for unbalanced data
• Index modifications can be really slow
• Index modifications are largely single threaded
• Slows down with the amount of data
• Really not scalable with large amounts of data
In summary, what do we need?
• Something that can compress data A LOT
• Something that can be written to with fast and predictable performance
• Something that doesn't necessarily support transactions
– It doesn't hurt, but performance is so much more important
• Something that can support analytics queries
– Ad-hoc queries
– Aggregate queries
• Something that can scale as data grows
• Something that can still have a level of high availability
• Something that works with analytics tools, like Tableau, R etc.
The Solution
Distributed Column based storage
Existing Approaches
Data Warehouses
• Limited real time analytics
• Slow releases of product innovation
• Expensive hardware and software
Hadoop / NoSQL
• Limited SQL support
• Difficult to install/manage
• Limited talent pool
• Data lake with no data management
Hard to use
To the rescue – Column Based Storage
• Data is stored column by column
• Each column is stored in one or more extents
– Each extent is represented by 1 file
• Each extent is arranged in fixed size blocks
• Extents are compressed (using Snappy)
• Data is one of
– Fixed size (1, 2, 4 or 8 bytes)
– Dictionary based with a fixed size pointer
• Metadata is kept in an extent map
– The extent map is held in memory
– The extent map contains metadata on each extent, such as min and max values
(Diagram: a table is split into Column 1 ... Column N; each column is stored as Extent 1 ... Extent N, and each extent holds 8 million rows in 8 MB to 64 MB)
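To illustrate how min and max values in an extent map can stand in for an index, here is a minimal sketch in Python. It is a toy model under stated assumptions, not ColumnStore's implementation: the Extent class, build_extents and range_scan are hypothetical names, and the extent size is shrunk to 4 rows so the effect is visible.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Tiny extent size for illustration; the slide quotes 8 million rows per extent.
EXTENT_ROWS = 4


@dataclass
class Extent:
    """One chunk of a single column plus the metadata kept in the extent map."""
    values: List[int]
    min_value: int
    max_value: int


def build_extents(column: List[int], extent_rows: int = EXTENT_ROWS) -> List[Extent]:
    """Split a column into fixed-size extents and record min/max for each."""
    extents = []
    for start in range(0, len(column), extent_rows):
        chunk = column[start:start + extent_rows]
        extents.append(Extent(chunk, min(chunk), max(chunk)))
    return extents


def range_scan(extents: List[Extent], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and how many extents had to be read."""
    matches: List[int] = []
    extents_read = 0
    for ext in extents:
        # Metadata check only: this extent cannot contain a match, so skip it.
        if ext.max_value < low or ext.min_value > high:
            continue
        extents_read += 1
        matches.extend(v for v in ext.values if low <= v <= high)
    return matches, extents_read


# Hypothetical date column (YYYYMMDD), loaded in roughly chronological order.
order_dates = [
    20240101, 20240102, 20240105, 20240107,
    20240201, 20240203, 20240210, 20240215,
    20240301, 20240302, 20240310, 20240320,
]
extents = build_extents(order_dates)
matches, extents_read = range_scan(extents, 20240203, 20240212)
print(matches, f"read {extents_read} of {len(extents)} extents")
```

The skip only pays off when values are clustered within extents, as they tend to be for data that arrives in time order. A column with randomly scattered values would leave every extent overlapping the requested range.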
To the rescue – Distributed data processing
• Clients connect to a User Module
• The User Module optimizes and controls the execution
• Data is distributed among the Performance Modules
• Data is stored, processed and managed by Performance Modules
• Performance Modules process query primitives in parallel
• The User Module combines the results from the Performance Modules
(Diagram: Clients open User Connections to the User Modules, which sit in front of Performance Module 1, 2, 3 ... N)
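The division of labor between the two module types can be sketched as a scatter-gather aggregation. This is a simplified model, not ColumnStore code: performance_module and user_module are hypothetical functions, the data is invented, and Python threads stand in for separate nodes, so the sketch shows the data flow rather than real parallel speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def performance_module(partition):
    """Hypothetical PM: compute a partial SUM(amount) per region on local data."""
    partial = Counter()
    for region, amount in partition:
        partial[region] += amount
    return partial


def user_module(partitions):
    """Hypothetical UM: send the same primitive to every PM, then merge results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(performance_module, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values key by key
    return dict(merged)


# Each inner list is the slice of the table held by one Performance Module.
partitions = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
    [("south", 8), ("east", 2)],
]
result = user_module(partitions)
print(result)
```

Because a sum can be computed per partition and then combined, each Performance Module only needs to see its own data, and only the small partial results travel back to the User Module.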
MariaDB Analytics
MariaDB ColumnStore and MariaDB AX
MariaDB ColumnStore
High performance columnar storage engine that supports a wide variety of analytical use cases in highly scalable distributed environments
• Easier Enterprise Analytics
– Single interface for OLTP and analytics
– Easy to manage and scale
• Faster, More Efficient Queries
– Parallel query processing for distributed environments
• Better Price Performance
– Power of SQL and freedom of open source for big data analytics
MariaDB AX
• Components: MariaDB Server, MariaDB MaxScale, MariaDB ColumnStore
• Parallel queries
• Distributed storage
• No indexes
• Automatic partitioning
• Read optimized
• High compression
• Low disk IO
(Diagram: MariaDB MaxScale in front of several MariaDB Server nodes with ColumnStore, backed by ColumnStore storage)
Easier Enterprise Analytics
Single SQL Front-end
• Use a single SQL interface for analytics and OLTP
• Leverage MariaDB security features: encryption for data in motion, role-based access and auditability
Full ANSI SQL
• No more "SQL-like" query languages
• Supports complex joins, aggregation and window functions
Easy to manage and scale
• Eliminates the need for indexes and views
• Automated horizontal/vertical partitioning
• Scales linearly by adding new nodes as data grows
• Out-of-the-box connection with BI tools
Faster, More Efficient Queries
Optimized for columnar storage
• Columnar storage reduces disk I/O
• Blazing fast on read-intensive workloads
• Ultra fast data import
Parallel distributed query execution
• Distributes queries into a series of parallel operations
• Fully parallel high speed data ingestion
Highly available analytic environment
• Built-in redundancy
• Automatic fail-over
MariaDB ColumnStore
Analytics Use Cases
Healthcare / Life Science Industry
Genome analysis
• In-depth genome research for the dairy industry to improve production of milk and protein
• Fast data load for large genome datasets (DNA data for 7 billion cows in US - 20GB per load)
Healthcare spending analysis
• Analyze 3TB of US health care spending for 155 conditions with 7 years of historical data
• Used Sankey diagrams, treemaps, and pyramid charts to analyze trends by age, sex, type of care, and condition
Why MariaDB ColumnStore
• Strong security features including role-based data access and an audit plugin
• MPP architecture handles analytics on big data with high speed
• Easy to analyze archived data with SQL-based analytics
• Does not require a DBA to index or partition data
Telecommunication Industry
Customer behavior analysis
• Analyze call data records to segment customers based on their behavior
• Data-driven analysis for customer satisfaction
• Create behavior-based upsell or cross-sell opportunities
Call data analysis
• Data size: 6TB
• Ingest 1.5 million rows of logs per day with 30 million texts and 3 million calls
• Call and network quality analysis
• Provide higher quality customer services based on data
Why MariaDB ColumnStore
• ColumnStore supports time-based partitioning and time-series analysis
• Fast data load for real-time analytics
• MPP architecture handles analytics on big data with high speed
• Easy to analyze the archived data with SQL based analytics
In Conclusion
• Analytics requires a different technology to be able to cope with
– Different types of data
– Different types of data access
• OLTP databases have different requirements compared to analytics
• Column based storage allows high compression
• Metadata can replace indexing
• Distributed processing allows for performance and scalability
• MariaDB ColumnStore implements a fast and efficient distributed database for analytics
• MariaDB AX is the subscription for professional use of MariaDB ColumnStore
• MariaDB ColumnStore is gaining wide acceptance
Thank you

MariaDB AX: Analytics with MariaDB ColumnStore
