Making MySQL Great for Business Intelligence
Robin Schumacher, VP Products, Calpont

Agenda
  • Quick overview of BI
  • Looking at the right technology foundation
  • General physical MySQL design decisions that impact success
  • A look at row vs. column MySQL databases
  • Conclusions
A Quick Overview of Business Intelligence
What is Business Intelligence?
Business Intelligence (BI) refers to skills, processes, technologies, applications and practices used to support decision making. BI technologies provide historical, current, and predictive views of business operations. Common functions of Business Intelligence technologies are reporting, online analytical processing, analytics, data mining, business performance management, benchmarking, text mining, and predictive analytics.

Why Business Intelligence?
  • All companies now recognize the need for BI
  • Information is a weapon that both large and small companies use to better understand their customers, competitors, and marketplace
  • Making poorly informed decisions can be disastrous
Overview of Most BI Frameworks
[Diagram. Labels: Operational Source Data (OLTP, Files/XML, Log Files); ETL; Staging Area (Staging or ODS); Final ETL; Data Warehouse; Warehouse Archive; Purge/Archive; Reporting, BI, Notification Layer (Ad-Hoc, Dashboards, Reports, Notifications); Users; Data Warehouse and Metadata Management]

Simple Reporting Databases
[Diagram. Labels: Application Servers; End Users; OLTP Database; Read Shard One; Reporting Database; ETL; Data Archiving Link; Replication]
Building the Right Technical Foundation
What is the Key Component for Success?
In other words, what you do with your MySQL Server, in terms of physical design, schema design, and performance design, will be the biggest factor in whether a BI system hits the mark…
* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.

What Technology Decisions are Being Made?
* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.
What General MySQL Design Decisions Help Success?
First – Get/Use a Modeling Tool
Horizontal Partitioning Model
Read Sharding / Horizontal Partitioning
Vertical Partitioning Model
General List of Top BI Design Decisions
  • Storage engine selection
  • Physical table/index partitioning
  • Index creation and placement
  • Setting proper sizes for memory caches, etc.
  • Row vs. column engine / database
Core BI Features for MySQL
  • No practical storage limits (1 tablespace=110TB)
  • Automatic storage management
  • ANSI-SQL support for all datatypes (including BLOB and XML)
  • Data/Index partitioning (range, hash, key, list, composite)
  • Built-in Replication
  • Main memory tables (for dimension tables)
  • Variety of indexes (b-tree, fulltext, clustered, hash, GIS)
  • Multiple-configurable data/index caches
  • Pre-loading of index data into index caches
  • Unique query cache (caches result set + query; not just data)
  • Parallel data load (5.1 and higher – multiple files)
  • Multi-insert DML
  • Data compression (depends on engine)
  • Read-only tables
  • Fast connection pooling
  • Cost-based optimizer
  • Wide platform support
Storage Engines Internal to MySQL

MyISAM
  • High-speed query/insert engine
  • Non-transactional, table locking
  • Good for data marts, small warehouses

Archive
  • Compresses data by up to 80%
  • Fastest for data loads
  • Only allows inserts/selects
  • Good for seldom accessed data

Memory
  • Main memory tables
  • Good for small dimension tables
  • B-tree and hash indexes

CSV
  • Comma separated values
  • Allows both flat file access and editing as well as SQL query/DML
  • Allows instantaneous data loads

Also: Merge for pre-5.1 partitioning
Partitioning and Performance (5.1+)

mysql> CREATE TABLE part_tab
    -> ( c1 int, c2 varchar(30), c3 date )
    -> PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995),
    -> PARTITION p1 VALUES LESS THAN (1996), PARTITION p2 VALUES LESS THAN (1997),
    -> PARTITION p3 VALUES LESS THAN (1998), PARTITION p4 VALUES LESS THAN (1999),
    -> PARTITION p5 VALUES LESS THAN (2000), PARTITION p6 VALUES LESS THAN (2001),
    -> PARTITION p7 VALUES LESS THAN (2002), PARTITION p8 VALUES LESS THAN (2003),
    -> PARTITION p9 VALUES LESS THAN (2004), PARTITION p10 VALUES LESS THAN (2010),
    -> PARTITION p11 VALUES LESS THAN MAXVALUE );

mysql> create table no_part_tab (c1 int, c2 varchar(30), c3 date);

*** Load 8 million rows of data into each table ***

mysql> select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
|   795181 |
+----------+
1 row in set (38.30 sec)

mysql> select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
|   795181 |
+----------+
1 row in set (3.88 sec)

90% Response Time Reduction
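The speed-up in the partitioning example comes from the server only having to read the partition that can hold 1995 rows. The sketch below is a simplified Python model of that idea, not MySQL's actual pruning code: the partition bounds are copied from the part_tab definition, while the function names (partition_for_year, partitions_to_scan, response_time_reduction) are hypothetical. It treats the date range as inclusive, which selects the same partition as the strict comparison in the slide's query, and it recomputes the quoted reduction from the two timings.

```python
from datetime import date

# Exclusive upper bounds on YEAR(c3), copied from the part_tab definition.
# None stands in for MAXVALUE.
PARTITIONS = [
    ("p0", 1995), ("p1", 1996), ("p2", 1997), ("p3", 1998),
    ("p4", 1999), ("p5", 2000), ("p6", 2001), ("p7", 2002),
    ("p8", 2003), ("p9", 2004), ("p10", 2010), ("p11", None),
]


def partition_for_year(year):
    """Return the first partition whose upper bound exceeds the year."""
    for name, upper in PARTITIONS:
        if upper is None or year < upper:
            return name
    raise ValueError("no partition accepts this year")


def partitions_to_scan(start, end):
    """Partitions that can hold rows with start <= c3 <= end."""
    names = []
    for year in range(start.year, end.year + 1):
        name = partition_for_year(year)
        if name not in names:
            names.append(name)
    return names


def response_time_reduction(before_sec, after_sec):
    """Percentage reduction in elapsed time."""
    return (before_sec - after_sec) / before_sec * 100


# The date range from the slide's query touches a single partition.
touched = partitions_to_scan(date(1995, 1, 1), date(1995, 12, 31))
reduction = response_time_reduction(38.30, 3.88)
print(touched)
print(round(reduction, 1))
```

Under this model the 1995 query reads one of twelve partitions, and the two timings work out to a reduction of just under 90%, consistent with the rounded figure on the slide.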
Index Creation and Placement
  • If query patterns are known and predictable, and data is relatively static, then indexing isn’t that difficult.
  • In a very ad-hoc environment, indexing becomes more difficult. You must analyze SQL traffic and index the best you can.
  • Over-indexing a table that is frequently loaded / refreshed / updated can severely impact load and DML performance. Test dropping and re-creating indexes vs. doing in-place loads and DML. Realize, though, that any queries running while the indexes are dropped will be impacted.
  • Index maintenance (rebuilds, etc.) can cause issues in MySQL (locking, etc.).
  • Remember that some storage engines don’t support normal indexes (Archive, CSV).
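The drop-and-recreate approach mentioned above can be sketched as follows. This is only an illustration of the pattern: it uses Python's built-in sqlite3 module as a stand-in so it runs without a MySQL server, and MySQL's own statements differ (for example, DROP INDEX requires an ON table clause there). The table fact, the index idx_fact_c3, and the function load_with_index_rebuild are hypothetical names. Whether this beats an in-place load depends on data volume and engine, so it should be timed against the alternative.

```python
import sqlite3


def load_with_index_rebuild(conn, rows):
    """Drop the index, bulk insert, then rebuild the index once at the end."""
    cur = conn.cursor()
    cur.execute("DROP INDEX IF EXISTS idx_fact_c3")
    cur.executemany("INSERT INTO fact (c1, c2, c3) VALUES (?, ?, ?)", rows)
    # Queries that need this index are slow until the rebuild finishes.
    cur.execute("CREATE INDEX idx_fact_c3 ON fact (c3)")
    conn.commit()


# In-memory SQLite database standing in for the reporting database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (c1 INTEGER, c2 TEXT, c3 TEXT)")
conn.execute("CREATE INDEX idx_fact_c3 ON fact (c3)")

# Made-up sample rows, spread over the twelve months of 1995.
rows = [(i, "row %d" % i, "1995-%02d-01" % (i % 12 + 1)) for i in range(1000)]
load_with_index_rebuild(conn, rows)

row_count = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
index_names = [r[1] for r in conn.execute("PRAGMA index_list('fact')")]
print(row_count, index_names)
```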
Row vs. Column Engines / Databases
Column vs. Row Orientation  A column-oriented architecture looks the same on the surface, but stores data differently than legacy/row-based databases…
Why a Column Database?
  • Column databases only read the columns needed to satisfy a query vs. full rows.
  • If you are only selecting a subset of columns from a table and / or are using very wide tables, column DBs are a great choice for BI.
  • Column databases (most of them…) remove the need for indexing because the column is the index.
  • Column databases automatically eliminate unnecessary I/O both logically and physically, so they also do away with the need for partitioning, materialized views, etc.
  • As a rule of thumb, column databases provide 5-10x (or more) the query performance of legacy RDBMSs.
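The first point above, reading only the columns a query needs, can be illustrated with a small Python sketch. It is a toy model of the two layouts rather than how any particular engine stores data: the five-column table, its values, and the function names are made up, and "values read" is a rough proxy for I/O.

```python
# Hypothetical 5-column fact table; names and values are made up.
COLUMNS = ["order_id", "customer_id", "order_date", "region", "amount"]
ROWS = [
    (1, 101, "2009-01-05", "east", 250.0),
    (2, 102, "2009-01-06", "west", 100.0),
    (3, 101, "2009-01-07", "east", 75.5),
    (4, 103, "2009-01-08", "west", 310.0),
]


def sum_from_row_store(rows, column):
    """Row layout: every full row is read to reach one column."""
    idx = COLUMNS.index(column)
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # the whole row is fetched
        total += row[idx]
    return total, values_read


def to_column_store(rows):
    """Pivot the rows so each column is stored contiguously."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_from_column_store(store, column):
    """Column layout: only the requested column is read."""
    values = store[column]
    return sum(values), len(values)


row_total, row_reads = sum_from_row_store(ROWS, "amount")
col_total, col_reads = sum_from_column_store(to_column_store(ROWS), "amount")
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts return the same sum, but the row layout reads all 20 stored values while the column layout reads 4. The gap widens as tables get wider, which is the case the slide calls out.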
Why a Column Database?
"If you're bringing back all the columns, a column-store database isn't going to perform any better than a row-store DBMS, but analytic applications are typically looking at all rows and only a few columns. When you put that type of application on a column-store DBMS, it outperforms anything that doesn't take a column-store approach."
- Donald Feinberg, Gartner Group
Why Not a Column Database?
  • If you routinely run SELECT * queries or queries that request the majority of columns in a table.
  • If you are constantly doing lots of singleton inserts and deletes. Because these are row-based operations, they will normally run somewhat slower on a column DB than on a row-oriented DB (more block touches are needed). Updates tend to run OK because they are a column operation.
  • If you want to do pure OLTP work. Some column DBs are transactional (so data integrity is ensured), but they are not suited for straight OLTP work.
  • If you have a small database: at that size, the benefit column databases offer over row DBs is largely lost.
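The "more block touches" remark about singleton inserts can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that a row store writes a new row in one place and a column store keeps one separate structure per column; the function names and the sample row are hypothetical, and real engines batch and buffer writes in ways this model ignores.

```python
def insert_row_store(table, row):
    """Append one row; the whole row lands in a single place."""
    table.append(tuple(row.values()))
    return 1  # one structure touched


def insert_column_store(store, row):
    """Append one row; every column's storage has to be touched."""
    touched = 0
    for name, value in row.items():
        store.setdefault(name, []).append(value)
        touched += 1
    return touched


# Made-up five-column row.
new_row = {
    "order_id": 5,
    "customer_id": 104,
    "order_date": "2009-01-09",
    "region": "east",
    "amount": 42.0,
}

row_table = []
column_store = {}
row_touches = insert_row_store(row_table, new_row)
column_touches = insert_column_store(column_store, new_row)
print(row_touches, column_touches)
```

In this model the cost of a single-row insert grows with the number of columns in the column layout, whereas an update to one column touches only that column's storage.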
What is Calpont’s InfiniDB?
InfiniDB is an open source, column-oriented database architected to handle data warehouses, data marts, analytic/BI systems, and other read-intensive applications. It delivers true scale up (more CPUs/cores, RAM) and massive parallel processing (MPP) scale out capabilities for MySQL users. Linear performance gains are achieved either by adding more capacity to one box or by using commodity machines in a scale out configuration.
[Diagram. Labels: Scale Up; Scale Out]

InfiniDB vs. a Leading Row RDBMS
Test setup: 2 TB of raw data; 16 CPUs, 16 GB RAM, 14 SAS 15K RPM drives in RAID-0, 512 MB cache
Percona’s Test of Column Databases
Test setup: 610 GB of raw data; 8-core machine
http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/
Calpont Solutions

Calpont Analytic Database Server Editions

InfiniDB Community Server
  • Column-Oriented
  • Multi-threaded
  • Terabyte Capable
  • Single Server

InfiniDB Enterprise Server
  • Scale out / Parallel Processing
  • Automatic Failover

Calpont Analytic Database Solutions

InfiniDB Enterprise Solution
  • Monitoring
  • 24x7 Support
  • Auto Patch Management
  • Alerts & SNMP Notifications
  • Hot Fix Builds
  • Consultative Help
InfiniDB Community & Enterprise Server Comparison

Core Database Server Features | InfiniDB Community | InfiniDB Enterprise
Column-oriented | Yes | Yes
No indexing necessary | Yes | Yes
Automatic vertical (column) and logical horizontal partitioning of data | Yes | Yes
MVCC support – snapshot read (readers don’t block writers) | Yes | Yes
Alter Table with online add column capability | Yes | Yes
High concurrency supported | Yes | Yes
Terabyte database capable | Yes | Yes
Crash-recovery | Yes | Yes
Multi-threaded engine (queries/writes will use all CPUs/cores on box) | Yes | Yes
High-Speed bulk loader w/ no blocking queries while loading | Yes | Yes
Logical data compression | Yes | Yes
MySQL front end | Yes | Yes
Transaction support (ACID compliant) | Yes | Yes
INSERT/UPDATE/DELETE (DML) support | Yes | Yes
Multi-Node, MPP scale out capable w/ failover | No | Yes
Support | Forums Only | Formal Production Support
For More Information
  • Download InfiniDB Community Edition
  • Download InfiniDB documentation
  • Read InfiniDB technical white papers
  • Read InfiniDB intro articles on MySQL dev zone
  • Visit InfiniDB online forums
  • Trial the InfiniDB Enterprise Edition: http://www.calpont.com

www.infinidb.org
www.calpont.com
