SlideShare a Scribd company logo
Ivan Zoratti
Big Data with MySQL
Percona Live Santa Clara 2013
V1304.01
Friday, 3 May 13
Who is Ivan
?
Friday, 3 May 13
SkySQL
•Leading provider of open source
databases, services and
solutions
•Home for the founders and the
original developers of the core
of MySQL
•The creators of MariaDB, the
drop-off, innovative
replacement of MySQL
Friday, 3 May 13
What is Big Data?
http://marketingblogged.marketingmagazine.co.uk/files/Big-Data-3.jpg
Friday, 3 May 13
PAGE
Big Data!
Big data is a collection of data
sets so large and complex that it
becomes difficult to process
using on-hand database
management tools or traditional
data processing applications.
5
http://readwrite.com/files/styles/800_450sc/public/files/fields/shutterstock_bigdata.jpg
Friday, 3 May 13
PAGE
Big Data By Structure
6
Unstructured
•Store everything you have/you find
•In any format and shape
•You do not know how to use it, but it may
come handy
•Storing unstructured data is usually cheaper than
storing it in a more structured datastore
•Does not fit well in a relational database
•Examples:
•Text: Plain text, documents, web content,
messages
•Bitmap: Image, audio, video
•Typical approach:
•Mining, pattern recognition, tagging
•Usually batch analysis
Structured
•Store only what you need
•In a good format, ready to be used
•You should already know how to use it, or at
least what it means
•Storing structured data is quite expensive
•Raw data, indexing, denormalisation,
aggregation
•Arelational database is still the best choice
•Examples:
•Machine-Generated Data (MGD)
•Tags, counters, sales
•Typical approach:
•BI tools, reporting
•Real time analysis change data capture
Friday, 3 May 13
PAGE
Unstructured
•Store everything you have/you find
•In any format and shape
•You do not know how to use it, but it may
come handy
•Storing unstructured data is usually cheaper than
storing it in a more structured datastore
•Does not fit well in a relational database
•Examples:
•Text: Plain text, documents, web content,
messages
•Bitmap: Image, audio, video
•Typical approach:
•Mining, pattern recognition, tagging
•Usually batch analysis
Structured
•Store only what you need
•In a good format, ready to be used
•You should already know how to use it, or at
least what it means
•Storing structured data is quite expensive
•Raw data, indexing, denormalisation,
aggregation
•Arelational database is still the best choice
•Examples:
•Machine-Generated Data (MGD)
•Tags, counters, sales
•Typical approach:
•BI tools, reporting
•Real time analysis change data capture
Big Data By Structure
7
Friday, 3 May 13
PAGE
How “Big” is Big Data?
•Data Factors
•Size
•Speed to collect/
generate
•Variety
•Resources
•Administrators
•Developers
•Infrastructure
•Growth
•Collection
•Processing
•Availability
•To whom?
•For how long?
•In which format?
•Aggregated
•Detailed
8
Friday, 3 May 13
PAGE
How to manage Big Data
•Collection - Storage -Archive
•Load - Transform -Analyze
•Access - Explore - Utilize
9
http://www.futuresmag.com/2012/07/01/big-data-manage-it-dont-drown-in-it
Friday, 3 May 13
Big Data with MySQL
http://news.mydosti.com/newsphotos/tech/BigDataV1Dec22012.jpg
Friday, 3 May 13
PAGE
Technologies to
Use / Consider / Watch
•MyISAM and MyISAM compression
•InnoDB compression
•MySQL 5.6 Partitioning
•MariaDB Optimizer
•MariaDB Virtual & Dynamic
Columns
•Cassandra Storage Engine
•Connect Storage Engine
•Columnar Databases
•InfiniDB
•Infobright
•TokuDB Storage Engine
11
Friday, 3 May 13
PAGE
Columnar Databases
•Automatic compression
•Automatic column storage
•Data distribution
•Map/Reduce approach
•MPP / Parallel loading
•No indexes
•On public clouds, HW or SW
appliances
12
Friday, 3 May 13
PAGE
TokuDB
•Increased Performance
•Increased Compression
•Online administration
•No Index rebuild
13
Friday, 3 May 13
PAGE
MyISAM
•Static, dynamic and compressed
format
•Multiple key cache, CACHE INDEX
and LOAD INDEX
•Compressed tables
•Horizontal partitioning (manual)
•External locking
14
Friday, 3 May 13
PAGE
InnoDB/XtraDB
•Data Load
•Pre-order data
•Split data into chunks
•unique_checks = 0;
•foreign_key_checks = 0;
•sql_log_bin = 0;
•innodb_autoinc_lock_mode = 2;
•Compression and block size
•Persistent optimizer stats
•innodb_stats_persistent
•innodb_stats_auto_recalc
15
SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = Barracuda;
CREATE TABLE t1
( c1 INT PRIMARY KEY,
c2 VARCHAR(255) )
ROW_FORMAT = COMPRESSED
KEY_BLOCK_SIZE = 8;
LOAD   DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t1;
Query OK, 134217728 rows affected (1 hour 34 min 7.49 sec)
Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0
LOAD   DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t2;
Query OK, 134217728 rows affected (25 min 20.75 sec)
Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0
Friday, 3 May 13
PAGE
Partitioning (MySQL 5.6)
•Partitioning Types
•RANGE, LIST, RANGE COLUMN,
HASH, LINEAR HASH, KEY LINEAR
KEY, sub-partitions
•Partition and lock pruning
•Use of INDEX and DATA
DIRECTORY
•PARTITIONADD, DROP,
REORGANIZE, COALESCE,
TRUNCATE, EXCHANGE,
REBUILD, OPTIMIZE, CHECK,
ANALYZE, REPAIR
16
CREATE TABLE t1 ( c1 INT, c2 DATE )
PARTITION BY RANGE( YEAR( c2 ) )
SUBPARTITION BY HASH ( TO_DAYS( c2 ) )
( PARTITION p0 VALUES LESS THAN (1990) (
SUBPARTITION s0
DATA DIRECTORY = '/disk0/data'
INDEX DIRECTORY = '/disk0/idx',
SUBPARTITION s1
DATA DIRECTORY = '/disk1/data'
INDEX DIRECTORY = '/disk1/idx' ),...
ALTER TABLE t1
EXCHANGE PARTITION p3 WITH TABLE t2;
-- Range and List partitions
ALTER TABLE t1 REORGANIZE PARTITION
p0,p1,p2,p3 INTO (
PARTITION m0 VALUES LESS THAN (1980),
PARTITION m1 VALUES LESS THAN (2000));
-- Hash and Key partitions
ALTER TABLE t1 COALESCE PARTITION 10;
ALTER TABLE t1 ADD PARTITION PARTITIONS 5;
Friday, 3 May 13
PAGE
MariaDB Optimizer
•Multi-Range Read (MRR)*
•Index Merge / Sort intersection
•Batch KeyAccess*
•Block hash join
•Cost-based choice of range vs.
index_merge
•ORDER BY ... LIMIT <limit>*
•MariaDB 10
•Subqueries
•Semi-join*
•Materialization*
•subquery cache
•LIMIT ... ROWS EXAMINED
<limit>
17
(*) - Available in MySQL 5.6
Friday, 3 May 13
PAGE
Virtual & Dynamic Columns
VIRTUAL COLUMNS
•For InnoDB, MyISAM andAria
•PERSISTENT (stored) or VIRTUAL
(generated)
18
CREATE TABLE t1 (
c1 INT NOT NULL,
c2 VARCHAR(32),
c3 INT AS
( c1 MOD 10 ) VIRTUAL,
c4 VARCHAR(5) AS
( LEFT(B,5) ) PERSISTENT);
DYNAMIC COLUMNS
•Implement a schemaless,
document store
•COLUMN_ CREATE,ADD, GET, LIST,
JSON, EXISTS, CHECK, DELETE
•Nested colums are allowed
•Main datatypes are allowed
•Max 1GB documents
CREATE TABLE assets (
item_name VARCHAR(32) PRIMARY KEY,
dynamic_cols BLOB );
INSERT INTO assets VALUES (
'MariaDB T-shirt',
COLUMN_CREATE( 'color', 'blue',
'size', 'XL' ) );
INSERT INTO assets VALUES (
'Thinkpad Laptop',
COLUMN_CREATE( 'color', 'black',
'price', 500 ) );
Friday, 3 May 13
PAGE
Cassandra Storage Engine
•Column Family == Table
•Rowkey, static and dynamic
columns allowed
•Batch key access support
SET cassandra_default_thrift_host =
'192.168.0.10'
CREATE TABLE cassandra_tbl (
rowkey INT PRIMARY KEY,
col1 VARCHAR(25),
col2 BIGINT,
dyn_cols BLOB DYNAMIC_COLUMN_STORAGE = yes )
ENGINE = cassandra
KEYSPACE = 'cassandra_key_space'
COLUMN_FAMILY = 'column_family_name';
19
Friday, 3 May 13
PAGE
Connect Storage Engine
•Any file format as MySQLTABLE:
•ODBC
•Text, XML, *ML
•Excel,Access etc.
•MariaDB CREATE TABLE options
•Multi-file table
•TableAutocreation
•Condition push down
•Read/Write and Multi Storage Engine Join
•CREATE INDEX
20
CREATE TABLE handout
ENGINE = CONNECT
TABLE_TYPE = XML
FILE_NAME = 'handout.htm'
HEADER = yes OPTION_LIST =
'name = TABLE,
coltype = HTML,
attribute =
(border=1;cellpadding=5)';
Friday, 3 May 13
Starting Your Big Data Project
Friday, 3 May 13
PAGE
Why would you use MySQL?
• Time
• Knowledge
• Infrastructure
• Costs
• Simplified Integration
• Not so “big” data
22
Friday, 3 May 13
PAGE
Apache Hadoop & Friends
23
HDFS
MapReduce
PIG HIVE
HCatalog
HBASE
ZooKeeper
•Mahout
•Ambari, Ganglia,
Nagios
•Sqoop
•Cascading
•Oozie
•Flume
•Protobuf, Avro,
Thrift
•Fuse-DFS
•Chukwa
•Cassandra
Friday, 3 May 13
PAGE
MySQL & Friends
24
MySQL/MariaDB/Storage Engines
SQL Optimizer
Scripts
Stored Procedures DML
DB Schema / DDL
MySQL/MariaDB
SkySQLDS
•Mahout
•SDS, Ganglia,
Nagios
•mysqlimport
•Cascading
•Talend, Pentaho
•Connect
Friday, 3 May 13
PAGE
Join us at the Solutions Day
•Cassandra and Connect Storage Engine
•Map/Reduce approach - Proxy optimisation
•Multiple protocols and more
25
Friday, 3 May 13
Thank You!
ivan@skysql.com
izoratti.blogspot.com
www.slideshare.net/izorattiwww.skysql.com
Friday, 3 May 13

More Related Content

What's hot (20)

PPTX
2-day-dba-oracle.pptx
Rocky572078
 
PPTX
CQL: This is not the SQL you are looking for.
Aaron Ploetz
 
PDF
[Pgday.Seoul 2020] SQL Tuning
PgDay.Seoul
 
PPTX
Oracle Goldengate training by Vipin Mishra
Vipin Mishra
 
PDF
ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdf
Jesmar Cannao'
 
PDF
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
Databricks
 
PPTX
Indexing the MySQL Index: Key to performance tuning
OSSCube
 
PPTX
How to size up an Apache Cassandra cluster (Training)
DataStax Academy
 
PDF
Let’s get to know Snowflake
Knoldus Inc.
 
PPT
DB2 for z/O S Data Sharing
Surekha Parekh
 
PDF
Oracle SQL Basics
Dhananjay Goel
 
PDF
Oracle RDBMS architecture
Martin Berger
 
PPTX
Sharding
MongoDB
 
PDF
Demystifying Data Warehousing as a Service - DFW
Kent Graziano
 
PDF
Indexes in postgres
Louise Grandjonc
 
PDF
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
Andrew Lamb
 
PDF
Overview of Databases and Data Modelling-1.pdf
Christalin Nelson
 
PDF
Spark SQL
Joud Khattab
 
DOC
SQL
Shunya Ram
 
DOCX
Rac questions
parvezsigan
 
2-day-dba-oracle.pptx
Rocky572078
 
CQL: This is not the SQL you are looking for.
Aaron Ploetz
 
[Pgday.Seoul 2020] SQL Tuning
PgDay.Seoul
 
Oracle Goldengate training by Vipin Mishra
Vipin Mishra
 
ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdf
Jesmar Cannao'
 
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
Databricks
 
Indexing the MySQL Index: Key to performance tuning
OSSCube
 
How to size up an Apache Cassandra cluster (Training)
DataStax Academy
 
Let’s get to know Snowflake
Knoldus Inc.
 
DB2 for z/O S Data Sharing
Surekha Parekh
 
Oracle SQL Basics
Dhananjay Goel
 
Oracle RDBMS architecture
Martin Berger
 
Sharding
MongoDB
 
Demystifying Data Warehousing as a Service - DFW
Kent Graziano
 
Indexes in postgres
Louise Grandjonc
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
Andrew Lamb
 
Overview of Databases and Data Modelling-1.pdf
Christalin Nelson
 
Spark SQL
Joud Khattab
 
Rac questions
parvezsigan
 

Similar to Big Data with MySQL (20)

PDF
What can we learn from NoSQL technologies?
Ivan Zoratti
 
ODP
Mysql For Developers
Carol McDonald
 
PDF
Intro to column stores
Justin Swanhart
 
PDF
iloug2015.Mysql.for.oracle.dba.V2
Baruch Osoveskiy
 
PPT
Data Warehouse Logical Design using Mysql
HAFIZ Islam
 
PDF
Sql no sql
Dave Stokes
 
PPTX
MySQL
janova santhi
 
PDF
Storage Methods for Nonstandard Data Patterns
Bob Burgess
 
PDF
Performance Tuning Best Practices
webhostingguy
 
PDF
Delivering fast, powerful and scalable analytics #OPEN18
Kangaroot
 
PDF
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Matt Stubbs
 
PDF
Sesión técnica: Big Data Analytics con MariaDB ColumnStore
MariaDB plc
 
PDF
MySQL Storage Engines Landscape
Colin Charles
 
PPTX
7 Database Mistakes YOU Are Making -- Linuxfest Northwest 2019
Dave Stokes
 
PDF
Mysql database basic user guide
PoguttuezhiniVP
 
PDF
Emerging database landscape july 2011
navaidkhan
 
PPT
Making MySQL Great For Business Intelligence
Calpont
 
PPT
Star schema my sql
deathsubte
 
ODP
MySQL Scaling Presentation
Tommy Falgout
 
PDF
My First 100 days with a MySQL DBMS
Gustavo Rene Antunez
 
What can we learn from NoSQL technologies?
Ivan Zoratti
 
Mysql For Developers
Carol McDonald
 
Intro to column stores
Justin Swanhart
 
iloug2015.Mysql.for.oracle.dba.V2
Baruch Osoveskiy
 
Data Warehouse Logical Design using Mysql
HAFIZ Islam
 
Sql no sql
Dave Stokes
 
Storage Methods for Nonstandard Data Patterns
Bob Burgess
 
Performance Tuning Best Practices
webhostingguy
 
Delivering fast, powerful and scalable analytics #OPEN18
Kangaroot
 
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Matt Stubbs
 
Sesión técnica: Big Data Analytics con MariaDB ColumnStore
MariaDB plc
 
MySQL Storage Engines Landscape
Colin Charles
 
7 Database Mistakes YOU Are Making -- Linuxfest Northwest 2019
Dave Stokes
 
Mysql database basic user guide
PoguttuezhiniVP
 
Emerging database landscape july 2011
navaidkhan
 
Making MySQL Great For Business Intelligence
Calpont
 
Star schema my sql
deathsubte
 
MySQL Scaling Presentation
Tommy Falgout
 
My First 100 days with a MySQL DBMS
Gustavo Rene Antunez
 
Ad

More from Ivan Zoratti (20)

PDF
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
Ivan Zoratti
 
PDF
Introducing the Open Edge Module
Ivan Zoratti
 
PDF
MySQL Performance Tuning London Meetup June 2017
Ivan Zoratti
 
PDF
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
Ivan Zoratti
 
PDF
MariaDB ColumnStore - LONDON MySQL Meetup
Ivan Zoratti
 
PDF
ScaleDB Technical Presentation
Ivan Zoratti
 
PDF
Time Series From Collection To Analysis
Ivan Zoratti
 
PDF
ScaleDB Technical Presentation
Ivan Zoratti
 
PDF
MySQL for Beginners - part 1
Ivan Zoratti
 
PDF
Anatomy of a Proxy Server - MaxScale Internals
Ivan Zoratti
 
PDF
Orchestrating MySQL
Ivan Zoratti
 
PDF
GTIDs Explained
Ivan Zoratti
 
PDF
The Evolution of Open Source Databases
Ivan Zoratti
 
PDF
MaxScale for Effective MySQL Meetup NYC - 14.01.21
Ivan Zoratti
 
PDF
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
Ivan Zoratti
 
PDF
SkySQL & MariaDB What's all the buzz?
Ivan Zoratti
 
PDF
MySQL & MariaDB - Innovation Happens Here
Ivan Zoratti
 
PDF
Sky Is The limit
Ivan Zoratti
 
KEY
The sky's the limit
Ivan Zoratti
 
PDF
HA Reloaded
Ivan Zoratti
 
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
Ivan Zoratti
 
Introducing the Open Edge Module
Ivan Zoratti
 
MySQL Performance Tuning London Meetup June 2017
Ivan Zoratti
 
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
Ivan Zoratti
 
MariaDB ColumnStore - LONDON MySQL Meetup
Ivan Zoratti
 
ScaleDB Technical Presentation
Ivan Zoratti
 
Time Series From Collection To Analysis
Ivan Zoratti
 
ScaleDB Technical Presentation
Ivan Zoratti
 
MySQL for Beginners - part 1
Ivan Zoratti
 
Anatomy of a Proxy Server - MaxScale Internals
Ivan Zoratti
 
Orchestrating MySQL
Ivan Zoratti
 
GTIDs Explained
Ivan Zoratti
 
The Evolution of Open Source Databases
Ivan Zoratti
 
MaxScale for Effective MySQL Meetup NYC - 14.01.21
Ivan Zoratti
 
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
Ivan Zoratti
 
SkySQL & MariaDB What's all the buzz?
Ivan Zoratti
 
MySQL & MariaDB - Innovation Happens Here
Ivan Zoratti
 
Sky Is The limit
Ivan Zoratti
 
The sky's the limit
Ivan Zoratti
 
HA Reloaded
Ivan Zoratti
 
Ad

Recently uploaded (20)

PPTX
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
PDF
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
PDF
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
DOCX
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
PDF
NLJUG Speaker academy 2025 - first session
Bert Jan Schrijver
 
PDF
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
PPTX
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
PDF
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
PPTX
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
PDF
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
PDF
Staying Human in a Machine- Accelerated World
Catalin Jora
 
PDF
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
PDF
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
PDF
Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
PDF
Mastering Financial Management in Direct Selling
Epixel MLM Software
 
PDF
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
PDF
The 2025 InfraRed Report - Redpoint Ventures
Razin Mustafiz
 
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
NLJUG Speaker academy 2025 - first session
Bert Jan Schrijver
 
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
Staying Human in a Machine- Accelerated World
Catalin Jora
 
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
Mastering Financial Management in Direct Selling
Epixel MLM Software
 
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
The 2025 InfraRed Report - Redpoint Ventures
Razin Mustafiz
 

Big Data with MySQL

  • 1. Ivan Zoratti Big Data with MySQL Percona Live Santa Clara 2013 V1304.01 Friday, 3 May 13
  • 3. SkySQL •Leading provider of open source databases, services and solutions •Home for the founders and the original developers of the core of MySQL •The creators of MariaDB, the drop-off, innovative replacement of MySQL Friday, 3 May 13
  • 4. What is Big Data? http://marketingblogged.marketingmagazine.co.uk/files/Big-Data-3.jpg Friday, 3 May 13
  • 5. PAGE Big Data! Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. 5 http://readwrite.com/files/styles/800_450sc/public/files/fields/shutterstock_bigdata.jpg Friday, 3 May 13
  • 6. PAGE Big Data By Structure 6 Unstructured •Store everything you have/you find •In any format and shape •You do not know how to use it, but it may come handy •Storing unstructured data is usually cheaper than storing it in a more structured datastore •Does not fit well in a relational database •Examples: •Text: Plain text, documents, web content, messages •Bitmap: Image, audio, video •Typical approach: •Mining, pattern recognition, tagging •Usually batch analysis Structured •Store only what you need •In a good format, ready to be used •You should already know how to use it, or at least what it means •Storing structured data is quite expensive •Raw data, indexing, denormalisation, aggregation •Arelational database is still the best choice •Examples: •Machine-Generated Data (MGD) •Tags, counters, sales •Typical approach: •BI tools, reporting •Real time analysis change data capture Friday, 3 May 13
  • 7. PAGE Unstructured •Store everything you have/you find •In any format and shape •You do not know how to use it, but it may come handy •Storing unstructured data is usually cheaper than storing it in a more structured datastore •Does not fit well in a relational database •Examples: •Text: Plain text, documents, web content, messages •Bitmap: Image, audio, video •Typical approach: •Mining, pattern recognition, tagging •Usually batch analysis Structured •Store only what you need •In a good format, ready to be used •You should already know how to use it, or at least what it means •Storing structured data is quite expensive •Raw data, indexing, denormalisation, aggregation •Arelational database is still the best choice •Examples: •Machine-Generated Data (MGD) •Tags, counters, sales •Typical approach: •BI tools, reporting •Real time analysis change data capture Big Data By Structure 7 Friday, 3 May 13
  • 8. PAGE How “Big” is Big Data? •Data Factors •Size •Speed to collect/ generate •Variety •Resources •Administrators •Developers •Infrastructure •Growth •Collection •Processing •Availability •To whom? •For how long? •In which format? •Aggregated •Detailed 8 Friday, 3 May 13
  • 9. PAGE How to manage Big Data •Collection - Storage -Archive •Load - Transform -Analyze •Access - Explore - Utilize 9 http://www.futuresmag.com/2012/07/01/big-data-manage-it-dont-drown-in-it Friday, 3 May 13
  • 10. Big Data with MySQL http://news.mydosti.com/newsphotos/tech/BigDataV1Dec22012.jpg Friday, 3 May 13
  • 11. PAGE Technologies to Use / Consider / Watch •MyISAM and MyISAM compression •InnoDB compression •MySQL 5.6 Partitioning •MariaDB Optimizer •MariaDB Virtual & Dynamic Columns •Cassandra Storage Engine •Connect Storage Engine •Columnar Databases •InfiniDB •Infobright •TokuDB Storage Engine 11 Friday, 3 May 13
  • 12. PAGE Columnar Databases •Automatic compression •Automatic column storage •Data distribution •Map/Reduce approach •MPP / Parallel loading •No indexes •On public clouds, HW or SW appliances 12 Friday, 3 May 13
  • 13. PAGE TokuDB •Increased Performance •Increased Compression •Online administration •No Index rebuild 13 Friday, 3 May 13
  • 14. PAGE MyISAM •Static, dynamic and compressed format •Multiple key cache, CACHE INDEX and LOAD INDEX •Compressed tables •Horizontal partitioning (manual) •External locking 14 Friday, 3 May 13
  • 15. PAGE InnoDB/XtraDB •Data Load •Pre-order data •Split data into chunks •unique_checks = 0; •foreign_key_checks = 0; •sql_log_bin = 0; •innodb_autoinc_lock_mode = 2; •Compression and block size •Persistent optimizer stats •innodb_stats_persistent •innodb_stats_auto_recalc 15 SET GLOBAL innodb_file_per_table = 1; SET GLOBAL innodb_file_format = Barracuda; CREATE TABLE t1 ( c1 INT PRIMARY KEY, c2 VARCHAR(255) ) ROW_FORMAT = COMPRESSED KEY_BLOCK_SIZE = 8; LOAD   DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t1; Query OK, 134217728 rows affected (1 hour 34 min 7.49 sec) Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0 LOAD   DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t2; Query OK, 134217728 rows affected (25 min 20.75 sec) Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0 Friday, 3 May 13
  • 16. PAGE Partitioning (MySQL 5.6) •Partitioning Types •RANGE, LIST, RANGE COLUMN, HASH, LINEAR HASH, KEY LINEAR KEY, sub-partitions •Partition and lock pruning •Use of INDEX and DATA DIRECTORY •PARTITIONADD, DROP, REORGANIZE, COALESCE, TRUNCATE, EXCHANGE, REBUILD, OPTIMIZE, CHECK, ANALYZE, REPAIR 16 CREATE TABLE t1 ( c1 INT, c2 DATE ) PARTITION BY RANGE( YEAR( c2 ) ) SUBPARTITION BY HASH ( TO_DAYS( c2 ) ) ( PARTITION p0 VALUES LESS THAN (1990) ( SUBPARTITION s0 DATA DIRECTORY = '/disk0/data' INDEX DIRECTORY = '/disk0/idx', SUBPARTITION s1 DATA DIRECTORY = '/disk1/data' INDEX DIRECTORY = '/disk1/idx' ),... ALTER TABLE t1 EXCHANGE PARTITION p3 WITH TABLE t2; -- Range and List partitions ALTER TABLE t1 REORGANIZE PARTITION p0,p1,p2,p3 INTO ( PARTITION m0 VALUES LESS THAN (1980), PARTITION m1 VALUES LESS THAN (2000)); -- Hash and Key partitions ALTER TABLE t1 COALESCE PARTITION 10; ALTER TABLE t1 ADD PARTITION PARTITIONS 5; Friday, 3 May 13
  • 17. PAGE MariaDB Optimizer •Multi-Range Read (MRR)* •Index Merge / Sort intersection •Batch KeyAccess* •Block hash join •Cost-based choice of range vs. index_merge •ORDER BY ... LIMIT <limit>* •MariaDB 10 •Subqueries •Semi-join* •Materialization* •subquery cache •LIMIT ... ROWS EXAMINED <limit> 17 (*) - Available in MySQL 5.6 Friday, 3 May 13
  • 18. PAGE Virtual & Dynamic Columns VIRTUAL COLUMNS •For InnoDB, MyISAM andAria •PERSISTENT (stored) or VIRTUAL (generated) 18 CREATE TABLE t1 ( c1 INT NOT NULL, c2 VARCHAR(32), c3 INT AS ( c1 MOD 10 ) VIRTUAL, c4 VARCHAR(5) AS ( LEFT(B,5) ) PERSISTENT); DYNAMIC COLUMNS •Implement a schemaless, document store •COLUMN_ CREATE,ADD, GET, LIST, JSON, EXISTS, CHECK, DELETE •Nested colums are allowed •Main datatypes are allowed •Max 1GB documents CREATE TABLE assets ( item_name VARCHAR(32) PRIMARY KEY, dynamic_cols BLOB ); INSERT INTO assets VALUES ( 'MariaDB T-shirt', COLUMN_CREATE( 'color', 'blue', 'size', 'XL' ) ); INSERT INTO assets VALUES ( 'Thinkpad Laptop', COLUMN_CREATE( 'color', 'black', 'price', 500 ) ); Friday, 3 May 13
  • 19. PAGE Cassandra Storage Engine •Column Family == Table •Rowkey, static and dynamic columns allowed •Batch key access support SET cassandra_default_thrift_host = '192.168.0.10' CREATE TABLE cassandra_tbl ( rowkey INT PRIMARY KEY, col1 VARCHAR(25), col2 BIGINT, dyn_cols BLOB DYNAMIC_COLUMN_STORAGE = yes ) ENGINE = cassandra KEYSPACE = 'cassandra_key_space' COLUMN_FAMILY = 'column_family_name'; 19 Friday, 3 May 13
  • 20. PAGE Connect Storage Engine •Any file format as MySQLTABLE: •ODBC •Text, XML, *ML •Excel,Access etc. •MariaDB CREATE TABLE options •Multi-file table •TableAutocreation •Condition push down •Read/Write and Multi Storage Engine Join •CREATE INDEX 20 CREATE TABLE handout ENGINE = CONNECT TABLE_TYPE = XML FILE_NAME = 'handout.htm' HEADER = yes OPTION_LIST = 'name = TABLE, coltype = HTML, attribute = (border=1;cellpadding=5)'; Friday, 3 May 13
  • 21. Starting Your Big Data Project Friday, 3 May 13
  • 22. PAGE Why would you use MySQL? • Time • Knowledge • Infrastructure • Costs • Simplified Integration • Not so “big” data 22 Friday, 3 May 13
  • 23. PAGE Apache Hadoop & Friends 23 HDFS MapReduce PIG HIVE HCatalog HBASE ZooKeeper •Mahout •Ambari, Ganglia, Nagios •Sqoop •Cascading •Oozie •Flume •Protobuf, Avro, Thrift •Fuse-DFS •Chukwa •Cassandra Friday, 3 May 13
  • 24. PAGE MySQL & Friends 24 MySQL/MariaDB/Storage Engines SQL Optimizer Scripts Stored Procedures DML DB Schema / DDL MySQL/MariaDB SkySQLDS •Mahout •SDS, Ganglia, Nagios •mysqlimport •Cascading •Talend, Pentaho •Connect Friday, 3 May 13
  • 25. PAGE Join us at the Solutions Day •Cassandra and Connect Storage Engine •Map/Reduce approach - Proxy optimisation •Multiple protocols and more 25 Friday, 3 May 13