#CASSANDRA13
Colin Charles | colin@mariadb.org | SkySQL Ab | http://mariadb.org/
@bytebot on Twitter | http://bytebot.net/blog/
MariaDB and Cassandra Interoperability
#CASSANDRA13
whoami
*Work on MariaDB today
*Formerly of MySQL AB (acquired by Sun Microsystems)
*Worked on The Fedora Project & OpenOffice.org previously
*Monty Program Ab is a major sponsor of MariaDB
*SkySQL & Monty Program Ab merge
*MariaDB governed by MariaDB Foundation
#CASSANDRA13
What we will discuss today...
*What is MariaDB?
*MariaDB Architecture
*The Cassandra Storage Engine (CassandraSE)
*Data & Command Mapping
*Use Cases
*Benchmarks
*Conclusions
#CASSANDRA13
What is MariaDB?
*Community developed, feature enhanced, backward compatible MySQL
*Drop-in replacement for MySQL
*Shipped in many Linux distributions as a default
*Enhanced features: threadpool, table elimination, optimizer changes
(subqueries materialize!), group commit in the replication binary log,
HandlerSocket, SphinxSE, multi-source replication, dynamic columns
#CASSANDRA13
MariaDB/MySQL and NoSQL
*HandlerSocket
*memcached access to InnoDB
*Hadoop Applier
*LevelDB Storage Engine
*Cassandra Storage Engine
#CASSANDRA13
Dynamic Columns
*Store a different set of columns for every row in the table
*Basically a blob with handling functions (GET, CREATE, ADD, DELETE,
EXISTS, LIST, JSON)
*Dynamic columns can be nested
*You can request rows in JSON format
*You can now name dynamic columns as well
INSERT INTO tbl SET
dyncol_blob=COLUMN_CREATE("column_name", "value");
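The "blob with handling functions" idea can be modelled in a few lines of Python. This is only a sketch: the function names echo the handling functions listed above, but the JSON encoding is an assumption chosen for readability (MariaDB's real dynamic-column blob is a packed binary format), and none of this is MariaDB code.

```python
import json

def column_create(**cols):
    """Pack named columns into one blob (JSON bytes here, for illustration only)."""
    return json.dumps(cols, sort_keys=True).encode("utf-8")

def _load(blob):
    # Helper (hypothetical): decode the blob back into a dict of columns.
    return json.loads(blob.decode("utf-8"))

def column_get(blob, name):
    """Return the value of one column, or None if the row does not have it."""
    return _load(blob).get(name)

def column_add(blob, **cols):
    """Return a new blob with the given columns added or overwritten."""
    data = _load(blob)
    data.update(cols)
    return column_create(**data)

def column_delete(blob, name):
    """Return a new blob without the named column."""
    data = _load(blob)
    data.pop(name, None)
    return column_create(**data)

def column_exists(blob, name):
    return name in _load(blob)

def column_list(blob):
    return sorted(_load(blob))

def column_json(blob):
    return blob.decode("utf-8")

# Two rows of the same table carrying different sets of columns.
row1 = column_create(color="blue", size=42)
row2 = column_create(owner="colin")
```

Each row carries its own column set inside a single blob value, which is why the table definition needs only one BLOB column to hold all of them.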
#CASSANDRA13
Cassandra background
*Distributed key/value store (limited range scan support),
optionally flexible schema (pre-defined “static” columns,
ad-hoc dynamic columns), automatic sharding/
replication, eventual consistency
*Column families are like “tables”
*Row key -> column mapping
*Supercolumns are not supported
#CASSANDRA13
CQL at work
cqlsh> CREATE KEYSPACE mariadbtest
... WITH REPLICATION ={'class':'SimpleStrategy','replication_factor':1};
cqlsh> use mariadbtest;
cqlsh:mariadbtest> create columnfamily cf1 ( pk varchar primary key, data1 varchar, data2 bigint ) with compact storage;
cqlsh:mariadbtest> insert into cf1 (pk, data1,data2) values ('row1', 'data-in-cassandra', 1234);
cqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
cqlsh:mariadbtest> select * from cf1 where pk='row1';
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
cqlsh:mariadbtest> select * from cf1 where data2=1234;
Bad Request: No indexed columns present in by-columns clause with Equal operator
cqlsh:mariadbtest> select * from cf1 where pk='row1' or pk='row2';
Bad Request: line 1:34 missing EOF at 'or'
#CASSANDRA13
CQL
*Looks like SQL at first glance
*No joins or subqueries
*No GROUP BY, ORDER BY must be able to use available indexes
*WHERE clause must represent an index lookup
*Simple goal of the Cassandra Storage Engine? Provide a “view” of
Cassandra’s data from MariaDB
#CASSANDRA13
Getting started
*Get MariaDB 10.0.3 from https://downloads.mariadb.org/
*Load the Cassandra plugin
- From SQL:
MariaDB [(none)]> install plugin cassandra soname 'ha_cassandra.so';
- Or start it from my.cnf
[mysqld]
...
plugin-load=ha_cassandra.so
#CASSANDRA13
Is everything ok?
*Check to see that it is loaded - SHOW PLUGINS
MariaDB [(none)]> show plugins;
+--------------------+--------+-----------------+-----------------+---------+
| Name | Status | Type | Library | License |
+--------------------+--------+-----------------+-----------------+---------+
...
| CASSANDRA | ACTIVE | STORAGE ENGINE | ha_cassandra.so | GPL |
+--------------------+--------+-----------------+-----------------+---------+
#CASSANDRA13
Create an SQL table which is a view of a column family
MariaDB [test]> set global cassandra_default_thrift_host='10.196.2.113';
MariaDB [test]> create table t2 (pk varchar(36) primary key,
-> data1 varchar(60),
-> data2 bigint
-> ) engine=cassandra
-> keyspace='mariadbtest'
-> thrift_host='10.196.2.113'
-> column_family='cf1';
*thrift_host can be set per-table
*@@cassandra_default_thrift_host lets you re-point the table to a different node
dynamically, without changing the table DDL when the Cassandra IP changes
#CASSANDRA13
Potential issues
*SELinux/AuditD blocks the connection
ERROR 1429 (HY000): Unable to connect to foreign data source: connect() failed: Permission denied [1]
*Disable SELinux and stop auditd: echo 0 > /selinux/enforce; service auditd stop
*Cassandra 1.2 with Column Families (CFs) without “COMPACT
STORAGE” attribute (pre-CQL3)
ERROR 1429 (HY000): Unable to connect to foreign data source: Column family cf1 not found in
keyspace mariadbtest
*Thrift-based clients no longer work, which broke Pig as well
(https://issues.apache.org/jira/browse/CASSANDRA-5234); we’ll update this
soon
#CASSANDRA13
Accessing Cassandra data from MariaDB
*Get data from Cassandra
MariaDB [test]> select * from t2;
+------+-------------------+-------+
| pk | data1 | data2 |
+------+-------------------+-------+
| row1 | data-in-cassandra | 1234 |
+------+-------------------+-------+
*Insert data into Cassandra
MariaDB [test]> insert into t2 values ('row2','data-from-mariadb', 123);
*Ensure Cassandra sees inserted data
cqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
row2 | data-from-mariadb | 123
#CASSANDRA13
Data mapping between Cassandra and SQL
create table tbl (
pk varchar(36) primary key,
data1 varchar(60),
data2 bigint
) engine=cassandra keyspace='ks1' column_family='cf1'
*MariaDB table represents Cassandra’s Column Family
- can use any table name, column_family=... specifies CF
*Table must have a primary key
- name/type must match Cassandra’s rowkey
*Columns map to Cassandra’s static columns
- name must be same as in Cassandra, datatypes must match, can be subset of CF’s columns
#CASSANDRA13
Datatype mapping
Cassandra    MariaDB
blob         BLOB, VARBINARY(n)
ascii        BLOB, VARCHAR(n), use charset=latin1
text         BLOB, VARCHAR(n), use charset=utf8
varint       VARBINARY(n)
int          INT
bigint       BIGINT, TINY, SHORT
uuid         CHAR(36) (text in MariaDB)
timestamp    TIMESTAMP (second), TIMESTAMP(6) (microsecond), BIGINT
boolean      BOOL
float        FLOAT
double       DOUBLE
decimal      VARBINARY(n)
counter      BIGINT
#CASSANDRA13
Dynamic columns revisited
*Cassandra supports “dynamic column families”, can access ad-hoc
columns
create table tbl
(
rowkey type PRIMARY KEY,
column1 type,
...
dynamic_cols blob DYNAMIC_COLUMN_STORAGE=yes
) engine=cassandra keyspace=... column_family=...;
insert into tbl values (1, column_create('col1', 1, 'col2', 'value-2'));
select rowkey, column_get(dynamic_cols, 'uuidcol' as char) from tbl;
#CASSANDRA13
All data mapping is safe
*CassandraSE will refuse incorrect mappings (throw errors)
create table t3 (pk varchar(60) primary key, no_such_field int)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Field `no_such_field` could not be mapped to any field in Cassandra'
create table t3 (pk varchar(60) primary key, data1 double)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Failed to map column data1 to datatype org.apache.cassandra.db.marshal.UTF8Type'
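The refusal of incorrect mappings can be pictured as a check run at CREATE TABLE time. The Python sketch below is an illustration, not CassandraSE's code: the ALLOWED table is transcribed from the datatype mapping slide with length suffixes dropped, `check_mapping` is a hypothetical helper, and the error strings only imitate the ones shown above (the real engine reports Cassandra's marshal class names).

```python
# Allowed MariaDB column types per Cassandra type, from the datatype mapping slide.
ALLOWED = {
    "blob": {"BLOB", "VARBINARY"},
    "ascii": {"BLOB", "VARCHAR"},
    "text": {"BLOB", "VARCHAR"},
    "varint": {"VARBINARY"},
    "int": {"INT"},
    "bigint": {"BIGINT", "TINY", "SHORT"},
    "uuid": {"CHAR(36)"},
    "timestamp": {"TIMESTAMP", "TIMESTAMP(6)", "BIGINT"},
    "boolean": {"BOOL"},
    "float": {"FLOAT"},
    "double": {"DOUBLE"},
    "decimal": {"VARBINARY"},
    "counter": {"BIGINT"},
}

def check_mapping(cf_columns, table_columns):
    """Return a list of error strings; an empty list means the mapping is accepted.

    cf_columns:    {column name: Cassandra type} for the column family
    table_columns: {column name: MariaDB type} for the proposed SQL table
    """
    errors = []
    for name, sql_type in table_columns.items():
        if name not in cf_columns:
            # The SQL column has no counterpart in the column family.
            errors.append(f"Field `{name}` could not be mapped to any field in Cassandra")
        elif sql_type not in ALLOWED[cf_columns[name]]:
            # The name matches but the datatypes are incompatible.
            errors.append(f"Failed to map column {name} to datatype {cf_columns[name]}")
    return errors

# cf1 as created earlier in cqlsh (CQL varchar is an alias for text).
cf1 = {"pk": "text", "data1": "text", "data2": "bigint"}
```

Because the loop only walks the SQL table's columns, a table that declares a subset of the column family's columns passes the check, matching the "can be subset of CF's columns" rule.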
#CASSANDRA13
Command Mapping
*Cassandra commands
- PUT (upsert)
- GET (performs a scan)
- DELETE (if exists)
*SQL commands
- SELECT -> GET/Scan
- INSERT -> PUT (upsert)
- UPDATE/DELETE -> read/write
#CASSANDRA13
SELECT command mapping
*MariaDB has a SQL interpreter
*CassandraSE supports lookups and scans
*Can now do:
- arbitrary WHERE clauses
- JOINs between Cassandra tables and MariaDB tables (BKA
supported)
#CASSANDRA13
Batched Key Access is fast!
select max(l_extendedprice) from orders, lineitem where
o_orderdate between $DATE1 and $DATE2 and
l_orderkey=o_orderkey
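Why batching helps can be shown with a toy model that counts round trips to a remote store. This is a sketch under assumptions: `RemoteStore`, `join_one_by_one` and `join_batched` are hypothetical names, and the model ignores everything about the real optimizer except the idea of collecting join keys and fetching them in batches rather than one lookup per row.

```python
class RemoteStore:
    """Stand-in for a remote key/value store that counts round trips."""

    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def get(self, key):
        # One network round trip per key.
        self.round_trips += 1
        return self.rows.get(key)

    def multi_get(self, keys):
        # One network round trip for a whole batch of keys.
        self.round_trips += 1
        return {k: self.rows[k] for k in keys if k in self.rows}

def join_one_by_one(outer_keys, store):
    """Look up each join key separately."""
    out = []
    for k in outer_keys:
        row = store.get(k)
        if row is not None:
            out.append((k, row))
    return out

def join_batched(outer_keys, store, batch_size=100):
    """Collect join keys into batches and fetch each batch in one call."""
    out = []
    for i in range(0, len(outer_keys), batch_size):
        batch = outer_keys[i:i + batch_size]
        found = store.multi_get(batch)
        out.extend((k, found[k]) for k in batch if k in found)
    return out
```

Both functions return the same joined rows; with 300 outer keys and a batch size of 100 the batched version makes 3 round trips where the row-at-a-time version makes 300.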
#CASSANDRA13
DML command mapping
*No SQL semantics
- INSERT overwrites rows
- UPDATE reads, then writes (have you updated what you read?)
- DELETE reads, then writes (can’t be sure if/what you’ve deleted)
*CassandraSE doesn’t make it SQL!
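The difference from SQL semantics is easy to see in a toy model. The Python sketch below is illustrative only: `ColumnFamily` and the `sql_*` helpers are hypothetical names, and a plain dict stands in for Cassandra. It shows INSERT behaving as an upsert, and UPDATE/DELETE as a separate read followed by a write with nothing tying the two together.

```python
class ColumnFamily:
    """Toy key/value store with upsert-only writes (no uniqueness check, no transactions)."""

    def __init__(self):
        self.rows = {}

    def put(self, key, columns):
        # Upsert: create the row or overwrite the given columns; never a duplicate-key error.
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key):
        row = self.rows.get(key)
        return dict(row) if row is not None else None

    def delete(self, key):
        # Deleting a missing row is not an error.
        self.rows.pop(key, None)

def sql_insert(cf, key, columns):
    """INSERT maps to PUT, so an existing row is silently overwritten."""
    cf.put(key, columns)

def sql_update(cf, key, changes):
    """UPDATE reads, then writes; another writer could change the row in between."""
    if cf.get(key) is None:
        return 0
    cf.put(key, changes)
    return 1

def sql_delete(cf, key):
    """DELETE reads, then deletes; the row seen by the read may already be gone."""
    if cf.get(key) is None:
        return 0
    cf.delete(key)
    return 1
```

In this model a second INSERT with the same key succeeds and replaces the earlier values, where an InnoDB table would raise a duplicate-key error.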
#CASSANDRA13
CassandraSE use cases
*Collect massive amounts of data like web page hits
*Collect massive amounts of data from sensors
*Updates are non-conflicting
- keyed by UUIDs, timestamps
*Reads are served with one lookup
*Good for certain kinds of data (though moving from SQL entirely may be
difficult)
#CASSANDRA13
Access Cassandra data from SQL
*Send an update to Cassandra
- be a sensor
*Get a piece of data from Cassandra
- This webpage was last viewed by...
- Last known position of this user was...
- You are user number n of n-thousands...
#CASSANDRA13
From MariaDB...
*Want a table that is:
- auto-replicated
- fault-tolerant
- very fast
*Get Cassandra and create a CassandraSE table
#CASSANDRA13
A possibly unique use
*MariaDB ships the CONNECT storage engine (XML, ODBC, etc.)
*You can CONNECT to Oracle (via ODBC), join results from Cassandra
(via CassandraSE) and have all your results sit in InnoDB
- yes, collaboration between Oracle, Cassandra and MariaDB is
possible today
*Remember to turn on engine condition pushdown
#CASSANDRA13
CassandraSE non-use cases
*Huge, sift-through-all-the-data joins?
- use Pig
*Bulk data transfer to/from Cassandra Cluster?
- use Sqoop
*A replacement for InnoDB?
- remember no full SQL semantics, InnoDB is useful for myriad
reasons
#CASSANDRA13
A tiny benchmark
*One table
*Amazon EC2 environment
- m1.large nodes
- ephemeral disks
*Stream of single-line INSERTs
*Tried InnoDB & CassandraSE
*No tuning
#CASSANDRA13
A tiny benchmark II
*InnoDB with tuning, same
setup as before
#CASSANDRA13
Conclusions
*CassandraSE can be used to peek at data in Cassandra from MariaDB
*It is not a replacement for Pig/Hive
*It is really easy to set up & use
#CASSANDRA13
Roadmap
*Do you want support for:
- fast counter column updates?
- awareness/discovery of Cassandra cluster topology?
- secondary indexes?
- ... ?
#CASSANDRA13
Resources
*https://kb.askmonty.org/en/cassandrase/
*http://wiki.apache.org/cassandra/DataModel
*http://cassandra.apache.org/
*http://www.datastax.com/docs/1.1/ddl/column_family
*MariaDB: http://mariadb.org/
*Knowledge Base: http://kb.askmonty.org/
#CASSANDRA13
THANK YOU
Colin Charles | colin@mariadb.org | SkySQL Ab | http://mariadb.org/
@bytebot on Twitter | http://bytebot.net/blog/
#CASSANDRA13
Cassandra SE internals
*Developed against Cassandra 1.1
*Uses Thrift API
- cannot stream CQL resultset in 1.1
- cannot use secondary indexes
*Only supports AllowAllAuthenticator (Cassandra 1.2 has username/password authentication)
*In Cassandra 1.2
- “CQL Binary Protocol” with streaming
- CASSANDRA-5234: Thrift can only read CFs “WITH COMPACT STORAGE”
#CASSANDRA13
Running this on localhost
*Use vagrant, Ubuntu (12.04), DataStax Cassandra (1.1)
*http://julien.duponchelle.info/Cassandra-MariaDB-Virtual-Box.html
*It’s nice to be able to run this locally, but beyond testing there is little
to gain from it
#CASSANDRA13
Really running this (on EC2)
*Use http://www.datastax.com/docs/1.2/install/install_ami
*minimum is m1.large instance
*--clustername MyCluster --totalnodes 1 --version community
