Rimas Silkaitis
From Postgres to Cassandra
NoSQL vs SQL
||
&&
Rimas Silkaitis
Product
@neovintage
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
app cloud
DEPLOY MANAGE SCALE
$ git push heroku master
Counting objects: 11, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (11/11), 22.29 KiB | 0 bytes/s, done.
Total 11 (delta 1), reused 0 (delta 0)
remote: Compressing source files... done.
remote: Building source:
remote:
remote: -----> Ruby app detected
remote: -----> Compiling Ruby
remote: -----> Using Ruby version: ruby-2.3.1
Heroku Postgres
Over 1 Million Active DBs
Heroku Redis
Over 100K Active Instances
Apache Kafka on Heroku
Runtime
Runtime
Workers
$ psql
psql => \d
List of relations
schema | name | type | owner
--------+----------+-------+-----------
public | users | table | neovintage
public | accounts | table | neovintage
public | events | table | neovintage
public | tasks | table | neovintage
public | lists | table | neovintage
Ugh
 Database Problems
$ psql
psql => \d
List of relations
schema | name | type | owner
--------+----------+-------+-----------
public | users | table | neovintage
public | accounts | table | neovintage
public | events | table | neovintage
public | tasks | table | neovintage
public | lists | table | neovintage
[Chart: Site Traffic and Events over time. * Totally Not to Scale]
One
Big Table
Problem
CREATE TABLE users (
  id bigserial,
  account_id bigint,
  name text,
  email text,
  encrypted_password text,
  created_at timestamptz,
  updated_at timestamptz
);
CREATE TABLE accounts (
  id bigserial,
  name text,
  owner_id bigint,
  created_at timestamptz,
  updated_at timestamptz
);
CREATE TABLE events (
  user_id bigint,
  account_id bigint,
  session_id text,
  occurred_at timestamptz,
  category text,
  action text,
  label text,
  attributes jsonb
);
Table
[Diagram: events partitioned into events_20160901, events_20160902, events_20160903, events_20160904]
Add Some Triggers
$ psql
neovintage::DB=> \e
INSERT INTO events (
  user_id,
  account_id,
  category,
  action,
  occurred_at)
VALUES (1,
  2,
  'in_app',
  'purchase_upgrade',
  '2016-09-07 11:00:00 -07:00');
[Diagram: an INSERT query against events, in front of events_20160901, events_20160902, events_20160903, events_20160904]
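The routing decision behind date-based partitioning can be sketched in a few lines. This is only an illustration in Python, not the trigger itself: the function name `partition_for` is hypothetical, and bucketing by UTC day is an assumption (the slides only show the daily `events_YYYYMMDD` naming).

```python
from datetime import datetime, timedelta, timezone


def partition_for(occurred_at: datetime, parent: str = "events") -> str:
    """Return the daily child table an event belongs in, e.g. events_20160907.

    Assumption: partitions are cut on UTC day boundaries.
    """
    if occurred_at.tzinfo is None:
        # A naive timestamp could land in the wrong day, so refuse it.
        raise ValueError("occurred_at must be timezone-aware")
    day = occurred_at.astimezone(timezone.utc).date()
    return f"{parent}_{day:%Y%m%d}"


if __name__ == "__main__":
    pacific = timezone(timedelta(hours=-7))
    print(partition_for(datetime(2016, 9, 7, 11, 0, tzinfo=pacific)))
```

A trigger on the parent table would make the same decision for every incoming row, which is why each new day needs its child table created ahead of time.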
Constraints
• Data has little value after a period of time
• Small range of data has to be queried
• Old data can be archived or aggregated
There’s A Better Way
&&
One
Big Table
Problem
$ psql
psql => \d
List of relations
schema | name | type | owner
--------+----------+-------+-----------
public | users | table | neovintage
public | accounts | table | neovintage
public | events | table | neovintage
public | tasks | table | neovintage
public | lists | table | neovintage
Why Introduce
Cassandra?
• Linear Scalability
• No Single Point of Failure
• Flexible Data Model
• Tunable Consistency
New Architecture
Runtime
Workers
I only know relational databases.
How do I do this?
Understanding Cassandra
Two Dimensional
Table Spaces
RELATIONAL
Associative Arrays
or Hash
KEY-VALUE
Postgres is Typically Run as a Single Instance*
• Partitioned Key-Value Store
• Has a Grouping of Nodes (data center)
• Data is distributed amongst the nodes
Cassandra Cluster with 2 Data Centers
Cassandra Query Language
SQL-like
[sēkwel lahyk]
adjective
Resembling SQL in appearance,
behavior or character
adverb
In the manner of SQL
Let’s Talk About Primary Keys
Partition
Table
Partition Key
• 5 Node Cluster
• Simplest terms: Data is partitioned amongst all the nodes by hashing the partition key.
Replication Factor
Setting this parameter tells Cassandra how many nodes to copy incoming data to.
This is a replication factor of 3
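To make the hashing and replication idea concrete, here is a toy sketch of a 5 node ring with a replication factor of 3. It is a simplification under stated assumptions: the `Ring` class and `token` helper are hypothetical names, each node owns a single token (no virtual nodes), MD5 stands in for the Murmur3 hash that Cassandra's default partitioner uses, and replicas are simply the next nodes clockwise on the ring.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Hash a string to a 64-bit token. MD5 is a stand-in, not Cassandra's hash."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: one token per node, replicas placed on adjacent nodes."""

    def __init__(self, nodes, replication_factor=3):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the nodes that store this partition key, owner first."""
        # The owner is the first node whose token is greater than the key's
        # token, wrapping around to the start of the ring when needed.
        start = bisect_right(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = Ring([f"node{i}" for i in range(1, 6)], replication_factor=3)
    print(ring.replicas("user:1234"))
```

Every write for the same partition key lands on the same three nodes, which is what lets any of them answer a read for that key.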
But I thought
Cassandra had
tables?
Prior to 3.0, tables were called column families
Let’s Model Our Events Table in Cassandra
We’re not going to go through any setup
Plenty of tutorials exist for that sort of thing
Let’s assume we’re working with a 5 node cluster
$ psql
neovintage::DB=> \d events
Table "public.events"
Column | Type | Modifiers
---------------+--------------------------+-----------
user_id | bigint |
account_id | bigint |
session_id | text |
occurred_at | timestamp with time zone |
category | text |
action | text |
label | text |
attributes | jsonb |
$ cqlsh
cqlsh> CREATE KEYSPACE
  IF NOT EXISTS neovintage_prod
  WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3
  };
$ cqlsh
cqlsh> CREATE SCHEMA
  IF NOT EXISTS neovintage_prod
  WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3
  };
KEYSPACE == SCHEMA
• CQL can use KEYSPACE and SCHEMA interchangeably
• SCHEMA in Cassandra is somewhere between `CREATE DATABASE` and `CREATE SCHEMA` in Postgres
$ cqlsh
cqlsh> CREATE SCHEMA
  IF NOT EXISTS neovintage_prod
  WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3
  };
Replication Strategy: 'class': 'NetworkTopologyStrategy'
Replication Factor: 'us-east': 3 (three replicas in the us-east data center)
Replication Strategies
• NetworkTopologyStrategy - You have to define the network topology by defining the data centers. No magic here
• SimpleStrategy - Has no idea of the topology and doesn’t care to. Data is replicated to adjacent nodes.
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events (
  user_id bigint primary key,
  account_id bigint,
  session_id text,
  occurred_at timestamp,
  category text,
  action text,
  label text,
  attributes map<text, text>
);
Remember the Primary Key?
• Postgres defines a PRIMARY KEY as a constraint that a column or group of columns can be used as a unique identifier for rows in the table.
• CQL shares that same constraint but extends the definition further: its main purpose is to order information in the cluster.
• In CQL the primary key also determines partitioning and the sort order of the data on disk (clustering).
Single Column Primary Key
• The single column is the partition key; with no clustering columns, each partition holds one row.
• Syntactically, can be defined inline or as a separate line within the DDL statement.
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events (
  user_id bigint,
  account_id bigint,
  session_id text,
  occurred_at timestamp,
  category text,
  action text,
  label text,
  attributes map<text, text>,
  PRIMARY KEY (
    (user_id, occurred_at),
    account_id,
    session_id
  )
);
PRIMARY KEY (
  (user_id, occurred_at),
  account_id,
  session_id
)
Composite Partition Key
• This means that both the user_id and the occurred_at columns are going to be used to partition data.
• If you were to leave out the inner parentheses, the first column listed in this PRIMARY KEY definition would be the sole partition key.
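The parentheses rule above is easy to get wrong, so here is a small sketch that splits the body of a PRIMARY KEY clause into partition key columns and clustering columns. The helper `split_primary_key` is a hypothetical name and only handles plain comma-separated column names; it is not a CQL parser.

```python
def split_primary_key(definition: str):
    """Split the text inside PRIMARY KEY ( ... ) into (partition, clustering).

    With inner parentheses, everything inside them is the partition key.
    Without them, only the first column is.
    """
    text = definition.strip()
    if text.startswith("("):
        close = text.index(")")
        partition = [c.strip() for c in text[1:close].split(",") if c.strip()]
        rest = text[close + 1:]
    else:
        first, _, rest = text.partition(",")
        partition = [first.strip()]
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering


if __name__ == "__main__":
    print(split_primary_key("(user_id, occurred_at), account_id, session_id"))
    print(split_primary_key("user_id, occurred_at, account_id, session_id"))
```

Dropping the inner parentheses moves occurred_at out of the partition key and makes it the first clustering column, which changes both where rows live and how they are sorted.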
PRIMARY KEY (
  (user_id, occurred_at),
  account_id,
  session_id
)
Clustering Columns
• Defines how the data is sorted on disk within each partition. In this case, it’s by account_id and then session_id.
• It is possible to change the direction of the sort order.
$ cqlsh
cqlsh> CREATE TABLE neovintage_prod.events (
  user_id bigint,
  account_id bigint,
  session_id text,
  occurred_at timestamp,
  category text,
  action text,
  label text,
  attributes map<text, text>,
  PRIMARY KEY (
    (user_id, occurred_at),
    account_id,
    session_id
  )
) WITH CLUSTERING ORDER BY (
  account_id DESC, session_id ASC
);
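As a rough picture of what that clustering order means for rows inside a single partition, the sketch below sorts a handful of rows by account_id descending and then session_id ascending. The function name `clustering_sort` and the sample rows are made up for illustration; Cassandra maintains this order on disk rather than sorting at read time.

```python
def clustering_sort(rows):
    """Order rows as CLUSTERING ORDER BY (account_id DESC, session_id ASC) would.

    Uses two stable sorts: the secondary column first, then the primary one.
    """
    by_session = sorted(rows, key=lambda row: row["session_id"])
    return sorted(by_session, key=lambda row: row["account_id"], reverse=True)


if __name__ == "__main__":
    partition_rows = [
        {"account_id": 1, "session_id": "b"},
        {"account_id": 2, "session_id": "b"},
        {"account_id": 2, "session_id": "a"},
        {"account_id": 1, "session_id": "a"},
    ]
    for row in clustering_sort(partition_rows):
        print(row["account_id"], row["session_id"])
```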
Ahhhhh
 Just like SQL
Data Types
Postgres Type | Cassandra Type
--------------+-----------------
bigint        | bigint
int           | int
decimal       | decimal
float         | float
text          | text
varchar(n)    | varchar
bytea         | blob
json          | N/A
jsonb         | N/A
hstore        | map<type, type>
Challenges
• JSON / JSONB columns don't have 1:1 mappings in Cassandra
• You’ll need to nest the MAP type in Cassandra or flatten out your JSON
• Be careful about timestamps!! Time zones are already challenging in Postgres.
• If you don’t specify a time zone in Cassandra the time zone of the coordinator node is used. Always specify one.
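Both challenges can be handled on the application side before the write. The sketch below shows one possible approach, not the only one: `flatten` turns a nested JSON document into string pairs that fit a map<text, text> column, and `cql_timestamp` refuses timestamps without an explicit offset. Both names are hypothetical, and the dotted key convention and JSON-encoding of non-string values are assumptions chosen for the example.

```python
import json
from datetime import datetime, timedelta, timezone


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {"a.b.0": "text"} string pairs."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}.{index}" if prefix else str(index)
            flat.update(flatten(child, path))
    else:
        # Strings are kept as-is; numbers, booleans and null are JSON-encoded.
        flat[prefix] = value if isinstance(value, str) else json.dumps(value)
    return flat


def cql_timestamp(moment: datetime) -> str:
    """Format a timestamp with an explicit UTC offset, e.g. ...T11:00:00-0700."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must carry an explicit time zone")
    return moment.strftime("%Y-%m-%dT%H:%M:%S%z")


if __name__ == "__main__":
    attributes = {"plan": {"name": "pro", "seats": 5}, "tags": ["a", "b"]}
    print(flatten(attributes))
    print(cql_timestamp(datetime(2016, 9, 8, 11, 0, tzinfo=timezone(timedelta(hours=-7)))))
```

Flattening loses the original types, so anything that has to be read back as a number or boolean needs to be decoded by the application.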
Ready for
Webscale
General Tips
• Just like Table Partitioning in Postgres, you need to think about how you’re going to query the data in Cassandra. This dictates how you set up your keys.
• We just walked through the semantics on the database side. Tackling this change on the application-side is a whole extra topic.
• This is just enough information to get you started.
Runtime
Workers
Runtime
Workers
Foreign Data Wrapper
fdw
=>
fdw
We’re not going to go through any setup, again

https://bitbucket.org/openscg/cassandra_fdw
$ psql
neovintage::DB=> CREATE EXTENSION cassandra_fdw;
CREATE EXTENSION
neovintage::DB=> CREATE SERVER cass_serv
  FOREIGN DATA WRAPPER cassandra_fdw
  OPTIONS (host '127.0.0.1');
CREATE SERVER
neovintage::DB=> CREATE USER MAPPING FOR public SERVER cass_serv
  OPTIONS (username 'test', password 'test');
CREATE USER MAPPING
neovintage::DB=> CREATE FOREIGN TABLE cass.events (
  user_id bigint,
  occurred_at timestamptz,
  label text)
  SERVER cass_serv
  OPTIONS (schema_name 'neovintage_prod',
    table_name 'events', primary_key 'user_id');
CREATE FOREIGN TABLE
neovintage::DB=> INSERT INTO cass.events (
  user_id,
  occurred_at,
  label
)
VALUES (
  1234,
  '2016-09-08 11:00:00 -0700',
  'awesome'
);
Some Gotchas
• No Composite Primary Key Support in cassandra_fdw
• No support for UPSERT
• Postgres 9.5+ and Cassandra 3.0+ Supported
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016

  • 3. ||
  • 4. &&
  • 10. $ git push heroku master Counting objects: 11, done. Delta compression using up to 8 threads. Compressing objects: 100% (10/10), done. Writing objects: 100% (11/11), 22.29 KiB | 0 bytes/s, done. Total 11 (delta 1), reused 0 (delta 0) remote: Compressing source files... done. remote: Building source: remote: remote: -----> Ruby app detected remote: -----> Compiling Ruby remote: -----> Using Ruby version: ruby-2.3.1
  • 11. Heroku Postgres Over 1 Million Active DBs
  • 12. Heroku Redis Over 100K Active Instances
  • 13. Apache Kafka on Heroku
  • 16. $ psql psql => d List of relations schema | name | type | owner --------+----------+-------+----------- public | users | table | neovintage public | accounts | table | neovintage public | events | table | neovintage public | tasks | table | neovintage public | lists | table | neovintage
  • 21. $ psql psql => d List of relations schema | name | type | owner --------+----------+-------+----------- public | users | table | neovintage public | accounts | table | neovintage public | events | table | neovintage public | tasks | table | neovintage public | lists | table | neovintage
  • 24. CREATE TABLE users ( id bigserial, account_id bigint, name text, email text, encrypted_password text, created_at timestamptz, updated_at timestamptz ); CREATE TABLE accounts ( id bigserial, name text, owner_id bigint, created_at timestamptz, updated_at timestamptz );
  • 25. CREATE TABLE events ( user_id bigint, account_id bigint, session_id text, occurred_at timestamptz, category text, action text, label text, attributes jsonb );
  • 26. Table
  • 29. $ psql neovintage::DB=> e INSERT INTO events ( user_id, account_id, category, action, created_at) VALUES (1, 2, “in_app”, “purchase_upgrade” “2016-09-07 11:00:00 -07:00”);
  • 31. Constraints ‱ Data has little value after a period of time ‱ Small range of data has to be queried ‱ Old data can be archived or aggregated
  • 33. &&
  • 35. $ psql psql => d List of relations schema | name | type | owner --------+----------+-------+----------- public | users | table | neovintage public | accounts | table | neovintage public | events | table | neovintage public | tasks | table | neovintage public | lists | table | neovintage
  • 36. Why Introduce Cassandra? ‱ Linear Scalability ‱ No Single Point of Failure ‱ Flexible Data Model ‱ Tunable Consistency
  • 38. I only know relational databases. How do I do this?
  • 42. Postgres is Typically Run as Single Instance*
  • 43. ‱ Partitioned Key-Value Store ‱ Has a Grouping of Nodes (data center) ‱ Data is distributed amongst the nodes
  • 44. Cassandra Cluster with 2 Data Centers
  • 46. SQL-like [sēkwel lahyk] adjective Resembling SQL in appearance, behavior or character adverb In the manner of SQL
  • 47. s Talk About Primary K Partition
  • 48. Table
  • 51. ‱ 5 Node Cluster ‱ Simplest terms: Data is partitioned amongst all the nodes using the hashing function.
  • 53. Replication Factor Setting this parameter tells Cassandra how many nodes to copy incoming the data to This is a replication factor of 3
  • 54. But I thought Cassandra had tables?
  • 55. Prior to 3.0, tables were called column families
  • 56. Let’s Model Our Events Table in Cassandra
  • 58. We’re not going to go through any setup Plenty of tutorials exist for that sort of thing Let’s assume were working with 5 node cluster
  • 59. $ psql neovintage::DB=> d events Table “public.events" Column | Type | Modifiers ---------------+--------------------------+----------- user_id | bigint | account_id | bigint | session_id | text | occurred_at | timestamp with time zone | category | text | action | text | label | text | attributes | jsonb |
  • 60. $ cqlsh cqlsh> CREATE KEYSPACE IF NOT EXISTS neovintage_prod WITH REPLICATION = { ‘class’: ‘NetworkTopologyStrategy’, ‘us-east’: 3 };
  • 61. $ cqlsh cqlsh> CREATE SCHEMA IF NOT EXISTS neovintage_prod WITH REPLICATION = { ‘class’: ‘NetworkTopologyStrategy’, ‘us-east’: 3 };
  • 62. KEYSPACE == SCHEMA ‱ CQL can use KEYSPACE and SCHEMA interchangeably ‱ SCHEMA in Cassandra is somewhere between `CREATE DATABASE` and `CREATE SCHEMA` in Postgres
  • 63. $ cqlsh cqlsh> CREATE SCHEMA IF NOT EXISTS neovintage_prod WITH REPLICATION = { ‘class’: ‘NetworkTopologyStrategy’, ‘us-east’: 3 }; Replication Strategy
  • 64. $ cqlsh cqlsh> CREATE SCHEMA IF NOT EXISTS neovintage_prod WITH REPLICATION = { ‘class’: ‘NetworkTopologyStrategy’, ‘us-east’: 3 }; Replication Factor
  • 65. Replication Strategies ‱ NetworkTopologyStrategy - You have to define the network topology by defining the data centers. No magic here ‱ SimpleStrategy - Has no idea of the topology and doesn’t care to. Data is replicated to adjacent nodes.
  • 66. $ cqlsh cqlsh> CREATE TABLE neovintage_prod.events ( user_id bigint primary key, account_id bigint, session_id text, occurred_at timestamp, category text, action text, label text, attributes map<text, text> );
  • 67. Remember the Primary Key? ‱ Postgres defines a PRIMARY KEY as a constraint that a column or group of columns can be used as a unique identifier for rows in the table. ‱ CQL shares that same constraint but extends the definition even further. Although the main purpose is to order information in the cluster. ‱ CQL includes partitioning and sort order of the data on disk (clustering).
  • 68. $ cqlsh cqlsh> CREATE TABLE neovintage_prod.events ( user_id bigint primary key, account_id bigint, session_id text, occurred_at timestamp, category text, action text, label text, attributes map<text, text> );
  • 69. Single Column Primary Key • The single column is the partition key. There are no clustering columns, so each partition holds exactly one row. • Syntactically, it can be defined inline or as a separate line within the DDL statement.
  • 70. $ cqlsh cqlsh> CREATE TABLE neovintage_prod.events ( user_id bigint, account_id bigint, session_id text, occurred_at timestamp, category text, action text, label text, attributes map<text, text>, PRIMARY KEY ( (user_id, occurred_at), account_id, session_id ) );
  • 71. $ cqlsh cqlsh> CREATE TABLE neovintage_prod.events ( user_id bigint, account_id bigint, session_id text, occurred_at timestamp, category text, action text, label text, attributes map<text, text>, PRIMARY KEY ( (user_id, occurred_at), account_id, session_id ) ); Composite Partition Key
  • 72. $ cqlsh cqlsh> CREATE TABLE neovintage_prod.events ( user_id bigint, account_id bigint, session_id text, occurred_at timestamp, category text, action text, label text, attributes map<text, text>, PRIMARY KEY ( (user_id, occurred_at), account_id, session_id ) ); Clustering Keys
  • 73. PRIMARY KEY ( (user_id, occurred_at), account_id, session_id ) Composite Partition Key • This means that both the user_id and the occurred_at columns are going to be used to partition data. • If you were to leave out the inner parentheses, the first column listed in this PRIMARY KEY definition would be the sole partition key.
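The effect of the inner parentheses can be sketched with a toy model in Python. This is only an illustration: `toy_token` and `group_by_partition` are hypothetical helpers, the MD5-based hash stands in for Cassandra's real partitioner, and the sample rows are made up.

```python
import hashlib
from collections import defaultdict


def toy_token(partition_key):
    """Map a tuple of partition key values to an integer token.

    Assumption: MD5 is only a stand-in here, not Cassandra's partitioner.
    """
    raw = "|".join(str(v) for v in partition_key).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big")


def group_by_partition(rows, partition_cols):
    """Group rows by the values of their partition key columns."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_cols)
        partitions[key].append(row)
    return dict(partitions)


# Made-up sample events.
rows = [
    {"user_id": 1, "occurred_at": "2016-09-08 11:00:00-0700", "account_id": 10, "session_id": "a"},
    {"user_id": 1, "occurred_at": "2016-09-08 11:00:00-0700", "account_id": 11, "session_id": "b"},
    {"user_id": 1, "occurred_at": "2016-09-08 12:00:00-0700", "account_id": 10, "session_id": "c"},
]

# PRIMARY KEY ((user_id, occurred_at), account_id, session_id)
composite = group_by_partition(rows, ["user_id", "occurred_at"])

# PRIMARY KEY (user_id, occurred_at, account_id, session_id)
single = group_by_partition(rows, ["user_id"])

for key in composite:
    print(key, "-> token", toy_token(key))
print(len(composite), "partitions with the composite key,", len(single), "with user_id alone")
```

With the composite key, one user's events spread over many small partitions, which also means a query has to supply both user_id and occurred_at to locate a partition.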
  • 74. PRIMARY KEY ( (user_id, occurred_at), account_id, session_id ) Clustering Columns • Defines how the data is sorted on disk within each partition. In this case, it's by account_id and then session_id. • It is possible to change the direction of the sort order.
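As a rough picture of what clustering columns do, the Python sketch below orders the rows of one partition by a list of (column, direction) pairs. `sort_within_partition` is a hypothetical helper and the rows are made up; Cassandra maintains this order in its storage layer rather than sorting at read time.

```python
def sort_within_partition(rows, clustering):
    """Order rows by clustering columns.

    clustering is a list of (column, direction) pairs, where direction
    is "asc" or "desc", most significant column first.
    """
    ordered = list(rows)
    # Python's sort is stable, so sorting by the least significant
    # column first yields the combined multi-column order.
    for column, direction in reversed(clustering):
        ordered.sort(key=lambda row: row[column], reverse=(direction == "desc"))
    return ordered


# Made-up rows that all belong to the same partition.
partition = [
    {"account_id": 10, "session_id": "b"},
    {"account_id": 11, "session_id": "a"},
    {"account_id": 10, "session_id": "a"},
]

ordered = sort_within_partition(
    partition, [("account_id", "desc"), ("session_id", "asc")]
)
for row in ordered:
    print(row["account_id"], row["session_id"])
```

The order is fixed when the table is created, so it is worth matching it to the most common query, for example newest-first reads.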
  • 75. $ cqlsh cqlsh> CREATE TABLE neovintage_prod.events ( user_id bigint, account_id bigint, session_id text, occurred_at timestamp, category text, action text, label text, attributes map<text, text>, PRIMARY KEY ( (user_id, occurred_at), account_id, session_id ) ) WITH CLUSTERING ORDER BY ( account_id desc, session_id asc ); Ahhhhh, just like SQL
  • 77. Postgres Type | Cassandra Type
    --------------+------------------
    bigint        | bigint
    int           | int
    decimal       | decimal
    float         | float
    text          | text
    varchar(n)    | varchar
    bytea         | blob
    json          | N/A
    jsonb         | N/A
    hstore        | map<type, type>
  • 79. Challenges • JSON / JSONB columns don't have 1:1 mappings in Cassandra. • You'll need to nest MAP types in Cassandra (nested collections must be frozen) or flatten out your JSON. • Be careful about timestamps! Time zones are already challenging in Postgres. • If you don't specify a time zone in Cassandra, the time zone of the coordinator node is used. Always specify one.
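A minimal Python sketch of the two workarounds above, under stated assumptions: `flatten` and `cql_timestamp` are hypothetical application-side helpers, the dotted key naming is one possible convention, and the sample payload is made up. The first turns nested JSON into a flat text-to-text dictionary that would fit a `map<text, text>` column; the second refuses naive datetimes and always emits an explicit UTC offset.

```python
import json
from datetime import datetime, timedelta, timezone


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a {dotted_key: text} dict.

    Non-string leaves are JSON-encoded so their type can be recovered.
    Note: empty dicts and lists produce no entries in this sketch.
    """
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            flat.update(flatten(item, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            flat.update(flatten(item, f"{prefix}.{index}" if prefix else str(index)))
    else:
        flat[prefix] = value if isinstance(value, str) else json.dumps(value)
    return flat


def cql_timestamp(dt):
    """Format a datetime as 'yyyy-mm-dd HH:MM:SS+ZZZZ' with an explicit offset."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a time zone before writing")
    return dt.strftime("%Y-%m-%d %H:%M:%S%z")


# Made-up JSONB payload from the Postgres side.
attributes = {"browser": {"name": "firefox", "version": 48}, "tags": ["a", "b"], "beta": True}
flat = flatten(attributes)
print(flat)

pacific = timezone(timedelta(hours=-7))
print(cql_timestamp(datetime(2016, 9, 8, 11, 0, 0, tzinfo=pacific)))
```

Flattening keeps every value queryable as text but loses native types, so the reading side has to know which keys to decode.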
  • 81. General Tips • Just like table partitioning in Postgres, you need to think about how you're going to query the data in Cassandra. This dictates how you set up your keys. • We just walked through the semantics on the database side. Tackling this change on the application side is a whole separate topic. • This is just enough information to get you started.
  • 86. fdw
  • 87. We're not going to go through any setup, again: https://bitbucket.org/openscg/cassandra_fdw
  • 88. $ psql neovintage::DB=> CREATE EXTENSION cassandra_fdw; CREATE EXTENSION
  • 89. $ psql neovintage::DB=> CREATE EXTENSION cassandra_fdw; CREATE EXTENSION neovintage::DB=> CREATE SERVER cass_serv FOREIGN DATA WRAPPER cassandra_fdw OPTIONS (host '127.0.0.1'); CREATE SERVER
  • 90. $ psql neovintage::DB=> CREATE EXTENSION cassandra_fdw; CREATE EXTENSION neovintage::DB=> CREATE SERVER cass_serv FOREIGN DATA WRAPPER cassandra_fdw OPTIONS (host '127.0.0.1'); CREATE SERVER neovintage::DB=> CREATE USER MAPPING FOR public SERVER cass_serv OPTIONS (username 'test', password 'test'); CREATE USER MAPPING
  • 91. $ psql neovintage::DB=> CREATE EXTENSION cassandra_fdw; CREATE EXTENSION neovintage::DB=> CREATE SERVER cass_serv FOREIGN DATA WRAPPER cassandra_fdw OPTIONS (host '127.0.0.1'); CREATE SERVER neovintage::DB=> CREATE USER MAPPING FOR public SERVER cass_serv OPTIONS (username 'test', password 'test'); CREATE USER MAPPING neovintage::DB=> CREATE FOREIGN TABLE cass.events (id int) SERVER cass_serv OPTIONS (schema_name 'neovintage_prod', table_name 'events', primary_key 'id'); CREATE FOREIGN TABLE
  • 92. neovintage::DB=> INSERT INTO cass.events ( user_id, occurred_at, label ) VALUES ( 1234, '2016-09-08 11:00:00 -0700', 'awesome' );
  • 94. Some Gotchas • No composite primary key support in cassandra_fdw • No support for UPSERT • Postgres 9.5+ and Cassandra 3.0+ supported