The State of CQL
Sylvain Lebresne (@pcmanus)
June 12, 2013
Why CQL?
(Rationale and goals behind CQL)
What is CQL?
(How do you model an application with CQL)
The native protocol
(Transporting CQL queries)
What's next?
(Cassandra 2.0 and beyond)

Disclaimer
· This presentation focuses exclusively on CQL version 3. Many things do not apply to CQL versions 1 and 2.
· Unless explicitly stated otherwise, the terms rows and columns mean CQL3 rows and CQL3 columns, which do not map directly to the notion of rows and columns in Thrift (or the internal C* implementation).

Why?
Rationale and goals behind CQL
Cassandra has often been regarded as hard to develop against.
The Thrift API is:
· Not user friendly, hard to use.
· Low level.
· Very little abstraction.
· Hard to evolve (in a backward-compatible way).
It doesn't have to be that way!

Why the hell a SQL look-alike query language?!
· Very easy to read.
· Programming language independent.
· Ubiquitous, widely known.
· Copy/paste friendly.
· Easy to evolve.
· Does not imply slow.
· Doesn't force you to work with strings.
So why not?

Hence, CQL
· "Denormalized SQL"
· Strictly real-time oriented
  - No joins
  - No sub-queries
  - No aggregation
  - Limited ORDER BY

CQL: the 'C' stands for Cassandra
Goals:
· Provide a user-friendly, productive API for C*.
· Make it easy to do the right thing, hard to do the wrong one.
· Provide higher-level constructs for useful modeling patterns.
· Be a complete alternative to the Thrift API.
Not goals:
· Be SQL.
· Abstract C* (useful) specificities away (distribution awareness, C* storage engine, ...).
· Be slow.

What is CQL?
How do you model an application with CQL
Cassandra modeling 101
Efficient queries in Cassandra boil down to:
1. Data locality at the cluster level: a query should only hit one node.
2. Data locality at the node level: the C* storage engine allows data collocation on disk.
And denormalization is the technique that makes this achievable in practice.
But this implies the API should:
· expose how to collocate data in the same replica set.
· expose how to collocate data on disk (for a given replica).
· allow querying data that is collocated.
The Thrift API allows that. So does CQL.

A naive e-mailing application
We want to model:
· Users
· Emails
· Users' inboxes (all emails received by a user, in chronological order)

Storing user profiles
CREATE TABLE users (
    user_id uuid,
    name text,
    password text,
    email text,
    picture_profile blob,
    PRIMARY KEY (user_id)
)
-- This is really an UPSERT
INSERT INTO users (user_id, name, password, email, picture_profile)
VALUES (51b-23-ab8, 'Sylvain Lebresne', 'Hd3!ba', 'lebresne@gmail.com', 0xf8ac...);
-- This too is an UPSERT
UPDATE users SET email = 'sylvain@datastax.com', password = 'B9a1^' WHERE user_id = 51b-23-ab8;
CQL
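The comments above note that both INSERT and UPDATE are really upserts. As a rough illustration of that behaviour, here is a minimal Python sketch using a toy in-memory table. The `ToyTable` class and its methods are hypothetical and only model the semantics; they are not a Cassandra or driver API.

```python
class ToyTable:
    """Toy in-memory model of a CQL table keyed by its primary key (illustrative only)."""

    def __init__(self):
        self.rows = {}  # primary key -> {column name: value}

    def _upsert(self, key, columns):
        # Create the row if it is missing, then overwrite only the given columns.
        self.rows.setdefault(key, {}).update(columns)

    def insert(self, key, **columns):
        # INSERT does not fail when the row already exists: it overwrites.
        self._upsert(key, columns)

    def update(self, key, **columns):
        # UPDATE does not fail when the row is missing: it creates it.
        self._upsert(key, columns)


users = ToyTable()
users.insert("51b-23-ab8", name="Sylvain Lebresne", email="lebresne@gmail.com")
users.update("51b-23-ab8", email="sylvain@datastax.com")  # overwrites one column
users.update("new-user", name="Someone")                  # creates a new row
```

In this model the two operations share one code path, which is the point of the "UPSERT" comments: neither checks whether the row already exists.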
· The first component of the PRIMARY KEY is called the partition key.
· All the data sharing the same partition key is stored on the same replica set.

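To make the link between partition key and replica set concrete, here is a small Python sketch of a toy placement function. Everything in it is an assumption made for illustration: the node names, the replication factor of 2, the use of SHA-256 as a stand-in hash, and the "first node plus successors" rule. It is not Cassandra's actual partitioner or replication strategy.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICATION_FACTOR = 2                            # assumed for the example


def replica_set(partition_key):
    """Map a partition key to the nodes storing it (toy placement, not Cassandra's)."""
    # Hash the partition key to a token; SHA-256 is only a stand-in hash here.
    token = int.from_bytes(hashlib.sha256(partition_key.encode()).digest(), "big")
    first = token % len(NODES)
    # The replicas are the first node and its successors on the ring.
    return [NODES[(first + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


# Every read or write for user 51b-23-ab8 is routed to the same replica set.
replicas = replica_set("51b-23-ab8")
```

Because placement depends only on the partition key, all rows sharing that key land on the same nodes, and a query restricted to one partition key only needs to contact that replica set.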
Allowing user defined properties
Say we want users to be able to add a set of custom properties to their own profile:
ALTER TABLE users ADD user_props map<text, text>;
UPDATE users SET user_props['myProperty'] = 'Whatever I want' WHERE user_id = 51b-23-ab8;
SELECT * FROM users;
CQL
 user_id    | email              | name             | password | picture_profile | user_props
------------+--------------------+------------------+----------+-----------------+--------------------------------------
 51b-23-ab8 | lebresne@gmail.com | Sylvain Lebresne | B9a1^    | 0xf8ac...       | { 'myProperty' : 'Whatever I want' }

Storing emails
CREATE TABLE emails (
    email_id timeuuid PRIMARY KEY,  -- Embeds the email creation date
    subject text,
    sender uuid,
    recipients set<uuid>,
    body text
)
-- Inserts emails ...
CQL
Only "indexed" queries are allowed. You cannot do:
SELECT * FROM emails WHERE sender = 51b-23-ab8;
CQL
That is, unless you explicitly index the column using:
CREATE INDEX ON emails(sender);
CQL

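The restriction above can be pictured with a toy model: a table is only directly addressable by its primary key, so filtering on another column needs a separate lookup structure. The Python sketch below is an illustration under that assumption. The sample rows, the second sender id and the `build_index` helper are made up, and the sketch says nothing about how Cassandra actually stores or maintains its indexes.

```python
from collections import defaultdict

# Toy "emails" table: primary key (email_id) -> row. Values are made up.
emails = {
    "d20-32-012": {"sender": "51b-23-ab8", "subject": "Hello"},
    "17a-bf-65f": {"sender": "9c4-77-e01", "subject": "RE: Hello"},
    "a9c-13-9da": {"sender": "51b-23-ab8", "subject": "Lunch?"},
}


def build_index(table, column):
    """Build a lookup from a column value to the primary keys of the matching rows."""
    index = defaultdict(set)
    for key, row in table.items():
        index[row[column]].add(key)
    return index


# Rough analogue of CREATE INDEX ON emails(sender).
sender_index = build_index(emails, "sender")

# Rough analogue of SELECT * FROM emails WHERE sender = 51b-23-ab8:
# one index lookup, then direct reads by primary key, with no full scan.
matches = sorted(sender_index["51b-23-ab8"])
```

Without `sender_index`, answering the same question would mean scanning every row of the table, which is the kind of query CQL refuses by default.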
Inboxes
· For each user, the inbox is the list of that user's emails, sorted chronologically.
· To display the inbox, we need, for each email, the subject plus the names and emails of the sender and recipients.
· In a traditional RDBMS, we could join the users and emails tables.
  - Good luck scaling that!
· In Cassandra, we denormalize. That is, we store the pre-computed result of the queries we care about (an always up-to-date materialized view).
  - Collocate all the data for an inbox on the same node.
  - Collocate all inbox emails on disk, in the order queried.
  - This is typically the time-series kind of model for which Cassandra shines.

Storing inboxes
CQL distinguishes 2 sub-parts in the PRIMARY KEY:
· partition key: decides the node on which the data is stored
· clustering columns: within the same partition key, (CQL3) rows are physically ordered following the clustering columns
In practice, we are interested in having emails stored in reverse chronological order.
CREATE TABLE inboxes (
    user_id uuid,
    email_id timeuuid,
    sender_email text,
    recipients_emails set<text>,
    subject text,
    is_read boolean,
    PRIMARY KEY (user_id, email_id)
) WITH CLUSTERING ORDER BY (email_id DESC)
CQL

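The roles of the partition key and the clustering column can be sketched with a toy in-memory model: one sorted list per partition, kept newest first. This is only an illustration. The `ToyInboxes` class is hypothetical, and plain integers of the form YYYYMMDD stand in for the time ordering of real timeuuid values.

```python
import bisect
from collections import defaultdict


class ToyInboxes:
    """Toy model of PRIMARY KEY (user_id, email_id) WITH CLUSTERING ORDER BY (email_id DESC)."""

    def __init__(self):
        # partition key -> list of (-email_time, subject), kept sorted on insert.
        # Negating the time makes ascending list order equal to newest-first.
        self.partitions = defaultdict(list)

    def insert(self, user_id, email_time, subject):
        bisect.insort(self.partitions[user_id], (-email_time, subject))

    def newer_than(self, user_id, since):
        """Return subjects of emails with email_time > since, newest first."""
        rows = self.partitions[user_id]
        # Rows newer than `since` form a contiguous prefix of the sorted list.
        end = bisect.bisect_left(rows, (-since,))
        return [subject for _, subject in rows[:end]]


inboxes = ToyInboxes()
inboxes.insert("51b-23-ab8", 20130104, "Whatzz up?")
inboxes.insert("51b-23-ab8", 20130624, "Happy birthday")
inboxes.insert("51b-23-ab8", 20130210, "dtests are broken")
inboxes.insert("51b-23-ab8", 20121201, "Old mail")
recent = inboxes.newer_than("51b-23-ab8", 20130101)
```

Because the rows of a partition are already stored in the order the inbox is displayed, a time-range query reduces to reading one contiguous slice, with no sorting at read time.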
Storing inboxes cont'd
In this example, this allows efficient time-range queries over the emails of a given inbox.
-- Get all emails for user 51b-23-ab8 since Jan 01, 2013 in reverse chronological order.
SELECT email_id, dateOf(email_id), sender_email, recipients_emails, subject, is_read
FROM inboxes
WHERE user_id = 51b-23-ab8 AND email_id > minTimeuuid('2013-01-01 00:00+0000')
ORDER BY email_id DESC;
CQL
 email_id   | dateOf(email_id)      | sender_email                             | recipients_emails                        | subject                | is_read
------------+-----------------------+------------------------------------------+------------------------------------------+------------------------+---------
 d20-32-012 | 2013-06-24 00:42+0000 | Yuki Morishita <yuki@datastax.com>       | { 'Sylvain Lebresne' }                   | あなたに幸せな誕生日   | false
 17a-bf-65f | 2013-03-01 17:03+0000 | Aleksey Yeschenko <aleksey@datastax.com> | { 'Sylvain Lebresne' }                   | RE: What do you think? | true
 a9c-13-9da | 2013-02-10 04:12+0000 | Brandon Williams <brandon@datastax.com>  | { 'Jonathan Ellis', 'Sylvain Lebresne' } | dtests are broken!?@#  | true
 241-b4-ca0 | 2013-01-04 12:45+0000 | Jonathan Ellis <jbellis@datastax.com>    | { 'Sylvain Lebresne' }                   | Whatzz up?             | true

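The query above relies on the fact that a timeuuid embeds its creation time. As a sketch of what a function like dateOf has to do, the Python snippet below extracts the timestamp from a version 1 (time-based) UUID using the standard `uuid` module. The `date_of` helper name is made up for this example; the timestamp layout is the one defined for version 1 UUIDs.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals since 1582-10-15 (RFC 4122).
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def date_of(time_uuid):
    """Return the creation time embedded in a version 1 (time-based) UUID."""
    if time_uuid.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # uuid.UUID.time is the 60-bit timestamp; dividing by 10 gives microseconds.
    return UUID_EPOCH + timedelta(microseconds=time_uuid.time // 10)


email_id = uuid.uuid1()  # stand-in for a timeuuid generated at insertion time
created = date_of(email_id)
```

Since the creation date can be recovered from the key itself, the emails table does not need a separate date column.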
Handling huge inboxes
What if inboxes can become too big? The traditional solution consists of sharding inboxes into suitable time shards (say, one per year), to avoid storing a whole inbox on one node.
This can easily be done using a composite partition key:
CREATE TABLE inboxes (
    user_id uuid,
    year int,
    email_id timeuuid,
    sender_email text,
    recipients_names text,
    subject text,
    PRIMARY KEY ((user_id, year), email_id)
) WITH CLUSTERING ORDER BY (email_id DESC)
CQL

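With a composite partition key, the application has to supply the year as well as the user when it reads or writes an inbox. The Python sketch below shows one way client code might derive the shard for an email and list the shards covering a time range. Both helper names and the sample dates are hypothetical.

```python
from datetime import datetime, timezone


def partition_key(user_id, received_at):
    """Composite partition key (user_id, year) for an email received at `received_at`."""
    return (user_id, received_at.year)


def partitions_for_range(user_id, start, end):
    """List the partitions to query, newest first, for emails between `start` and `end`."""
    return [(user_id, year) for year in range(end.year, start.year - 1, -1)]


start = datetime(2011, 11, 1, tzinfo=timezone.utc)
end = datetime(2013, 6, 12, tzinfo=timezone.utc)
shards = partitions_for_range("51b-23-ab8", start, end)
```

Each entry in `shards` corresponds to one query against a single partition, so displaying a long time range costs one query per year rather than one query in total.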
Upgrading from Thrift
· CQL uses the same internal storage engine as Thrift.
· CQL can read your existing Thrift column families (no data migration needed):
cqlsh> USE "<keyspace_name>";
cqlsh> DESCRIBE "<column_family_name>";
cqlsh> SELECT * FROM "<column_family_name>" LIMIT 20;
CQL
· You can read CQL3 tables from Thrift, but this is not easy in practice because some CQL3 metadata is not exposed through Thrift for compatibility reasons.
· CQL is meant to be an alternative to Thrift, not a complement to it.
For more details on the relationship between Thrift and CQL:
· http://www.datastax.com/dev/blog/thrift-to-cql3
· http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

The native protocol
Transporting CQL queries
The native protocol
A binary transport for CQL3:
· Asynchronous (allows multiple concurrent queries per connection)
· Server notifications (only for generic cluster events currently)
· Made for CQL3
Want to know more about drivers using this native protocol? Stay in the room for Michaël and Patrick's talk.

What's next?
Cassandra 2.0 and beyond
Cassandra 2.0: CQL3
· Compare-and-swap support
UPDATE login SET password = 'fs3!c' WHERE username = 'pcmanus' IF NOT EXISTS;
UPDATE users SET email = 'sylvain@datastax.com' WHERE user_id = 51b-23-ab8 IF email = 'slebresne@apache.org';
CQL
· Triggers
· Allow preparation of TIMESTAMP, TTL and LIMIT.
· Secondary indexing of primary key columns
· ALTER ... DROP

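The compare-and-swap item above means that a write is applied only if a stated condition on the current value holds. A minimal single-process Python sketch of that semantics follows. The `ToyRow` class is hypothetical, and the local lock merely stands in for whatever coordination a distributed implementation needs; it does not describe how Cassandra implements the feature.

```python
import threading


class ToyRow:
    """Toy single-row store illustrating compare-and-swap (not Cassandra's implementation)."""

    def __init__(self, **columns):
        self._columns = dict(columns)
        self._lock = threading.Lock()

    def update_if(self, column, expected, new_value):
        """Set `column` to `new_value` only if it currently equals `expected`."""
        with self._lock:  # the check and the write happen as one atomic step
            if self._columns.get(column) != expected:
                return False  # condition not met: nothing is written
            self._columns[column] = new_value
            return True

    def get(self, column):
        with self._lock:
            return self._columns.get(column)


user = ToyRow(email="slebresne@apache.org")
first = user.update_if("email", "slebresne@apache.org", "sylvain@datastax.com")
second = user.update_if("email", "slebresne@apache.org", "other@example.com")
```

The second call fails because the condition no longer holds after the first one succeeded, which is what lets two concurrent writers avoid silently overwriting each other.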
Cassandra 2.0: Native protocol
· One-shot prepare-and-execute message
· Batching of prepared statements
· SASL authentication
· Automatic query paging

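As an illustration of what automatic query paging can look like from the client side, here is a toy Python sketch: the server returns one page at a time together with a paging state, and the client iterates over rows while fetching further pages lazily. The function names and the use of an integer offset as paging state are assumptions made for this example, not the native protocol's actual message format.

```python
def fetch_page(rows, page_size, paging_state=None):
    """Toy server side: return one page plus the state needed to fetch the next one."""
    start = paging_state or 0
    page = rows[start:start + page_size]
    # A state of None signals that there are no more pages.
    next_state = start + page_size if start + page_size < len(rows) else None
    return page, next_state


def iterate(rows, page_size):
    """Toy client side: yield rows one by one, fetching pages lazily behind the scenes."""
    state = None
    while True:
        page, state = fetch_page(rows, page_size, state)
        yield from page
        if state is None:
            break


result = list(iterate(list(range(7)), page_size=3))
```

The caller of `iterate` never sees page boundaries, which is the sense in which the paging is automatic: a large result set is streamed in bounded chunks instead of being loaded at once.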
After C* 2.0
Continue to improve the user experience by facilitating good data modeling, while respecting Cassandra's inherent specificities.
· Storage engine optimizations
· Secondary indexing of collections
· Aggregations within a partition
· User-defined 'struct' types
· ...

Thank You!
(Questions?)

C* Summit 2013: The State of CQL by Sylvain Lebresne
