Apache Cassandra: NoSQL in the Enterprise, today
Jonathan Ellis, CTO
@spyced
Cassandra Job Trends (indeed.com)
“Big Data” trend
Why Big Data Matters




Research done by McKinsey & Company shows the eye-opening, 10-year
category growth rate differences between businesses that smartly use their big
data and those that do not.
Big data

[Diagram: Analytics (Hadoop)   ?   Realtime (“NoSQL”)]
Some users
✤   Financial
✤   Social Media
✤   Advertising
✤   Entertainment
✤   Energy
✤   E-tail
✤   Health care
✤   Government
Common use cases

✤   Time series data
✤   Messaging
✤   Ad tracking
✤   Data mining
✤   User activity streams
✤   User sessions
✤   Anything requiring:
    Scalable + performant + highly available
Why Cassandra?

✤   Fully distributed, no SPOF
✤   Multi-master, multi-DC
✤   Linearly scalable
✤   Larger-than-memory datasets
✤   Best-in-class performance (not just writes!)
✤   Fully durable
✤   Integrated caching
✤   Tuneable consistency
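
To make the "tuneable consistency" bullet concrete: a commonly cited rule of thumb is that a read is guaranteed to overlap the most recent acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (N). The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(read_replicas: int, write_replicas: int,
                        replication_factor: int) -> bool:
    """True when every read set must share a replica with every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3  # assumed replication factor, chosen only for illustration
    q = quorum(n)
    print("quorum size:", q)
    print("quorum reads + quorum writes overlap:", read_overlaps_write(q, q, n))
    print("single-replica reads + writes overlap:", read_overlaps_write(1, 1, n))
```

Choosing smaller R and W trades that overlap guarantee for lower latency and higher availability, which is the tuning knob the bullet refers to.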
Classic partitioning with SPOF

[Diagram: partitions 1 through 4, with slave, slave, and master nodes, fronted by a single request router]
Fully distributed, no SPOF

[Diagram: a client talking to a ring of peer nodes; partition p1 is held on three nodes, shown alongside p3 and p6]
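
One way to picture the ring is as a sorted list of node tokens, where each key is stored on the next few nodes clockwise from its own token. The sketch below is a simplified illustration under stated assumptions, not Cassandra's implementation: the `Ring` class, the node names, the one-token-per-node layout, and the use of MD5 as the hash are all choices made for the example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: one token per node, replicas on successive nodes."""

    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("a ring needs at least one node")
        self.replication_factor = min(replication_factor, len(nodes))
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str):
        """Return the nodes holding `key`, walking clockwise from its token."""
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replication_factor)]


if __name__ == "__main__":
    ring = Ring(["n1", "n2", "n3", "n4", "n5", "n6"], replication_factor=3)
    owners = ring.replicas("p1")
    print("p1 is stored on:", owners)
    # Any single node can fail and p1 still has live replicas.
    for failed in owners:
        print("if", failed, "fails, p1 is still on:", [n for n in owners if n != failed])
```

Because every node can compute this placement, a client can send a request to any node, so there is no request router to act as a single point of failure.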
Performance summary
“With Cassandra, we get better business agility, and we
don’t have to plan capacity in advance, we don’t need to
ask permission of other people to build things for us,
and we don’t worry about running out of space or
power.”


Adrian Cockcroft, Cloud Architect
Netflix on Cassandra

✤   Could not build datacenters fast enough
✤   Made decision to go to cloud (AWS)
✤   Applications include Netflix’s subscriber system, A/B
    testing, and viewing history service

✤   Over a year in, Netflix finds Cassandra to be
    ✤   Fast
    ✤   Cost-effective
    ✤   Scalable
    ✤   Flexible
    ✤   Reliable: no SPOF
“Without Cassandra, our engineers would’ve had to
create something that could scale to our needs, that
would’ve prevented us from focusing on building
product and solving problems for Backupify’s users,
which are far more important tasks.”


Matt Conway, VP Engineering
Backupify on Cassandra

✤   Cloud-based utility that enables businesses and
    consumers to back up, search, and restore the content of
    popular online applications such as Google Apps,
    Gmail, Facebook, Twitter, and Blogger

✤   Cassandra findings:
    ✤   Solved scaling, allowing engineers to focus on their business
    ✤   DataStax OpsCenter made it easy to monitor the health and
        performance of their cluster
    ✤   Reliable, redundant and scalable data storage helped
        eliminate down-time
    ✤   Ability to offer not only backup and storage, but also analysis
“You can seamlessly add new nodes and expand your
total capacity without deteriorating the performance of
the data store. Cassandra has allowed us to scale very
effectively.”


Harry Robertson, Tech Lead
Ooyala on Cassandra

✤   Ooyala provides a suite of technologies and services that
    support content owners in managing, analyzing and
    monetizing the digital video they publish online

✤   Cassandra findings:
    ✤   Classic “Big Data” problem did not require re-architecting
    ✤   Delivered ability to respond to increasingly sophisticated
        analytic needs of customers
    ✤   Developers spend time building application features, not
        figuring out how to scale
“Cassandra has allowed us to build bigger features
faster and more reliably, while using less money and
without needing to expand our staff.”


Kyle Ambroff, Sr. Engineer
Formspring on Cassandra

✤   Users of Formspring engage with and learn more about
    each other by asking and responding to questions. Close
    to 4B responses in the system and 30M unique users

✤   Cassandra experience
    ✤   No sharding needed – just add nodes to scale
    ✤   Performance – the popular users with many followers saw no
        speed reduction. No more memcached!
    ✤   Flexibility of a schema-optional architecture is very
        developer-friendly
Big data

[Diagram: Analytics (Hadoop)   ?   Realtime (“NoSQL”)]
The evolution of Analytics

[Diagram: Analytics + Realtime]

The evolution of Analytics

[Diagram: Analytics and Realtime, linked by replication]

The evolution of Analytics

[Diagram: ETL]
Big data

[Diagram: Analytics (Hadoop)   DataStax Enterprise   Realtime (“NoSQL”)]
DataStax Enterprise re-unifies
realtime and analytics
Portfolio Demo dataflow


Portfolios                Portfolios
Historical Prices         Live Prices for today
Intermediate Results
Largest loss              Largest loss
Operations

✤   “Vanilla” Hadoop
    ✤   8+ services to set up, monitor, back up, and recover
        (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker,
        ZooKeeper, RegionServer, ...)
    ✤   Single points of failure
    ✤   Can't separate online and offline processing

✤   DataStax Enterprise
    ✤   Single, simplified component
    ✤   Self-organizes based on workload
    ✤   Peer to peer
    ✤   JobTracker failover
Managing & Monitoring Big Data
✤   DataStax OpsCenter manages and monitors all
    Cassandra and Hadoop operations
Questions?
