Cassandra + Hadoop
An Introduction to Hadoop Analytics over Cassandra Data

Introductions
  • What is Cassandra?
      • A highly scalable distributed data store
      • Born at Facebook, grew up in the community
  • What is Hadoop?
      • A set of Apache projects
      • Deal with Big Data in a distributed way
      • Open source versions of MapReduce, GFS, BigTable, as well as additions, such as Pig and Hive

What makes them compatible?
  • Cassandra is great at a lot of things
      • Fast, extremely scalable writes, fast random reads
      • Flexible semi-structured data model
      • Not as good with ad-hoc answers
  • Enter Hadoop
      • MapReduce, Pig, and Hive are extensible
      • Output from Hadoop into Cassandra

MapReduce
  • Input from Cassandra as of 0.6.x
  • Baked-in output to Cassandra as of 0.7.0
  • Streaming support is coming in 0.7
  • Example: WordCount
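
The WordCount example named on this slide can be sketched in a few lines. This is a minimal, in-memory illustration of the map, shuffle, and reduce steps, not the Cassandra or Hadoop API: the `ROWS` dictionary, the `text` column name, and the function names are all hypothetical stand-ins for rows that a real job would read through Cassandra's Hadoop input support.

```python
from collections import defaultdict

# Hypothetical stand-in for a column family: row key -> {column name: value}.
# A real job would receive these rows from Cassandra via Hadoop, not a dict.
ROWS = {
    "row1": {"text": "cassandra and hadoop"},
    "row2": {"text": "hadoop and pig"},
}


def map_row(key, columns, column_name="text"):
    """Map phase: emit (word, 1) for each word in the chosen column."""
    value = columns.get(column_name, "")
    for word in value.split():
        yield word, 1


def reduce_word(word, counts):
    """Reduce phase: sum the counts emitted for one word."""
    return word, sum(counts)


def word_count(rows):
    # Shuffle step: group mapped values by word, as the framework would.
    grouped = defaultdict(list)
    for key, columns in rows.items():
        for word, one in map_row(key, columns):
            grouped[word].append(one)
    return dict(reduce_word(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(ROWS))
```

In a real deployment the map and reduce functions would run on separate Hadoop tasks, and the grouping shown in `word_count` would be done by the framework.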

Pig
  • What is Pig?
      • A platform for data analytics developed at Yahoo!
      • Includes Pig Latin, the Grunt shell, and an interpreter that compiles down to MapReduce
      • Simplifies data analysis
  • Cassandra integration
      • Stu Hood added Pig integration in Cassandra 0.6
  • Example: WordCount with Pig
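
To give a feel for why a dataflow language simplifies the same job, here is a rough Python sketch in which each step is labelled with the Pig Latin operator it loosely corresponds to. It is not Pig Latin and does not touch Cassandra; the function name is hypothetical, and splitting on whitespace only is a simplifying assumption.

```python
from itertools import groupby


def pig_style_word_count(lines):
    # LOAD: start from a relation of text values (here, a plain list).
    # FOREACH ... GENERATE FLATTEN(TOKENIZE(line)): one record per word.
    words = [word for line in lines for word in line.split()]
    # GROUP words BY word: sort first, since groupby only merges neighbours.
    grouped = groupby(sorted(words))
    # FOREACH grouped GENERATE group, COUNT(words).
    counts = [(word, sum(1 for _ in group)) for word, group in grouped]
    # ORDER counts BY count DESC, then by word so ties are deterministic.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for word, count in pig_style_word_count(["cassandra and hadoop", "hadoop and pig"]):
        print(word, count)
```

The point of the comparison is that the whole job reads as a short sequence of relational steps, with no explicit map or reduce functions.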

Hive
  • What is Hive?
      • A platform for data analytics developed at Facebook
      • Draws from the familiar SQL -> Hive QL
      • Compiles down to MapReduce
  • Cassandra integration
      • Availability of a Cassandra storage handler is coming soon – HIVE-1434
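
The SQL-style approach can be illustrated with the same word count written as a `GROUP BY` query. This sketch runs the query against an in-memory SQLite database from the Python standard library purely as a stand-in; it is not Hive QL and does not use a Cassandra storage handler. The `words` table and the function name are hypothetical.

```python
import sqlite3


def sql_word_count(words):
    # SQLite stands in for the query engine here; Hive would compile a
    # similar GROUP BY query down to MapReduce jobs.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany(
            "INSERT INTO words (word) VALUES (?)", [(w,) for w in words]
        )
        query = (
            "SELECT word, COUNT(*) AS n FROM words "
            "GROUP BY word ORDER BY n DESC, word"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(sql_word_count("cassandra and hadoop hadoop and pig".split()))
```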

Example Use Case
  • Raptr.com
      • Gaming statistics and achievements across platforms
      • Home-grown -> Cassandra + Hadoop (Pig)
      • Idea to execution much faster
      • Query runtime from hours to 10-15 minutes

Questions
  • Contact
      • Email: jeremy.hanna@rackspace.com
      • Twitter: @jeromatron
      • IRC: jeromatron on irc.freenode.net - #cassandra, #hadoop
  • Further information
      • http://wiki.apache.org/cassandra/HadoopSupport
      • Cassandra: The Definitive Guide
