So Happy Together
 BigTable + Dynamo
 Semi-structured data model
 Decentralized – no special roles
 Ridiculously fast writes, fast reads
 Tunably consistent
 Cross-DC capable
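"Tunably consistent" can be made concrete with a little arithmetic. The sketch below is not Cassandra code. It only encodes the usual rule that, with N replicas, a read touching R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The function names are illustrative assumptions.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_overlap_writes(n: int, w: int, r: int) -> bool:
    """True when any R replicas read must include one of the W replicas written."""
    return r + w > n
```

With N = 3, writing and reading at ONE (W = 1, R = 1) gives no overlap guarantee, while QUORUM for both (W = 2, R = 2) does. Lower levels trade that guarantee for latency.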
 You design your data model based on your query model
 Real-time ad-hoc queries aren’t viable
 Secondary indexes help (0.7)
 What about analytics?
 Hadoop has analytics
 MapReduce
 Pig/Hive and other tools built above MapReduce
 Configurable data sources/destinations
 Many already familiar with it
 Active community
 Always able to output to Cassandra directly
 0.6
 ColumnFamilyInputFormat
 Pig support – Cassandra LoadFunc
 0.7
 ColumnFamilyOutputFormat
 Hadoop Streaming Output
 Streamlined configuration
 Recipe
 Overlay Hadoop on top of Cassandra
 Separate server for name node and job tracker
 Co-locate task trackers with Cassandra nodes
 Add data nodes to taste
 Voilà
 Data locality
 Analytics engine scales with data
 Example
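The point of co-locating task trackers with Cassandra nodes is that a map task can run on a host that already holds a replica of its data. The sketch below is a deliberately simplified stand-in for that idea and is not how the Hadoop JobTracker actually schedules. The host names and the `assign_splits` helper are hypothetical, and the task tracker list is assumed to be non-empty.

```python
from typing import Dict, List


def assign_splits(split_locations: Dict[str, List[str]],
                  task_trackers: List[str]) -> Dict[str, str]:
    """Map each split to a task tracker, preferring a host that holds a replica."""
    load = {host: 0 for host in task_trackers}
    assignment = {}
    for split_id in sorted(split_locations):
        replicas = split_locations[split_id]
        # Hosts that both hold the data and run a task tracker.
        local = [host for host in replicas if host in load]
        # Fall back to any tracker (a remote read) when no replica host is available.
        candidates = local or task_trackers
        # Pick the least loaded candidate; ties are broken by host name.
        chosen = min(candidates, key=lambda host: (load[host], host))
        assignment[split_id] = chosen
        load[chosen] += 1
    return assignment
```

Because every Cassandra node also runs a task tracker in this recipe, the fallback branch should rarely be taken, and adding nodes adds storage and compute together.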
 Cassandra specific InputFormat
 Configuration – ConfigHelper, Hadoop variables
 InputSplits over the data – tunable
 Example usage in contrib/word_count
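To illustrate what "InputSplits over the data, tunable" means, here is a minimal sketch that cuts token ranges into splits of roughly a target number of rows and carries the replica hosts along as the split's locations. It is a simplification under stated assumptions, not the ColumnFamilyInputFormat implementation: the `TokenRange` and `Split` types, the row estimates, and the assumption that rows are spread evenly over a range with no wrap-around are all illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenRange:
    start: int
    end: int  # exclusive; ring wrap-around is ignored in this sketch
    estimated_rows: int
    replicas: List[str]


@dataclass
class Split:
    start: int
    end: int
    locations: List[str]  # hosts a map task for this split should prefer


def make_splits(ranges: List[TokenRange], split_size: int) -> List[Split]:
    """Cut each token range into pieces of about split_size rows."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    for r in ranges:
        width = r.end - r.start
        # Ceiling division gives the number of pieces needed for this range.
        pieces = max(1, -(-r.estimated_rows // split_size))
        # A range cannot be cut into more pieces than it has tokens.
        pieces = min(pieces, width)
        for i in range(pieces):
            lo = r.start + width * i // pieces
            hi = r.start + width * (i + 1) // pieces
            splits.append(Split(lo, hi, list(r.replicas)))
    return splits
```

A smaller `split_size` yields more, shorter map tasks; a larger one yields fewer, longer tasks. That is the tuning knob the slide refers to.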
 OutputFormat
 Configuration – ConfigHelper, Hadoop variables
 Batches output – tunable
 Don’t have to use the Cassandra API
 Some optimizations (e.g. ConsistencyLevel.ONE)
 Example usage in contrib/word_count
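The "batches output, tunable" behaviour can be sketched as a writer that buffers mutations and sends them in groups. This is an illustration of the batching idea only, not the ColumnFamilyOutputFormat code. The `BatchingWriter` class, the `Mutation` tuple shape, and the `send_batch` callback are hypothetical stand-ins for whatever actually ships a batch to the cluster.

```python
from typing import Callable, List, Tuple

Mutation = Tuple[str, str, str]  # (row key, column name, value)


class BatchingWriter:
    """Buffer mutations and hand them to send_batch in groups of batch_size."""

    def __init__(self, send_batch: Callable[[List[Mutation]], None],
                 batch_size: int = 64) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._send = send_batch
        self._batch_size = batch_size
        self._pending: List[Mutation] = []

    def write(self, key: str, column: str, value: str) -> None:
        self._pending.append((key, column, value))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._send(self._pending)
            # Start a fresh list so the batch just sent is not mutated later.
            self._pending = []

    def close(self) -> None:
        # Send whatever is left when the task finishes.
        self.flush()
```

A reducer would call `write` once per output column and `close` at the end. Larger batches mean fewer round trips at the cost of more memory per task.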
 60,000+ Documented UFO Sightings
 Data set from http://infochimps.com
sighted_at | reported_at | location | shape | duration | description
19951009 | 19951009 | Iowa City, IA | | | Man repts. Witnessing “flash, followed by a classic UFO, w/ a tailfin at back.” …
19940801 | 19950220 | Renton, WA | | | Man repts. seeing 2x large ships hovering in night sky while using Russian-made night binoculars.
19970111 | 19970111 | St. Cloud, MN | pyramid | 2 min. | Summary: Right when me and my friend left my house we saw a bright green glowing object that looked like a 4 sided pyramid then after about 2 min it took off straight into the sky leaving a yellow trail behind it…
 What about languages outside of Java?
 Build on what Hadoop uses - Streaming
 Output streaming in 0.7.0
 Example in contrib/hadoop_streaming_output
 Input streaming in progress, likely 0.7.1
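Hadoop Streaming runs any executable that reads lines on stdin and writes tab-separated key/value lines on stdout, which is what opens the door to non-Java languages. As a sketch, the script below counts sightings per shape from the UFO data set described above. It assumes tab-separated records in the column order shown on the slide (sighted_at, reported_at, location, shape, duration, description). The script name and the "map"/"reduce" argument convention are hypothetical. Writing the results into Cassandra through output streaming is not shown; see contrib/hadoop_streaming_output for that.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'shape<TAB>1' for each tab-separated sighting record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Shape is the fourth column; many records leave it empty.
        shape = fields[3].strip() if len(fields) > 3 else ""
        yield f"{shape or 'unknown'}\t1"


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per shape; input must arrive sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for shape, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{shape}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: shapes.py map   or   shapes.py reduce
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

The same script can be checked locally without a cluster by piping a file through the map step, `sort`, and then the reduce step, since sorting by key is what Hadoop does between the two phases.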
 Developed at Yahoo!
 PigLatin/Grunt shell
 Powerful scripting language for analytics
 Example usage in contrib/pig
 Configuration – Hadoop/Env variables
 Raptr.com
 Home grown solution -> Cassandra + Hadoop
 Query time: hours -> minutes
 Pig obviated their need for multi-lingual MR
 Speed and ease are enabling
 Imagini/Visual DNA
 US Government (Digital Reasoning)
 See http://github.com/digitalreasoning/PyStratus
 Hive support in progress (HIVE-1434)
 Hadoop Input Streaming (likely 0.7.1)
 Performance improvements
 Hadoop analytics for Cassandra
 Data locality for processing
 Scales with the cluster
 More information
 http://cassandra.apache.org
 http://wiki.apache.org/cassandra/HadoopSupport
 Cassandra: The Definitive Guide
 About me:
 jeremy.hanna@rackspace.com
 @jeromatron on Twitter
 jeromatron on IRC in #cassandra

Cassandra + Hadoop @ApacheCon

Editor's Notes

  • #2: Talk a little about background of the theme – hippies, The Turtles, readability.
  • #6: Mention Jeff Hodges, Johan, Stu, and Todd Lipcon.
  • #7: Mention how InputSplit works and how it can choose among replicas – array of locations returned.
  • #8: Highlight how this is the same extension point that is used with HDFS, HBase and any other data source/destination for MapReduce.
  • #13: IOW, are people using this stuff in the real world? In production? Put some notes in here about Raptr's and Imagini's use cases.