WHAT’S NEXT FOR BIG DATA?
APACHE SPARK
WTH IS SPARK?
3 TUMRA - Big Data Week, May 2014

Spark is …
“One platform to rule them all”

… and blurs the boundaries between SQL, machine learning, streams & graphs

Spark is …
… gaining momentum

Spark has …
… more contributors than Hadoop

Spark can …
[Figure. Source: Databricks]

Spark Stack
[Figure: Spark Stack diagram, including Hadoop (HDFS). Source: Databricks]

Why Spark?
-  Code reuse across batch, streaming and interactive applications
-  Easy API from Scala, Java & Python
-  In-memory data sharing
FAAAAAAST!!!
Check out http://spark.apache.org
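The code-reuse point can be illustrated without a cluster. Spark Streaming processes a stream as a series of small batches, so logic written once for batch jobs can also run on each micro-batch (in PySpark, `DStream.transform` applies an RDD-to-RDD function to every batch). The sketch below only mimics that model in plain Python: lists of dicts stand in for RDDs, and `count_by_type`, `run_batch` and `run_micro_batches` are hypothetical names, not Spark APIs.

```python
from collections import Counter
from typing import Dict, Iterable, List

Event = Dict[str, str]


def count_by_type(events: Iterable[Event]) -> Counter:
    """Business logic written once: count events per event type."""
    return Counter(e["type"] for e in events)


def run_batch(events: List[Event]) -> Counter:
    # Batch mode: apply the logic to the whole (archived) dataset in one go.
    return count_by_type(events)


def run_micro_batches(events: List[Event], batch_size: int) -> Counter:
    # Streaming mode: apply the SAME function to each small slice as it "arrives",
    # then merge the per-batch results into a running total.
    total: Counter = Counter()
    for start in range(0, len(events), batch_size):
        total.update(count_by_type(events[start:start + batch_size]))
    return total


if __name__ == "__main__":
    events = [{"type": t} for t in ["view", "click", "view", "purchase", "view", "click"]]
    print(run_batch(events))  # Counter({'view': 3, 'click': 2, 'purchase': 1})
    print(run_micro_batches(events, batch_size=2) == run_batch(events))  # True
```

Because the per-batch results are merged with `Counter.update`, the streaming total matches the batch result for any batch size. That only holds for aggregations that combine associatively, such as counts and sums.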
CASE STUDY: PERSONALISATION & MARKETING AUTOMATION
Our history with Spark
-  Early adopters: proof of concept in Dec ’12
-  In production since March ’13
-  Running on Amazon EC2
-  Ad-hoc analysis and reporting
-  Machine learning model building
-  Integrates with our real-time dashboards
Use Case: Personalisation
Use Case: Personalisation (cont’d)
-  Matching visitors to products
-  50% of visitors are ‘new’ and have no history to work with
-  Blend of pre-computation and real-time recommendations
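One way to read "a blend of pre-computation and real-time recommendations" is sketched below, under assumptions that are ours rather than TUMRA's. An offline batch step (the kind of job Spark would run over archived sessions) counts which products are viewed together and how popular each product is; the online step then only does cheap lookups, and falls back to popular products for a new visitor with no history. Item co-occurrence and the popularity fallback are stand-in techniques, and `precompute` and `recommend` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def precompute(sessions: Sequence[Sequence[str]]) -> Tuple[Dict[str, Counter], Counter]:
    """Offline/batch step: count co-viewed product pairs and overall popularity."""
    co_counts: Dict[str, Counter] = defaultdict(Counter)
    popularity: Counter = Counter()
    for session in sessions:
        items = set(session)                  # count each product once per session
        popularity.update(items)
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1              # symmetric: a with b ...
            co_counts[b][a] += 1              # ... and b with a
    return dict(co_counts), popularity


def recommend(history: Sequence[str], co_counts: Dict[str, Counter],
              popularity: Counter, k: int = 3) -> List[str]:
    """Online/real-time step: cheap lookups against the precomputed tables."""
    seen = set(history)
    scores: Counter = Counter()
    for item in history:
        for other, n in co_counts.get(item, Counter()).items():
            if other not in seen:
                scores[other] += n
    if not scores:
        # New visitor (or no co-occurrence data): fall back to popular products.
        scores = Counter({i: n for i, n in popularity.items() if i not in seen})
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    sessions = [["shoes", "socks"], ["shoes", "socks", "laces"],
                ["hat", "scarf"], ["shoes", "hat"]]
    co_counts, popularity = precompute(sessions)
    print(recommend(["shoes"], co_counts, popularity))  # ['socks', 'hat', 'laces']
    print(recommend([], co_counts, popularity))         # ['shoes', 'hat', 'socks']
```

Doing the expensive counting offline keeps the real-time path to a few dictionary lookups per request. The fallback matters because, per the slide, about half of all visitors arrive with no history at all.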
Use Case: Marketing Automation
-  Collect user engagement data across websites and mobile apps
-  Increase subscription rates
-  Identify users at risk of churn
-  Automated personalised marketing
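To make "identify users at risk of churn" concrete, here is a minimal rule-based sketch over each user's activity dates. The rules (flag anyone inactive for 14 or more days, or whose activity in the last 28 days fell below half of the 28 days before) and the name `churn_risk` are illustrative assumptions, not TUMRA's method; a production system would more likely learn such a score from the engagement data.

```python
from datetime import date, timedelta
from typing import Dict, List


def churn_risk(
    activity: Dict[str, List[date]],
    today: date,
    inactive_days: int = 14,   # assumption: two quiet weeks counts as "at risk"
    window: int = 28,          # assumption: compare consecutive 28-day windows
    drop_ratio: float = 0.5,   # assumption: a >50% fall in activity counts as "at risk"
) -> List[str]:
    """Return the ids of users flagged as at risk of churn (hypothetical heuristic)."""
    at_risk = []
    for user, days in activity.items():
        if not days:
            continue  # no events at all: nothing to compare, so no flag
        ages = [(today - d).days for d in days]  # days since each event
        recent = sum(1 for a in ages if 0 <= a < window)
        previous = sum(1 for a in ages if window <= a < 2 * window)
        if (today - max(days)).days >= inactive_days:
            at_risk.append(user)                 # has gone quiet
        elif previous > 0 and recent < drop_ratio * previous:
            at_risk.append(user)                 # engagement has dropped sharply
    return sorted(at_risk)


if __name__ == "__main__":
    today = date(2014, 5, 1)

    def ago(*offsets: int) -> List[date]:
        return [today - timedelta(days=n) for n in offsets]

    activity = {
        "alice": ago(1, 6, 21, 42),               # steady: not flagged
        "bob": ago(30, 35),                       # silent for 30 days
        "carol": ago(2, 29, 34, 37, 40, 41, 42),  # 1 recent event vs 6 before
    }
    print(churn_risk(activity, today))  # ['bob', 'carol']
```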
Data Volumes & Velocity
-  29M events per day
-  Peak rate of ~800 events/second
-  All events streamed to Kafka
-  10B archived events in Amazon S3
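The figures on this slide also imply an average event rate and a peak-to-average ratio, which are useful when sizing a streaming consumer. The arithmetic below uses only the slide's numbers; the variable names are ours.

```python
# Figures taken from the "Data Volumes & Velocity" slide.
EVENTS_PER_DAY = 29_000_000          # 29M events per day
PEAK_RATE = 800                      # ~800 events / second at peak
ARCHIVED_EVENTS = 10_000_000_000     # 10B archived events in Amazon S3

SECONDS_PER_DAY = 24 * 60 * 60       # 86,400

mean_rate = EVENTS_PER_DAY / SECONDS_PER_DAY        # average events per second
peak_to_mean = PEAK_RATE / mean_rate                # how far peaks exceed the average
days_of_archive = ARCHIVED_EVENTS / EVENTS_PER_DAY  # archive size in days of current traffic

print(f"mean rate:    {mean_rate:.1f} events/s")    # mean rate:    335.6 events/s
print(f"peak / mean:  {peak_to_mean:.2f}x")         # peak / mean:  2.38x
print(f"archive span: {days_of_archive:.0f} days")  # archive span: 345 days
```

The archive estimate assumes a constant daily volume, which is unlikely for a growing service, so the real archive probably spans more than 345 days of traffic.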
How we use Spark
[Diagram components: Amazon S3 (HDFS interface); Apache Kafka; Data Collection API (Akka) & Connectors]
Spark gives us …
-  Unified platform for machine learning and graph analytics
-  Ability to experiment at huge scale
-  SQL interfaces to existing tools
-  Code reuse from data scientists’ prototypes to production workloads
WANT TO KNOW MORE?
http://spark.apache.org
Spark Summit 2014
Spark London Meetup
Commercial Support & Certification
THANK YOU

@tumra
tumra.com

slideshare.net/tumra
