What's next for Big Data? -- Apache Spark


TUMRA | Big Data Science - Gain a competitive advantage through Big Data & Data Science

This deck presents Apache Spark as a single platform for big data that brings SQL, machine learning, streaming and graph analytics together. It covers Spark's main advantages (code reuse, an easy API and fast in-memory data sharing), then walks through a case study in personalisation and marketing automation, including TUMRA's use of Spark in production since March 2013 and the data volumes involved.


3 TUMRA - Big Data Week, May 2014

Spark is …
“One platform to rule them all”

… and blurs the boundary between SQL, machine learning, streams & graphs


Spark is …
… gaining momentum


Spark has …
… more contributors than Hadoop


Spark can …

Source: Databricks


Spark Stack

Source: Databricks

Hadoop (HDFS)


Why Spark?
-  Code reuse across batch, streaming and interactive applications
-  Easy API from Scala, Java & Python
-  In-memory data sharing
FAAAAAAST!!!
Check out http://spark.apache.org
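
The code-reuse point can be illustrated without Spark at all. The sketch below is plain Python, not the Spark API: one piece of logic is written once and then applied both to a complete batch and to a sequence of micro-batches. The event shape and all function names are assumptions made up for this illustration.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical event shape: (visitor_id, page). Not taken from the slides.
Event = tuple[str, str]


def page_views(events: Iterable[Event]) -> Counter:
    """Business logic written once: count views per page."""
    return Counter(page for _visitor, page in events)


def run_batch(events: list[Event]) -> Counter:
    """Batch mode: apply the logic to the whole dataset at once."""
    return page_views(events)


def run_streaming(micro_batches: Iterable[list[Event]]) -> Iterator[Counter]:
    """Streaming mode: apply the same logic to each micro-batch and
    yield the running total after every batch."""
    total: Counter = Counter()
    for batch in micro_batches:
        total += page_views(batch)
        yield Counter(total)  # copy so callers get a stable snapshot


events = [("v1", "/home"), ("v2", "/home"), ("v1", "/pricing"), ("v3", "/home")]
batch_result = run_batch(events)
stream_result = list(run_streaming([events[:2], events[2:]]))[-1]
```

Both modes end with the same counts because they share `page_views`; only the driver around it differs.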

CASE STUDY:
PERSONALISATION &
MARKETING
AUTOMATION

Our history with Spark
-  Early adopters; PoC in Dec ‘12
-  In production since March ‘13
-  Running on Amazon EC2
-  Ad-hoc analysis and reporting
-  Machine learning model building
-  Integrates with our real-time dashboards

Use Case: Personalisation

Use Case: Personalisation (cont’d)
-  Matching visitors to products
-  50% of visitors are ‘new’ and have no history to work with
-  Blend of pre-computation and real-time recommendations
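
One way to picture such a blend for a mix of returning and ‘new’ visitors is sketched below. This is not TUMRA's implementation: the lookup tables, the `recommend` function and the fallback order are all hypothetical, chosen only to show pre-computed results being combined with signals from the current session.

```python
# Hypothetical data: none of these names or values come from the slides.
# Pre-computed offline (e.g. by a batch job) for visitors with history.
PRECOMPUTED = {"visitor-42": ["shoes", "socks", "laces"]}
# Pre-computed item-to-item similarities, usable for any visitor.
SIMILAR_ITEMS = {"hat": ["scarf", "gloves"], "shoes": ["socks"]}
# Global fallback when nothing else is known.
POPULAR = ["shoes", "hat", "bag"]


def recommend(visitor_id: str, session_views: list[str], k: int = 3) -> list[str]:
    """Blend real-time session signals with pre-computed results."""
    candidates: list[str] = []
    # 1. Real-time: items similar to what was viewed in this session.
    for item in session_views:
        candidates.extend(SIMILAR_ITEMS.get(item, []))
    # 2. Pre-computed: per-visitor list, empty for new visitors.
    candidates.extend(PRECOMPUTED.get(visitor_id, []))
    # 3. Fallback: globally popular items.
    candidates.extend(POPULAR)
    # De-duplicate, preserve order, skip items already viewed.
    seen = set(session_views)
    result = []
    for item in candidates:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result[:k]


new_visitor = recommend("visitor-new", ["hat"])
returning_visitor = recommend("visitor-42", [])
```

A visitor with no history still gets results here, driven by the current session and the popularity fallback, while a returning visitor is served mostly from the pre-computed list.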

Use Case: Marketing Automation
-  Collect user engagement data across websites and mobile apps
-  Increase subscription rates
-  Identify users at risk of churn
-  Automated personalised marketing
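
The slides do not say how churn risk is scored. As a rough illustration only, the sketch below flags users by a simple inactivity rule; the 30-day threshold, the user names and the dates are all invented, and a production system would more likely use a model trained on the engagement data.

```python
from datetime import date, timedelta

# Hypothetical threshold; the slides do not specify any scoring rule.
INACTIVITY_THRESHOLD = timedelta(days=30)


def at_risk_of_churn(last_seen: dict[str, date], today: date) -> list[str]:
    """Return users whose last engagement is older than the threshold."""
    return sorted(
        user for user, seen in last_seen.items()
        if today - seen > INACTIVITY_THRESHOLD
    )


# Invented example data: user -> date of most recent engagement event.
last_seen = {
    "alice": date(2014, 5, 1),
    "bob": date(2014, 3, 15),
    "carol": date(2014, 4, 2),
}
at_risk = at_risk_of_churn(last_seen, today=date(2014, 5, 6))
```

The resulting list could then feed an automated campaign aimed only at the flagged users.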

Data Volumes & Velocity
-  29M events per day
-  Peak rates ~800 events / second
-  All events streamed to Kafka
-  10B archived events in Amazon S3
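
These figures can be put in proportion with a little arithmetic. The calculation below uses only the numbers on the slide; the one added assumption, flagged in the code, is that the daily event rate stayed constant over time, which is unlikely to be exactly true.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 29_000_000       # from the slide
peak_events_per_sec = 800         # from the slide (approximate)
archived_events = 10_000_000_000  # from the slide

avg_events_per_sec = events_per_day / SECONDS_PER_DAY
peak_to_average = peak_events_per_sec / avg_events_per_sec
# Assumption: a constant daily rate, used only to size the archive.
days_of_archive = archived_events / events_per_day

print(f"average rate:   {avg_events_per_sec:.0f} events/s")
print(f"peak / average: {peak_to_average:.1f}x")
print(f"archive depth:  {days_of_archive:.0f} days at the current rate")
```

So the average load is about 336 events per second, the peak is roughly 2.4 times that, and the archive corresponds to about 345 days of traffic at the current rate.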

How we use Spark

Amazon S3 (HDFS interface)
Apache Kafka
Data Collection API (Akka) & Connectors

Spark gives us …
-  Unified platform for machine learning and graph analytics
-  Ability to experiment at huge scale
-  SQL interfaces to existing tools
-  Code reuse from data scientists to production workloads

WANT TO
KNOW MORE?

http://spark.apache.org

Spark Summit 2014

Spark London Meetup

Commercial Support & Certification

THANK
YOU

@tumra
tumra.com

slideshare.net/tumra

What's next for Big Data? -- Apache Spark

1. WHAT’S NEXT FOR BIG DATA? APACHE SPARK

2. WTH IS SPARK?
