Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli

The document discusses the challenges and evolution of data analytics, emphasizing the need for better data management and analytics tools as data becomes larger and more complex. It highlights the transition from traditional data warehouses to modern solutions like Apache Spark and Databricks, which offer improved collaboration and efficiency in data processing. A case study on real-time anomaly detection is mentioned to illustrate the practical benefits of these advancements in analytics.

Virtualizing Analytics
with Apache Spark
Arsalan Tavakoli-Shiraji
Spark Summit East 2017

Enterprise aspirations:
More data, more intelligence

3 pillars of any data-driven use case
• Data
• Analytics
• People

Data: Bigger, messier, more spread out
• Spread out into silos
• Varying types and structure
• Faster velocity

Analytics: More variety and complexity
• Multiple approaches
• Iterative discovery
• Difficult to productionize

People: Collaboration from start to finish
• Many roles involved
• Diverse skillsets and goals
• Inefficient hand-offs

First Generation: The Data Warehouse
Reporting on small data
• Data: Only structured data; costly to scale
• Analytics: Descriptive analytics
• People: Targeted at BI

Second Generation: Hadoop + Data Lake
Capture data first, ETL later
• Data: Hard to centralize the data; limited value without ETL
• Analytics: Disparate and complex tools
• People: Limited to developers with big data expertise

The New Paradigm
Virtual Analytics
• Decoupled compute and storage
• Uniform data management and security model
• Unified analytics engine
• Enterprise-wide collaboration
Data: Data Warehouses, Cloud Storage, Hadoop Storage, and many others…
People: Data Science, Data Engineering, BI Analysts, and many others…

Is Apache Spark the Answer?
Virtual Analytics
• Decoupled compute and storage
• Uniform data management and security model
• Unified analytics engine
• Enterprise-wide collaboration
Data: Data Warehouses, Cloud Storage, Hadoop Storage, and many others…
People: Data Science, Data Engineering, BI Analysts, and many others…

Databricks + Apache Spark
• Databricks Enterprise Security & Governance
• Collaborative End User Workspace
• Production Pipeline Orchestration
• Data Catalog & Optimized Data Access
• Fully Managed Cloud Platform
Data: Data Warehouses, Cloud Storage, Hadoop Storage, and many others…
People: Data Science, Data Engineering, BI Analysts, and many others…
