Virtualizing Analytics
with Apache Spark
Arsalan Tavakoli-Shiraji
Spark Summit East 2017
Enterprise aspirations:
More data, more intelligence
So what’s the formula for success?
ANALYTICS
PEOPLE
DATA
3 pillars of any data-driven use case
Data: Bigger, messier, more spread out
• Spread out into silos
• Varying types and structure
• Faster velocity
ANALYTICS
PEOPLE
DATA
Analytics: More variety and complexity
• Multiple approaches
• Iterative discovery
• Difficult to productionize
ANALYTICS
People: Collaboration from start to finish
PEOPLE
• Many roles involved
• Diverse skillsets and goals
• Inefficient hand-offs
Is today’s technology stack adequate?
ANALYTICS
PEOPLE
DATA
First Generation: The Data Warehouse
Reporting on small data
Only structured data; costly to scale
Descriptive analytics
Targeted at BI
ANALYTICS
PEOPLE
DATA
Second Generation: Hadoop + Data Lake
Capture data first, ETL later
Hard to centralize the data; limited value without ETL
Disparate and complex tools
Limited to developers with big data expertise
PEOPLE
DATA
VIRTUAL ANALYTICS
Decoupled compute and storage
Uniform data management and security model
Unified analytics engine
Enterprise-wide collaboration
Data Warehouses
DATA
Cloud storage
Cloud Storage
And many others…
Hadoop Storage
ANALYTICS
PEOPLE
Data Science
Data Engineering
And many others…
BI Analysts
The New Paradigm
Is Apache Spark the Answer?
VIRTUAL ANALYTICS
Decoupled compute and storage
Uniform data management and security model
Unified analytics engine
Enterprise-wide collaboration
Data Warehouses
DATA
Cloud storage
Cloud Storage
And many others…
Hadoop Storage
PEOPLE
Data Science
Data Engineering
And many others…
BI Analysts
Databricks + Apache Spark
Databricks Enterprise
Security & Governance
Collaborative End User Workspace
Production Pipeline Orchestration
Data Catalog & Optimized Data Access
Fully Managed Cloud Platform
Data Warehouses
DATA
Cloud storage
Many others…
Cloud Storage
And many others…
Hadoop Storage
PEOPLE
Data Science
Data Engineering
And many others…
BI Analysts
Case Study |
Video quality
Real-time anomaly detection
Viewer loyalty
Grow the Viacom audience
The Road Ahead


Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli

  • 1. Virtualizing Analytics with Apache Spark Arsalan Tavakoli-Shiraji Spark Summit East 2017
  • 3. So what’s the formula for success?
  • 4. ANALYTICS PEOPLE DATA 3 pillars of any data-driven use case
  • 5. Data: Bigger, messier, more spread out • Spread out into silos • Varying types and structure • Faster velocity ANALYTICS PEOPLE DATA
  • 6. Analytics: More variety and complexity • Multiple approaches • Iterative discovery • Difficult to productionize ANALYTICS
  • 7. People: Collaboration from start to finish PEOPLE • Many roles involved • Diverse skillsets and goals • Inefficient hand-offs
  • 9. ANALYTICS PEOPLE DATA First Generation: The Data Warehouse Reporting on small data Only structured data; Costly to scale Descriptive analytics Targeted at BI
  • 10. ANALYTICS PEOPLE DATA Second Generation: Hadoop + Data Lake Capture data first, ETL later Hard to centralize the data; Limited value without ETL Disparate and complex tools Limited to developers with big data expertise
  • 11. PEOPLE DATA VIRTUAL ANALYTICS Decoupled compute and storage Uniform data management and security model Unified analytics engine Enterprise-wide collaboration Data Warehouses DATA Cloud storage Cloud Storage And many others… Hadoop Storage ANALYTICS PEOPLE Data Science Data Engineering And many others… BI Analysts The New Paradigm
  • 12. Is Apache Spark the Answer? VIRTUAL ANALYTICS Decoupled compute and storage Uniform data management and security model Unified analytics engine Enterprise-wide collaboration Data Warehouses DATA Cloud storage Cloud Storage And many others… Hadoop Storage PEOPLE Data Science Data Engineering And many others… BI Analysts
  • 13. Databricks + Apache Spark Databricks Enterprise Security & Governance Collaborative End User Workspace Production Pipeline Orchestration Data Catalog & Optimized Data Access Fully Managed Cloud Platform Data Warehouses DATA Cloud storage Many others… Cloud Storage And many others… Hadoop Storage PEOPLE Data Science Data Engineering And many others… BI Analysts
  • 14. Case Study | Video quality Real-time anomaly detection Viewer loyalty Grow the Viacom audience

Editor's Notes

  • #3: In every industry sector I’ve encountered, the interest in big data is stronger than ever. Why are companies so interested? They believe data is the key to transforming their businesses. You’ve already heard some of these examples. Yesterday, Salesforce came on stage and talked about their plan to build their next-generation CRM product with AI, which they call Einstein, and they are using Spark. Today, we will hear from the likes of HP, a pedigreed company built on manufacturing devices, which is using Spark to create a service-based business model with IoT data. Another familiar name is McGraw Hill, which has been creating educational material for decades but is now looking to Spark to revolutionize learning. They want to use behavior data from students to identify gaps in understanding and provide personalized learning approaches to achieve better outcomes. Many of the companies we talk to aspire to leverage greater intelligence with data throughout their business, but unfortunately this is much more difficult than it seems.
  • #6: The first observation is about the catalyst: the data. Everyone knows that data is bigger and more diverse, but what people underestimate is just how inaccessible and siloed it is. The volume and variety of data are growing so fast because there are now many more ways to generate data; it has gone beyond just web servers or enterprise resource planning systems. Today, it’s the electronic medical records at your doctor’s office, connected sensors embedded in transformers in an electrical substation, or even more outrageously, a fusion of medical records and connected sensors in the form of fitness trackers that you wear every minute of the day. And in every instance, new data stores are being instantiated in all corners of the business faster than you can imagine. So yes, storage is a problem, but that’s not even _the_ problem. The real problem at the enterprise level is how to catalog, organize, secure, and govern this complex federation of data.
  • #7: Next, let’s talk about AI. AI is a loose collection of many different algorithms that allow machines to make predictions or decisions. It’s a game-changer: once developed, it can automate complex tasks or aid human decision making. There are many varieties of algorithms at our disposal today, and more are being developed constantly. The challenge in building great AI, in addition to having the right data of course, is to pick the right algorithm for the problem. How would you know which algorithm is right? Well, it’s hard to say; you may have to try a few different approaches. Certainly when you have many use cases, there is unlikely to be a single approach that can be used everywhere. This means that the problem is not just getting one algorithm to work, but having a way of applying many different types of algorithms depending on the context.
  • #8: Finally, let’s talk about all the functional roles that are involved in making every use case successful. This is probably the most often overlooked element in this whole equation. In every enterprise data use case, many different teams must work together seamlessly to be successful. What I mean by working together is this: You first need the business context, from someone who has the domain knowledge. You then need the experts who can bring the data together, handling data integration and cleansing in a reliable and timely way. You need people who can systematically use the data to derive answers, or use algorithms to build models that derive the answers. These different roles exist because today’s enterprises and their business models are so vast and complex that no single team can do all these jobs. The data engineering team need to
  • #10: Typically people start with the data warehouse. It was created to solve a very narrow and specific problem: giving business analysts a way to use highly structured data for decision making. It has many limitations. First, it does not scale up to big data, so only a small percentage of enterprise data is used in decision-making. Second, the data warehouse does not offer a way to build AI, so there is no way to automate decision-making. Businesses still have to rely on a handful of business analysts to manually sift through the data, build dashboards, or create reports to support the business.
  • #12: Instead of centralizing data and building a complex zoo of tools on top of a single storage system, there is another approach: separate compute and storage. The new approach uses a flexible compute layer to connect to different data stores without migrating data and manage metadata across silos; to run diverse workloads that support a wide range of analytics approaches; and to provide simplified interfaces for users with different skillsets and objectives. Effectively, we want to virtualize the analytics layer.
  • #15: Viacom is the parent company of MTV and Nickelodeon. It is one of the largest media companies in the world, and its content is broadcast in more than 160 countries. Delivering high-quality video and growing viewer engagement is the core mission of Viacom.
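
The note for #12 describes a compute layer that connects to different data stores without migrating data and manages metadata across silos. A minimal sketch of that idea in plain Python follows. It is not Spark or Databricks code; the `VirtualCatalog` class, the reader helpers, the table names, and the sample rows are all hypothetical and exist only to illustrate the pattern. An in-memory SQLite table stands in for a data warehouse, a CSV string for cloud storage, and a JSON-lines string for Hadoop storage.

```python
import csv
import io
import json
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


class VirtualCatalog:
    """Hypothetical metadata layer: maps table names to readers and stores no data."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, reader: Callable[[], Iterable[Row]]) -> None:
        self._readers[name] = reader

    def tables(self) -> List[str]:
        return sorted(self._readers)

    def scan(self, name: str) -> Iterator[Row]:
        # Rows are pulled from the underlying store on demand, never copied here.
        return iter(self._readers[name]())


def warehouse_reader(conn: sqlite3.Connection) -> Callable[[], Iterable[Row]]:
    def read() -> Iterator[Row]:
        for user_id, minutes in conn.execute("SELECT user_id, minutes FROM views"):
            yield {"user_id": user_id, "minutes": minutes}
    return read


def csv_reader(text: str) -> Callable[[], Iterable[Row]]:
    def read() -> Iterator[Row]:
        for rec in csv.DictReader(io.StringIO(text)):
            yield {"user_id": rec["user_id"], "minutes": int(rec["minutes"])}
    return read


def jsonl_reader(text: str) -> Callable[[], Iterable[Row]]:
    def read() -> Iterator[Row]:
        for line in text.splitlines():
            if line.strip():
                yield json.loads(line)
    return read


def total_minutes_by_user(catalog: VirtualCatalog, names: List[str]) -> Dict[str, int]:
    """One compute routine that runs unchanged over every registered store."""
    totals: Dict[str, int] = {}
    for name in names:
        for row in catalog.scan(name):
            totals[row["user_id"]] = totals.get(row["user_id"], 0) + row["minutes"]
    return totals


# Stand-in for a data warehouse (made-up sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views (user_id TEXT, minutes INTEGER)")
conn.executemany("INSERT INTO views VALUES (?, ?)", [("u1", 30), ("u2", 10)])

catalog = VirtualCatalog()
catalog.register("warehouse.views", warehouse_reader(conn))
catalog.register("cloud.views_csv", csv_reader("user_id,minutes\nu1,5\nu3,20\n"))
catalog.register("hadoop.views_jsonl", jsonl_reader('{"user_id": "u2", "minutes": 15}\n'))

totals = total_minutes_by_user(catalog, catalog.tables())
print(totals)
```

The catalog holds only names and reader functions, so adding another silo means registering one more reader; the aggregation code does not change. A real engine would add schema metadata, access control, and parallel execution, none of which is attempted here.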