Beyond Big Data: Data Science and AI
Scott Gnau
Chief Technology Officer
Hortonworks
The New Way of Business Is Fueled by Data

DEVELOPMENT
• Connected customers, vehicles, devices
• Socially crowd-sourced requirements
• Digital design and analysis
• Digital prototypes and tests

MANUFACTURING
• Connected factories, sensors, devices
• Human-robotic interaction
• 3D-printing on demand

DISTRIBUTION
• Connected trucks, inventory
• Location, traffic, weather-aware distribution
• Real-time inventory visibility
• Dynamic rerouting

MARKETING/SALES
• Connected customers, devices
• Omni-channel demand sensing
• Real-time recommendations

SERVICE
• Connected assets
• Remote service monitoring & delivery
• Predictive maintenance
• OTA updates
The Big Data Tech Journey
• 2011: DATA-AT-REST. Hadoop 1.0, 100% open
• 2013: YARN. Enable multiple workloads
• 2015: DATA-IN-MOTION. Out to the edge
• 2016: CONNECT DATA PLATFORMS. Cloud/on-prem
• Today: DRIVE DATA SCIENCE SUCCESS. Intelligence across the data lifecycle
The Perfect Storm: Tech Trends Fueling Business
• INTERNET OF THINGS
• DATA SCIENCE/MACHINE LEARNING
• CLOUD COMPUTING
• STREAMING DATA

Market sizes: ~$380B, $210B, ~$1300B, ~$19B

Sources: Public cloud services market size, $383B by 2020 (Gartner 2017, WW Public Cloud Services market). Big Data & Business Analytics revenues forecast to be $210B by 2020 (IDC 2017). IoT spending forecast to be ~$1.31T by 2020 (IDC 2017, Worldwide IoT Spending Guide). Artificial intelligence market size to reach $19,478 million by 2022, growing at a CAGR of 45.4% from 2016 to 2022 (Allied Market Research).
Today’s Reality: Encompass and Connect All Data
• SENSORS, EDGE DEVICES, TELEMETRY, CONTROL SYSTEMS
• ENTERPRISE DATA LAKES, SECURITY DATA LAKES, DATA LAKES ON-PREM, DATA LAKES IN THE CLOUD
• DATA AT REST, DATA IN MOTION, ACTIONABLE INTELLIGENCE
• CLOUD, ON-PREMISES
Modern Data Architecture
• Exception-Based Monitoring
• 360 View of Operations, Equipment Failure Analytics, etc.
• Deep Historical Analysis
• DATA CENTER
• Stream Analytics
• Cyber Security & Threat Detection
• Telemetry – Connected Devices
• Machine Learning
• CLOUD
• Sensors, SCADA, Control Systems
• Edge Analytics
• Time Series Historian
Data Science Success Criteria
• ENABLE MORE PEOPLE TO PARTICIPATE IN DATA SCIENCE
• MAKE DATA SCIENTISTS MORE PRODUCTIVE AND COLLABORATIVE
• MASSIVE AMOUNTS OF DATA
• THE RIGHT TOOLSET
• FLEXIBLE COMPUTE POWER
Powering the Modern Data Architecture
• DATA AT REST, DATA IN MOTION, ACTIONABLE INTELLIGENCE
• COMPLETE DATA LIFECYCLE MANAGEMENT
• RUN CONTAINERIZED APPLICATIONS CONCURRENTLY
• EDGE, CLOUD, ON-PREMISES
• HOLISTIC MANAGEMENT, GOVERNANCE AND SECURITY
• MULTI-WORKLOADS, MULTI-TYPE, MULTI-TIER
Find more #DWS17 sessions and slides at: www.DataWorksSummit.com
THANK YOU

Editor's Notes

  • #2: The future of the enterprise is becoming clearer as organizations begin to realize the strategic value of data. Today I want to walk you through the drivers and look at what the high-level architecture will look like as enterprises realize that value.
  • #3: Businesses across all industries are undergoing a digital transformation of massive scale. They are establishing a world in which everything is connected to everything else: people, devices, vehicles.
  • #4: We started our journey by making Hadoop ready for the enterprise. We established a data platform for structured data and for the new-paradigm data from streams and social platforms, with an open community and an open ecosystem. Then came multi-tenancy and integrated security and governance. Next was data in motion: managing data through its entire lifecycle, from inception to where it lands at rest, with security, governance, and lineage across that entire journey. Then on-prem and cloud. Now we are serving ML, DL, and AI.
  • #5: Over the course of the last 5 years, the 4 mega trends have driven and even accelerated the need to transform into a modern data architecture. These trends are powering and enabling these transformations. They also bring tremendous rewards, creating winners and losers in each industry.
  • #6: Hortonworks' open and connected platforms enable this transformation and are the core of the modern data architecture: real-time decisions on data in motion and data at rest, all the way to the edge.
  • #7: So, what’s really happening? An entire new world is being created by combining lots of data with breakthrough tools. Data can be on-premises and in the cloud. Data is moving from sensors in real time across our data fabric, giving us precise instrumentation of what happened just before an event as well as after it. This is true for customers buying on the web as well as for products that might fail. We can run our machine learning and deep learning on these vast repositories of data, and we can push these models down to the edges to automate decisions.
  • #8: Data science is the next key driver of transformational business execution. Companies need a strategic approach to turn data into value and create a competitive advantage. We need three things today to succeed: lots of data, which makes the models very accurate; lots of compute, which makes the models run fast; and data science as a team sport, where new tools enable collaboration and make the models easy to deploy.
  • #9: We are not done innovating; the journey will continue. We have solved the need for Hadoop to become enterprise ready. We saw the need to manage data from inception to rest with our data-in-motion platform. And we saw the need to drive the consumability of these platforms via the cloud. As we did for Hadoop, we are now working on making the modern data architecture enterprise ready and usable.