Big Data Storage and Analytics Q&A
Matthew Aslett, research director
Webinar Logistics
●  Be on the lookout for polling questions
●  You may ask questions at any time during the presentation by using the Q&A box
●  On-demand viewers: please tweet your questions to @cloudianstorage
●  At the end of the presentation, please provide feedback and rate us
451 Research is an information
technology research & advisory company
Founded in 2000
210+ employees, including over 100 analysts
1,000+ clients: Technology & Service providers, corporate
advisory, finance, professional services, and IT decision makers
12,500+ senior IT professionals in our research community
Over 52 million data points each quarter
4,500+ reports published each year covering 2,000+
innovative technology & service providers
Headquartered in New York City with offices in London,
Boston, San Francisco, and Washington D.C.
451 Research and its sister company Uptime Institute
comprise the two divisions of The 451 Group
Research & Data
Advisory Services
Events
Copyright (C) 2015 451 Research LLC
Our Speakers
Paul Turner leads marketing, product planning and strategy at Cloudian. A storage
industry expert, he joined Cloudian from NetApp, where he ran the Product Strategy Office,
guiding its investments in FlashRay, Iongrid and CacheIQ. Paul has more than 23
years of development and management leadership experience, including 15 years at Oracle.
Matt Aslett, Research Director for the data platforms and analytics research channel, has
overall responsibility for the coverage of operational and analytic databases, data
integration, data quality, and business intelligence. Matt's own primary area of focus is
relational and non-relational databases (including NoSQL and NewSQL), data warehousing,
data caching, and Hadoop. Matt is also an expert in open source software and regularly
contributes to 451 Research's open source-related research.
John Kreisa is a veteran of the enterprise marketing industry who has worked on products
at every level of the IT stack, from the depths of storage through to the insight of business
intelligence and analytics. John currently leads partner and strategic marketing initiatives at
open source leader Hortonworks, which develops, distributes and supports Apache Hadoop.
Big data: cause and effect
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
CAUSE?
Big data: cause and effect
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
•  Volume
•  Velocity
•  Variety
EFFECT
CAUSE?
Big data: cause and effect
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
•  Volume
•  Velocity
•  Variety
EFFECT
EFFECTED
CAUSE
Big data: cause and effect
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
•  Volume
•  Velocity
•  Variety
Economics:
•  Commodity hardware
•  Open source software
EFFECT
EFFECTED
CAUSE
Big data is driven by economics
“Big data is what happened when the cost of keeping information
became less than the cost of throwing it away.”
– George Dyson
“Big data: New business insights based on storing, processing and
analyzing data that was previously ignored due to the cost and
functional limitations of traditional data management technologies.”
– 451 Research
Big data is driven by economics
What happened when the cost of keeping information became less than the cost of throwing it away?
•  The processing and analysis of very large data sets in their entirety
•  Increased adoption of massively parallel processing approaches
•  Storage and analysis of both structured and multi-structured data
•  Integration of external (social) and corporate data for more complete perspective
•  Schema-free and schema-on-read approaches to data storage/analysis
•  Adoption of exploratory analytic approaches to identify new patterns in data
•  Predictive analytics as a fundamental component of BI strategies
•  Machine-learning algorithms automate the reflection of collective intelligence
•  Increased adoption of in-memory databases for rapid data ingestion
•  Real-time analysis of data prior to storage within the data warehouse/Hadoop
•  Interactive, native, SQL-based analysis of data in Hadoop and HBase
•  Large-scale processing of sensor and other machine-generated data/events
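The schema-on-read approach listed on this slide can be illustrated with a minimal Python sketch. It is not from the webinar: the record fields (`user`, `page`, `temp_c`, `ts`) and the helper `read_with_schema` are hypothetical. The idea shown is that raw records are stored as they arrive, and a schema is applied only when the data is read.

```python
import json

# Raw events are stored exactly as they arrive; no schema is enforced on write.
# All field names below are hypothetical.
RAW_LINES = [
    '{"user": "u1", "page": "/home", "ts": 1}',
    '{"user": "u2", "temp_c": "21.5", "ts": 2}',  # a different shape is still accepted
    '{"user": "u1", "page": "/cart", "ts": 3, "extra": true}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time.

    Keeps only the fields named in `schema`, casts each value with the
    given type, and skips records that lack any of the named fields.
    """
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}

# Two different views over the same raw data, each defined at query time.
clicks = list(read_with_schema(RAW_LINES, {"user": str, "page": str, "ts": int}))
temps = list(read_with_schema(RAW_LINES, {"user": str, "temp_c": float}))
```

The same stored lines serve both queries; adding a new view means writing a new schema, not reloading the data.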
Big data: cause and effect
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling
•  Volume
•  Velocity
•  Variety
Economics:
•  Commodity hardware
•  Open source software
EFFECT
EFFECTED
CAUSE
IoT
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Traditional Analytic Systems Under Pressure
Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to scale
[Figure labels: business value for industry leaders vs. laggards. Traditional data: ERP, CRM, SCM. New data: clickstream, geolocation, web data, Internet of Things, docs and emails, server logs. Data volume: 2.8 zettabytes in 2012, 40 zettabytes in 2020.]
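The two data-volume figures on this slide (2.8 zettabytes in 2012, 40 zettabytes in 2020) imply a yearly growth rate that the slide does not state. The following Python sketch works it out under the assumption of constant compound growth; the function name is illustrative only.

```python
def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0

# Figures from the slide: 2.8 ZB in 2012 and 40 ZB in 2020.
rate = compound_annual_growth_rate(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the implied rate is roughly 39% per year, which means the volume doubles a little faster than every two and a half years.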
Modern Data Architecture Emerges to Unify Analytics & Data Processing
Modern Data Analytics Architecture
•  Enable applications to have access to all your enterprise data through an efficient centralized platform
•  Supported with a centralized approach to analytics, governance, security and operations
•  Versatile to handle any applications and datasets no matter the size or type
[Figure labels: SOURCES: clickstream, web & social, geolocation, sensor & machine, server logs, unstructured. Existing systems: ERP, CRM, SCM, EDW, MPP. Platform: HDFS (Hadoop Distributed File System) with YARN: Data Operating System, running batch, interactive, real-time and partner ISV workloads. ANALYTICS: data marts, applications, business analytics, visualization & dashboards.]
Hadoop Driver: Enabling the Data Lake for Analytics
Data Lake Definition
•  Centralized Architecture
Multiple applications on a shared data set with consistent levels of service
•  Any App, Any Data
Multiple applications accessing all data, affording new insights and opportunities
•  Unlocks ‘Systems of Insight’
Advanced algorithms and applications used to derive new value and optimize existing value
Drivers:
1.  Cost Optimization
2.  Advanced Analytic Apps
Goal:
•  Centralized Architecture
•  Data-driven Business
[Figure labels: Journey to the Data Lake with Hadoop. Axes: SCALE and SCOPE. Stages: DATA LAKE, Systems of Insight.]
Your Data at Webscale Economics
HyperStore: Software Defined Storage
•  Replication (RF=1,2,3,4)
•  Erasure coding (N+1,2,3,4)
•  Compression (zlib, lz4)
Commodity Servers, Scale Out, Durable, Simple to Use
[Figure labels: heterogeneous nodes (CPU, disks, network): 100TB, 300TB]
Smart Data
[Figure labels: INTERNET OF THINGS: consumer activity (events, GPS, WiFi), social media, device tracking and logs. BIG DATA: event processing platform, Cloudian HyperStore, analytics, result of analysis.]
✓ Analyze more – allows for efficient bulk data analysis in place
✓ Faster time-to-decision
✓ HyperStore scales out with your data – adding nodes for I/O
Integration of Cloudian and Hortonworks
Interoperability: Cloudian & Hortonworks
YARN: Data Operating System
•  Batch: MapReduce
•  Script: Pig
•  Search: Solr
•  SQL: Hive/Tez, HCatalog
•  NoSQL: HBase, Accumulo
•  Stream: Storm
•  Others: In-Memory Analytics, ISV engines
Nodes 1 to N
Linux, Windows, On-Premise, Cloud
HDFS
S3 Native File System (URI scheme: s3n)
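With the S3 native file system, Hadoop jobs address objects through URIs of the form `s3n://bucket/path`. As a small illustration of that addressing scheme, the Python sketch below splits such a URI into its bucket and object key. The bucket name, path, and helper name are hypothetical, and the sketch shows nothing about how Cloudian or Hortonworks configure the connector.

```python
from urllib.parse import urlparse

def split_s3n_uri(uri):
    """Split an s3n:// URI into (bucket, object_key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3n":
        raise ValueError(f"not an s3n URI: {uri!r}")
    # The bucket is the authority part; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")

# Hypothetical location of a job's input data.
bucket, key = split_s3n_uri("s3n://example-bucket/logs/2015/part-00000")
```

Because the scheme is part of the path, a job can read from an S3-compatible object store and from HDFS side by side simply by using `s3n://` or `hdfs://` locations.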
Use Cases
Hadoop for Internet of Things
Clickstream data, Sentiment data, Server log data, Sensor data
Clickstream data
Analysis of what people click on: individual web pages and in what order.
Clickstream analysis can reveal how users research products and also how they complete their online purchases.
✓  Internet Marketing
✓  Online Commerce
Sentiment data
Unstructured data on opinions, emotions, and attitudes from sources like social media posts, blogs, online product reviews and customer support interactions.
Organizations use sentiment analysis to understand how the public feels about something and track how those opinions change over time.
✓  Retail
✓  Media & Entertainment
Server log data
Large enterprises build, manage and protect their own proprietary, distributed information networks.
Server logs are the computer-generated records that report data on the operations of those networks.
When there is a problem, it's one of the first places the IT team looks for a diagnosis.
✓  IT Organizations
✓  Customer Support
Sensor data
From refrigerators and coffee makers to energy-measuring smart meters, sensor data is everywhere. It is created by the machinery that runs assembly lines and the cell towers that route our phone calls.
It is net new data that is increasing exponentially in the information age.
✓  Manufacturing
✓  Industrial
Cloudian Smart Support
Thank You!
Matt Aslett
matthew.aslett@451research.com
www.451research.com
@maslett
Paul Turner
pturner@cloudian.com
www.cloudian.com
@CloudianStorage
John Kreisa
john@hortonworks.com
www.hortonworks.com
@Hortonworks
