We’ll get started soon… 
Q&A box is available for your questions 
Webinar will be recorded for future viewing 
Thank you for joining! 
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deliver the Data Lake (demo/deep dive) 
…using HDP and Red Hat JBoss Data Virtualization 
We do Hadoop.
Your speakers… 
Raghu Thiagarajan, Dir, Partner Product Management, Hortonworks 
Kimberly Palko, Principal Product Manager, Red Hat 
Kenny Peeples, Principal Technical Marketing Manager, Red Hat 
An architectural shift towards an HDP Data Lake
Unlocking the Data Lake
[Chart: axes SCALE and SCOPE; labels RDBMS, MPP, EDW, Data Lake; New Analytic Apps or IT Optimization]
Data Lake, Enabled by YARN
• Single data repository, shared infrastructure
• Multiple biz apps accessing all the data
• Enable a shift from reactive to proactive interactions
• Gain new insight across the entire enterprise
[Diagram: HDP 2.1 stack: Governance & Integration, Security, Operations, Data Access, YARN, Data Management]
What is a Data Lake?
Architectural Pattern in the Data Center
Uses Hadoop to deliver deeper insight across a large, broad, diverse set of data efficiently
• Multipurpose, Open PLATFORM for Data (NOT a database)
• Land all data in a single place and interact with it in many ways
• Allows for the ecosystem to provide higher level services (SAS, SAP, Microsoft for Streaming, MPP, In-memory, etc.)
• First class data management capabilities (metadata management, security, transformation pipelines, replication, retention, etc.)
HDP Data Lake Solution Architecture
[Architecture diagram; labels recovered from the diagram, grouped as they appear]
Manage Steps 1-4: Data Management with Falcon, Security with HDP Advanced Security
Step 1: Extract & Load
Step 2: Model/Apply Metadata
Step 3: Transform, Aggregate & Materialize
Step 4: Schedule and Orchestrate
SOURCE DATA: Click Stream, Sales Transactions, Product Data, Marketing/Inventory, Social Data, EDW
Ingestion: SQOOP, FLUME, NFS, Web HDFS, REST, HTTP, Streaming, STORM, JMS
Data Lake HDP Grid: compute & storage nodes, YARN, AMBARI
HCATALOG (table & user-defined metadata)
Data processing: HIVE, PIG, Cascading, MR2, TEZ, Mahout, Graph, SAS
FALCON (Data pipeline & flow management)
Apache Argus (Unified Access Controls and Audit)
Use Case Type 1: Materialize & Exchange. Exchange: HBase Client, Sqoop/Hive. Downstream Data Sources: OLTP, HBase, EDW (Teradata)
Use Case Type 2: Explore/Visualize. Interactive Hive Server (Tez/Stinger). Query/Analytics/Reporting Tools: Tableau, Excel, Microstrategy, Datameer, Platfora, Business Objects
YARN Apps: Stream Processing, Real-time Search, MPI, etc. (SolR, Storm). Opens up Many New Use Cases
HDP Data Lake Solution Architecture + Virtual Data Mart
[Diagram: the HDP Data Lake Solution Architecture diagram, repeated with the same labels and extended with a virtual data mart layer: Dept Base Virtual Database (VDB); Team 1 VDB; Team2 VDB; View1, View2]
YARN allows for new processing engines
[Diagram: the HDP Data Lake Solution Architecture diagram, repeated with the same labels; per the slide title, the focus is the YARN layer and the YARN Apps (Stream Processing, Real-time Search, MPI, etc.)]
Falcon enables Governance of Data Pipelines
[Diagram: the HDP Data Lake Solution Architecture diagram, repeated with the same labels; per the slide title, the focus is FALCON (Data pipeline & flow management)]
Apache Falcon: Data Governance in the Lake
Falcon adds the required data governance features
[Diagram: Data pipeline (Raw, Clean, Prep), defined in Falcon. Auto generate & orchestrate: multiple complex Oozie workflows (Job1 ... JobN) and other Hadoop ecosystem tools, e.g. DistCp]
DEFINITION: Replication | Retention | Eviction | Late data
MONITORING
TRACING: Audit | Lineage | Tagging
Mashing up diverse data types in the Data Lake 
Virtual Data Marts with Red Hat JBoss Data Virtualization and Hortonworks HDP
Kimberly Palko
Data Supply and Integration Solution
Data Virtualization sits in front of multiple data sources and
• allows them to be treated as a single source
• delivering the desired data
• in the required form
• at the right time
• to any application and/or user.
THINK VIRTUAL MACHINE FOR DATA
Easy Access to Big Data
• The reporting tool accesses the data virtualization server via a rich SQL dialect
• The data virtualization server translates the rich SQL dialect to HiveQL
• Hive translates HiveQL to MapReduce
• MapReduce runs the MR job on the big data
[Diagram: Analytical Reporting Tool, Data Virtualization Server, Hive, MapReduce, HDFS (Hadoop Big Data)]
Use Case 1: Combine data from Hadoop with traditional data sources
Problem:
Data from new data sources like social media, clickstream and sensors needs to be combined with data from traditional sources to get the full value.
Solution:
Leverage JBoss Data Virtualization to mash up new data in Hadoop with data in traditional data sources without moving or copying any data, and access it through a variety of BI tools and SOA technologies.
[Diagram: JBoss Data Virtualization layers: Consume, Compose, Connect]
Consume: Data can be accessed by multiple tools and methods already in-house
SOURCE 1: Hive/Hadoop contains data from new data sources like social media, clickstream and sensor data
SOURCE 2: Traditional relational databases in the enterprise
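The idea of joining Hadoop data with a traditional source without copying either one can be sketched with a toy analogy. The sketch below is not JBoss Data Virtualization: it uses Python's built-in sqlite3 module, where two separately attached in-memory databases stand in for the Hive/Hadoop source and the relational source, and one connection plays the role of the virtualization layer. All table names, column names and rows are hypothetical.

```python
import sqlite3

# Toy stand-ins (assumptions, not JDV APIs):
#   "hadoop" database -> SOURCE 1, clickstream data in Hive/Hadoop
#   "crm" database    -> SOURCE 2, a traditional relational database
# The single connection stands in for the data virtualization layer.

def build_federated_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a separate, independent database.
    conn.execute("ATTACH DATABASE ':memory:' AS hadoop")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE hadoop.clicks (customer_id INTEGER, page TEXT)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO hadoop.clicks VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    return conn

def clicks_per_customer(conn: sqlite3.Connection) -> list:
    # One query spans both sources; neither table is copied into the other.
    sql = """
        SELECT c.name, COUNT(*) AS clicks
        FROM crm.customers AS c
        JOIN hadoop.clicks AS k ON k.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(sql).fetchall()
```

Calling `clicks_per_customer(build_federated_connection())` returns one row per customer with its click count. In a real deployment the consumer would issue similar SQL against the virtual database, and the virtualization server would push the relevant parts down to each source.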
Use Case 2: Federating across Geographically Distributed Hadoop Clusters
Problem:
Geographically distributed Hadoop clusters contain sensitive data like patient records or customer identification that cannot be accessed by other regions due to regulatory policy. IT needs access to all data, but users can only access the data in their region.
Solution:
Leverage JBoss Data Virtualization to provide Row Level Security and Masking of columns while federating across Hadoop clusters.
[Diagram: JBoss Data Virtualization layers: Consume, Compose, Connect]
Consume: Data can be accessed by multiple tools and methods already in-house
Hive: Hadoop cluster in one geographic region
Hive: Hadoop cluster in a second geographic region
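A minimal sketch of row-level security combined with column masking, in plain Python rather than JBoss Data Virtualization policy syntax. The record layout, the `User` type, the rule that IT sees unmasked data, and the masking format are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    region: str
    is_it: bool = False  # assumption: IT may read every region, unmasked

# Hypothetical records federated from two regional clusters.
RECORDS = [
    {"region": "EU", "patient_id": "P-1001", "diagnosis": "A"},
    {"region": "US", "patient_id": "P-2002", "diagnosis": "B"},
]

def mask(value: str) -> str:
    """Hide all but the last two characters of a sensitive value."""
    return "*" * (len(value) - 2) + value[-2:]

def query(user: User, records=RECORDS) -> list:
    visible = []
    for row in records:
        # Row-level security: regular users only see rows from their own region.
        if not user.is_it and row["region"] != user.region:
            continue
        out = dict(row)  # copy so the source record is never modified
        # Column masking: regular users get a masked identifier.
        if not user.is_it:
            out["patient_id"] = mask(out["patient_id"])
        visible.append(out)
    return visible
```

With this sketch, a user in the EU region sees only the EU row with a masked `patient_id`, while an IT user sees both rows unmasked. The point of placing the rules in the virtualization layer is that they apply uniformly regardless of which cluster a row came from.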
Data for entire organization in Hadoop Data Lake
Problem: How does IT control access and give business users just the data they need?
- Does every line of business have access to everyone’s data?
- How do business users get access to the data they need in a simple (even self-service) way?
[Diagram: Hadoop Data Lake containing HR Employee Files, Server Logs, Marketing Clickstream Data, Finance Expense Reports, Sales Transactions, Customer Accounts, Twitter Sentiment Data]
Secure, Self-Service Virtual Data Marts for Hadoop
Solution: Use JBoss Data Virtualization to create virtual data marts on top of a Hadoop cluster
- Lines of Business get access to the data they need in a simple manner
- IT maintains the process and control it needs
- All data remains in the data lake, nothing is copied or moved
[Diagram: virtual data marts (labels: Marketing, Finance, IT, Sales) on top of the Hadoop Data Lake containing HR Employee Files, Server Logs, Marketing Clickstream Data, Finance Expense Reports, Sales Transactions, Customer Accounts, Twitter Sentiment Data]
Optional hierarchical data architectures with virtual data mart
Can be combined with security features like user role access and row and column masking
[Diagram: Dept Base Virtual Database (VDB); Team 1 VDB; Team2 VDB; View1, View2]
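One way to picture a hierarchy of virtual databases is as views defined on top of other views. The sketch below is an analogy using Python's built-in sqlite3 module, not a JBoss Data Virtualization VDB definition; the `sales` table, its rows and the view names are hypothetical.

```python
import sqlite3

def build_hierarchy() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Stand-in for raw data in the lake (hypothetical rows).
        CREATE TABLE sales (team TEXT, region TEXT, amount INTEGER);
        INSERT INTO sales VALUES
            ('team1', 'EU', 100), ('team1', 'US', 50),
            ('team2', 'EU', 70),  ('hr',    'EU', 999);

        -- Department base layer: exposes only the department's teams.
        CREATE VIEW dept_base AS
            SELECT team, region, amount FROM sales
            WHERE team IN ('team1', 'team2');

        -- Team layers are defined on the department layer, not on the raw table,
        -- so anything hidden at the department level stays hidden below it.
        CREATE VIEW team1_view AS
            SELECT region, amount FROM dept_base WHERE team = 'team1';
        CREATE VIEW team2_view AS
            SELECT region, amount FROM dept_base WHERE team = 'team2';
    """)
    return conn
```

Querying `team1_view` only ever returns rows that already passed the department filter, which is the property that makes a layered arrangement useful for access control. No data is copied at any layer, since views store only their definitions.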
Want most recent data in an operational data store
Problem: All the legacy and archived data is in the Hadoop data lake. We want to access the most recent, up-to-the-minute operational data often and quickly.
[Diagram: Hadoop Data Lake holding Historical Data: Marketing Clickstream Data, Finance Expense Reports, HR Employee Files, Server Logs, Sales Transactions, Customer Accounts, Twitter Sentiment Data]
Caching For Faster Performance – Materialized View
• Same cached view for multiple queries
• Refreshed automatically or manually
• Cache repository can be any supported data source
[Diagram: Query 1 and Query 2 against a Virtual Database (VDB); View 1 backed by a Cached or Materialized View 1]
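The three caching properties listed above (one cached view shared by several queries, automatic or manual refresh, pluggable cache storage) can be sketched in a few lines of Python. This is an illustration of the general technique, not the JBoss Data Virtualization implementation; the class name, the time-to-live policy and the in-process list used as the cache repository are assumptions.

```python
import time
from typing import Callable, List, Optional

class MaterializedView:
    """Caches the result of an expensive query and refreshes it on demand or by TTL."""

    def __init__(self, load: Callable[[], List[dict]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load              # the expensive underlying query
        self._ttl = ttl_seconds        # assumed automatic-refresh policy
        self._clock = clock            # injectable so the behaviour can be tested
        self._rows: Optional[List[dict]] = None
        self._loaded_at = 0.0
        self.loads = 0                 # how many times the source was queried

    def refresh(self) -> None:
        """Manual refresh: re-run the underlying query and replace the cache."""
        self._rows = self._load()
        self._loaded_at = self._clock()
        self.loads += 1

    def rows(self) -> List[dict]:
        """Serve from the cache, refreshing automatically once the TTL has expired."""
        expired = self._clock() - self._loaded_at >= self._ttl
        if self._rows is None or expired:
            self.refresh()
        return self._rows
```

Any number of queries can call `rows()` and share the same cached result; the source is only hit again when the TTL expires or when `refresh()` is called explicitly.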
Want most recent data in an operational data store
Solution: Use JBoss Data Virtualization to integrate up-to-the-minute data from multiple diverse data sources that can be quickly queried.
- Use HDP for all data older than today.
- Use JDV to materialize the data in HDP for faster access and to combine with operational VDB
[Diagram: Operational VDB with up to the minute data, alongside a Materialized View of Historical Data in the Hadoop Data Lake (Marketing Clickstream Data, HR Employee Files, Finance Expense Reports, Server Logs, Sales Transactions, Customer Accounts, Twitter Sentiment Data); Nightly Transfer from Data Sources]
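The split described above, with history served from the lake and today's data served from operational sources, amounts to a union with a date cut-off. A minimal sketch in Python, assuming each row carries a `day` field and a unique `order_id` (both hypothetical names):

```python
from datetime import date
from typing import List

def combined_view(historical: List[dict], operational: List[dict],
                  today: date) -> List[dict]:
    """Union rows older than `today` from the lake with current operational rows.

    The cut-off keeps the two sources from overlapping: the historical side
    may lag or hold a stale copy of today's rows, so today is always taken
    from the operational side.
    """
    older = [r for r in historical if r["day"] < today]
    current = [r for r in operational if r["day"] >= today]
    return sorted(older + current, key=lambda r: (r["day"], r["order_id"]))
```

A consumer querying the combined view never needs to know which side a row came from. After the nightly transfer, rows that were operational yesterday simply start arriving from the historical side.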
Demonstration
Virtual Data Marts with Hadoop Data Lake
Kenny Peeples
Use Case 3 - Overview
Objective:
– Purpose-oriented data views for functional teams over a rich variety of semi-structured and structured data
Problem:
– Data Lakes have large volumes of consolidated clickstream data, product and customer data that need to be constrained for multi-departmental use.
Solution:
– Leverage HDP to mash up clickstream analysis data with product and customer data on HDP to answer
– Leverage JBoss Data Virtualization to provide virtual data marts for each of the Marketing and Product teams to …..
Use Case 3 - Architecture
[Architecture diagram; labels recovered from the diagram, grouped as they appear]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: VIRTUAL DATA MART; HDP 2.1 (Governance & Integration, Security, Operations, Data Access, Data Management)
SOURCES: Emerging Sources (Sensor, Sentiment, Geo, Unstructured); Existing Sources (CRM, ERP, Clickstream, Logs)
Use Case 3 - Resources
• GUIDE:
How-to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase3
Tutorial: Available soon
• VIDEOS:
http://vimeo.com/user16928011/hwxuc3configuration
http://vimeo.com/user16928011/hwxuc3run
http://vimeo.com/user16928011/hwxuc3overview
• SOURCE:
https://github.com/DataVirtualizationByExample/HortonworksUseCase3
Benefits of JBoss Data Virtualization with Hortonworks HDP 2.1
• Creates virtual databases for controlling access to data in a data lake while giving lines of business the autonomy they seek
• Combines new data in Hadoop with data in traditional data sources without moving or copying data
• Gives access to a variety of BI and analytics tools
• Provides caching for faster access to data
• Provides consistent security policy across multiple data sources
Thank you! 
Hortonworks and Red Hat JBoss Data Virtualization
Next Steps... 
More about Red Hat & Hortonworks 
http://hortonworks.com/partner/redhat
Download the Hortonworks Sandbox 
Learn Hadoop 
Build Your Analytic App 
Try Hadoop 2 
Contact us: events@hortonworks.com 
Don’t Forget to Register for our Next Webinar! 
September 17th, 10 AM PST 
Red Hat JBoss Data Virtualization and Hortonworks Data Platform 
http://info.hortonworks.com/RedHatSeries_Hortonworks.html

Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3

  • 1. We’ll get started soon… Q&A box is available for your questions Webinar will be recorded for future viewing Thank you for joining! Page 1 Š Hortonworks Inc. 2011 – 2014. All Rights Reserved
  • 2. Deliver the Data Lake (demo/deep dive) …using HDP and Red Hat JBoss Data Virtualization Page 2 Š Hortonworks Inc. 2011 – 2014. All Rights Reserved We do Hadoop.
  • 3. Your speakers… Raghu Thiagarajan, Dir, Partner Product Management, Hortonworks Kimberly Palko, Principal Product Manager, Red Hat Kenny Peeples, Principal Technical Marketing Manager, Red Hat Page 3 Š Hortonworks Inc. 2011 – 2014. All Rights Reserved
  • 4. An architectural shift towards an HDP Data Lake Unlocking the Data Lake SCALE SCOPE RDBMS MPP EDW Page 4 Š Hortonworks Inc. 2011 – 2014. All Rights Reserved Data Lake Enabled by YARN • Single data repository, shared infrastructure • Multiple biz apps accessing all the data • Enable a shift from reactive to proactive interactions • Gain new insight across the entire enterprise New Analytic Apps or IT Optimization HDP 2.1 Governance & Integration Security Operations Data Access YARN Data Management
  • 5. What is a Data Lake? Architectural Pattern in the Data Center Uses Hadoop to deliver deeper insight across a large, broad, diverse set of data efficiently § Multipurpose, Open PLATFORM for Data (NOT a database) § Land all data in a single place and interact with it in many ways § Allows for the ecosystem to provide higher level services (SAS, SAP, Microsoft for Streaming, MPP, In-memory, etc..) § First class data management capabilities (metadata management, security, transformation pipelines, replication, retention, etc..) Page 5 Š Hortonworks Inc. 2011 – 2014. All Rights Reserved
  • 6. HDP Data Lake Solution Architecture Manage Steps 1-4: Data Management with Falcon, Security with HDP Advanced Security Step 4: Schedule and Orchestrate Step 3: Transform, Aggregate & Materialize STORM JMS Step 1:Extract & Load NFS Page 6 Š Hortonworks Inc. 2011 – 2014. All Rights Reserved HIVE PIG Cascading (table & user-defined metadata) Step 2: Model/Apply Metadata compute & storage HCATALOG . . . SolR Storm . . . . . compute & storage . . YARN AMBARI Data Lake HDP Grid Use Case Type 1: Materialize & Exchange Interactive Hive Server (Tez/Stinger) Stream Processing, Real-time Search, MPI, etc. YARN Apps Opens up Many New Use Cases Query/ Analytics/Reporting Tools Tableau, Excel, Microstrategy Datameer, Platfora, Business Objects Use Case Type 2: Explore/Visualize FALCON (Data pipeline & flow management) SOURCE DATA Click Stream Sales Transactions Product Data Marketing/ Inventory Social Data EDW NFS Apache Argus (Unified Access Controls and Audit) (data processing) Exchange HBase Client Sqoop/Hive Downstream Data Sources OLTP HBase EDW (Teradata) MR2 Graph SAS Ingestion SQOOP FLUME Web HDFS REST HTTP Streamin g TEZ Mahout
  • 7. HDP Data Lake Solution Architecture + Virtual Data Mart Manage Steps 1-4: Data Management with Falcon, Security with HDP Advanced Security Step 4: Schedule and Orchestrate HIVE PIG Cascadin g Step 3: Transform, Aggregate & Materialize (table & user-defined metadata) Step 2: Model/Apply Metadata compute & storage STORM JMS Step 1:Extract & Load NFS Page 7 Š Hortonworks Inc. 2011 – 2014. All Rights Reserved HCATALOG . . . SolR Storm . . . . . compute & storage . . YARN AMBARI Data Lake HDP Grid Use Case Type 1: Materialize & Exchange Interactive Hive Server (Tez/Stinger) Stream Processing, Real-time Search, MPI, etc. YARN Apps Opens up Many New Use Cases Query/ Analytics/ Reporting Tools Tableau, Excel, Microstrategy Datameer, Platfora, Business Objects Use Case Type 2: Explore/Visualize FALCON (Data pipeline & flow management) SOURCE DATA Click Stream Sales Transactions Product Data Marketing/ Inventory Social Data EDW NFS Apache Argus (Unified Access Controls and Audit) (data processing) Exchange HBase Client Sqoop/Hive Downstream Data Sources OLTP HBase EDW (Teradata) MR2 Graph SAS Ingestion SQOOP FLUME Web HDFS REST HTTP Streami ng TEZ Mahout Dept Base Virtual Database (VDB) Team 1 VDB Team2 VDB View1 View2
  • 8. YARN allows for new processing engines. [Architecture diagram: the HDP Data Lake Solution Architecture, shown again under this title. YARN apps listed include MR2, Tez, Storm, Solr, Mahout, graph, SAS, stream processing, real-time search, and MPI.]
  • 9. Falcon enables Governance of Data Pipelines. [Architecture diagram: the HDP Data Lake Solution Architecture, shown again under this title. Falcon provides data pipeline and flow management across Steps 1-4.]
  • 10. Apache Falcon: Data Governance in the Lake. Falcon adds the required data governance features. [Diagram] Data pipeline stages: Raw, Clean, Prep. Falcon auto-generates and orchestrates multiple complex Oozie workflows (Job1 ... JobN) and other Hadoop ecosystem tools, e.g. DistCp. Features: Definition (replication, retention, eviction, late data); Monitoring; Tracing (audit, lineage, tagging).
  • 11. Mashing up diverse data types in the Data Lake
  • 12. Mashing up diverse data types in the Data Lake
  • 13. Mashing up diverse data types in the Data Lake
  • 14. Mashing up diverse data types in the Data Lake
  • 15. Mashing up diverse data types in the Data Lake
  • 16. Mashing up diverse data types in the Data Lake
  • 17. Virtual Data Marts with Red Hat JBoss Data Virtualization and Hortonworks HDP. Kimberly Palko
  • 18. Data Supply and Integration Solution. Data Virtualization sits in front of multiple data sources and allows them to be treated as a single source, delivering the desired data, in the required form, at the right time, to any application and/or user. Think: virtual machine for data.
  • 19. Easy Access to Big Data. [Diagram: Analytical Reporting Tool, Data Virtualization Server, Hive, MapReduce, HDFS (Hadoop big data)] • The reporting tool accesses the data virtualization server via a rich SQL dialect • The data virtualization server translates the rich SQL dialect to HiveQL • Hive translates HiveQL to MapReduce • MapReduce runs the MR job on the big data
  • 20. Use Case 1: Combine data from Hadoop with traditional data sources. Problem: Data from new data sources like social media, clickstream, and sensors needs to be combined with data from traditional sources to get its full value. Solution: Leverage JBoss Data Virtualization to mash up new data in Hadoop with data in traditional data sources, without moving or copying any data, and access it through a variety of BI tools and SOA technologies. [Diagram: JBoss Data Virtualization (Consume, Compose, Connect). Data can be accessed by multiple tools and methods already in-house. Source 1: Hive/Hadoop contains data from new data sources like social media, clickstream, and sensor data. Source 2: Traditional relational databases in the enterprise.]
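The federation idea in Use Case 1 can be illustrated with a small, hedged sketch. It does not use JBoss Data Virtualization or Hive; two in-memory SQLite databases stand in for the Hive source and the relational source, and the table names, columns, and the `clicks_per_customer` function are hypothetical. The point is only that each source is queried in place and the results are joined in a thin layer on top, so no data is copied between the sources.

```python
import sqlite3

# Stand-in for the Hive/Hadoop source (hypothetical clickstream table).
hive = sqlite3.connect(":memory:")
hive.execute("CREATE TABLE clicks (customer_id INTEGER, page TEXT)")
hive.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [(1, "/home"), (1, "/pricing"), (2, "/home")])

# Stand-in for a traditional relational source (hypothetical customers table).
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
rdbms.executemany("INSERT INTO customers VALUES (?, ?)",
                  [(1, "Ada"), (2, "Grace")])

def clicks_per_customer():
    """Federated view: aggregate in the Hive stand-in, then join with relational rows here."""
    # The aggregate runs inside the "big data" source; only small results come back.
    counts = dict(hive.execute(
        "SELECT customer_id, COUNT(*) FROM clicks GROUP BY customer_id"))
    names = rdbms.execute("SELECT id, name FROM customers ORDER BY id")
    return [(name, counts.get(cid, 0)) for cid, name in names]

print(clicks_per_customer())
```

A real data virtualization server would expose such a view over JDBC/ODBC and plan the pushdown itself; the sketch hard-codes that plan to keep the mechanics visible.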
  • 21. Use Case 2: Federating across Geographically Distributed Hadoop Clusters. Problem: Geographically distributed Hadoop clusters contain sensitive data, like patient records or customer identification, that cannot be accessed by other regions due to regulatory policy. IT needs access to all data, but users can only access the data in their region. Solution: Leverage JBoss Data Virtualization to provide row-level security and masking of columns while federating across Hadoop clusters. [Diagram: JBoss Data Virtualization (Consume, Compose, Connect). Data can be accessed by multiple tools and methods already in-house. Hive: Hadoop cluster in one geographic region. Hive: Hadoop cluster in a second geographic region.]
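As a rough illustration of row-level security combined with column masking, the following sketch filters federated rows by the caller's region and masks a sensitive column. Everything here is an assumption made for illustration: the sample rows, the `ssn` column, the `is_it_admin` flag, and the rule that only IT sees unmasked values. It is not the JBoss Data Virtualization policy mechanism.

```python
# Hypothetical rows federated from two regional clusters.
ROWS = [
    {"region": "EU", "patient": "P-001", "ssn": "123-45-6789"},
    {"region": "US", "patient": "P-002", "ssn": "987-65-4321"},
]

def mask(value):
    """Mask all but the last four characters of a sensitive value."""
    return "*" * (len(value) - 4) + value[-4:]

def query(user_region, is_it_admin=False):
    """Return the rows the caller may see, with sensitive columns masked."""
    visible = []
    for row in ROWS:
        # Row-level security: regular users only see rows from their own region.
        if not is_it_admin and row["region"] != user_region:
            continue
        out = dict(row)
        # Column masking (assumed policy): only IT sees the raw value.
        if not is_it_admin:
            out["ssn"] = mask(out["ssn"])
        visible.append(out)
    return visible

print(query("EU"))
print(len(query("EU", is_it_admin=True)))
```

Applying both rules in the federation layer means the same policy holds regardless of which BI tool issues the query.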
  • 22. Data for entire organization in Hadoop Data Lake. Problem: How does IT control access and give business users just the data they need? - Does every line of business have access to everyone's data? - How do business users get access to the data they need in a simple (even self-service) way? [Diagram: Hadoop Data Lake containing HR Employee Files, Server Logs, Marketing Clickstream Data, Finance Expense Reports, Sales Transactions, Customer Accounts, Twitter Sentiment Data]
  • 23. Secure, Self-Service Virtual Data Marts for Hadoop. Solution: Use JBoss Data Virtualization to create virtual data marts on top of a Hadoop cluster - Lines of business get access to the data they need in a simple manner - IT maintains the process and control it needs - All data remains in the data lake; nothing is copied or moved. [Diagram: Marketing, Finance, IT, and Sales on top of the Hadoop Data Lake, which contains HR Employee Files, Server Logs, Marketing Clickstream Data, Finance Expense Reports, Sales Transactions, Customer Accounts, Twitter Sentiment Data]
  • 24. Optional hierarchical data architectures with virtual data mart. Can be combined with security features like user role access and row and column masking. [Diagram: Dept Base Virtual Database (VDB), Team 1 VDB, Team 2 VDB, View1, View2]
  • 25. Want most recent data in an operational data store. Problem: All the legacy and archived data is in the Hadoop data lake. We want to access the most recent, up-to-the-minute operational data often and quickly. [Diagram: Hadoop Data Lake (Historical Data) containing Marketing Clickstream Data, Finance Expense Reports, HR Employee Files, Server Logs, Sales Transactions, Customer Accounts, Twitter Sentiment Data]
  • 26. Caching for Faster Performance: Materialized View. [Diagram: Query 1 and Query 2 go to the Virtual Database (VDB), where View 1 is served from Cached or Materialized View 1] • Same cached view for multiple queries • Refreshed automatically or manually • Cache repository can be any supported data source
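The three caching properties on this slide (one cached view shared by several queries, automatic or manual refresh, pluggable cache store) can be sketched in a few lines. This is a minimal illustration only: the `MaterializedView` class, its `ttl_seconds` parameter, and the in-process list used as the cache store are hypothetical and do not reflect how JBoss Data Virtualization implements materialization.

```python
import time

class MaterializedView:
    """Caches the result of an expensive query; refreshes on TTL expiry or on demand."""

    def __init__(self, load, ttl_seconds):
        self._load = load          # callable that runs the expensive source query
        self._ttl = ttl_seconds    # automatic refresh interval
        self._rows = None
        self._loaded_at = 0.0
        self.loads = 0             # how many times the source was actually queried

    def refresh(self):
        """Manual refresh: re-run the source query and replace the cached rows."""
        self._rows = self._load()
        self._loaded_at = time.monotonic()
        self.loads += 1

    def rows(self):
        """Return cached rows, refreshing automatically if missing or expired."""
        expired = time.monotonic() - self._loaded_at > self._ttl
        if self._rows is None or expired:
            self.refresh()
        return self._rows

# Two different queries are answered from the same cached view.
view1 = MaterializedView(lambda: [("2014-09-01", 42), ("2014-09-02", 7)],
                         ttl_seconds=3600)
query1 = [r for r in view1.rows() if r[1] > 10]
query2 = len(view1.rows())
print(query1, query2, view1.loads)
```

Swapping the in-process list for a table in any supported database would correspond to the slide's point that the cache repository can be any supported data source.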
  • 27. Want most recent data in an operational data store. Solution: Use JBoss Data Virtualization (JDV) to integrate up-to-the-minute data from multiple diverse data sources so that it can be queried quickly. - Use HDP for all data older than today. - Use JDV to materialize the data in HDP for faster access and to combine it with the operational VDB. [Diagram: a Materialized View of the historical data is combined with the Operational VDB to give historical data with up-to-the-minute data. The Hadoop Data Lake (Marketing Clickstream Data, HR Employee Files, Finance Expense Reports, Server Logs, Sales Transactions, Customer Accounts, Twitter Sentiment Data) is fed by a nightly transfer from the data sources.]
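The split described here, historical rows from the lake for everything older than today plus operational rows for today, amounts to a date-partitioned union. The sketch below shows that logic under stated assumptions: the `combined_sales` function, the `day` and `amount` fields, and the sample rows are all hypothetical.

```python
from datetime import date

def combined_sales(historical_rows, operational_rows, today):
    """Union historical rows (older than today) with today's operational rows."""
    # Historical side: in the slide's design this would be the materialized HDP data.
    older = [r for r in historical_rows if r["day"] < today]
    # Operational side: only rows from today onward, to avoid double counting.
    current = [r for r in operational_rows if r["day"] >= today]
    return older + current

today = date(2014, 9, 17)
historical = [{"day": date(2014, 9, 15), "amount": 100},
              {"day": date(2014, 9, 16), "amount": 250}]
operational = [{"day": date(2014, 9, 16), "amount": 250},   # already in the lake
               {"day": date(2014, 9, 17), "amount": 75}]    # not yet transferred

total = sum(r["amount"] for r in combined_sales(historical, operational, today))
print(total)
```

Filtering both sides on the same cutoff keeps a row that exists in both stores from being counted twice after the nightly transfer.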
  • 28. Demonstration: Virtual Data Marts with Hadoop Data Lake. Kenny Peeples
  • 29. Use Case 3 - Overview. Objective: Purpose-oriented data views for functional teams over a rich variety of semi-structured and structured data. Problem: Data lakes have large volumes of consolidated clickstream, product, and customer data that need to be constrained for multi-departmental use. Solution: - Leverage HDP to mash up clickstream analysis data with product and customer data on HDP to answer - Leverage JBoss Data Virtualization to provide virtual data marts for each of the Marketing and Product teams to …..
  • 30. Use Case 3 - Architecture. [Diagram] Applications: Business Analytics, Custom Applications, Packaged Applications. Virtual Data Mart. Data System: HDP 2.1 (Governance & Integration, Security, Operations, Data Access, Data Management). Sources: Emerging Sources (Sensor, Sentiment, Geo, Unstructured); Existing Sources (CRM, ERP, Clickstream, Logs).
  • 31. Use Case 3 - Resources • GUIDE: How-to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase3 Tutorial: Available soon • VIDEOS: http://vimeo.com/user16928011/hwxuc3configuration http://vimeo.com/user16928011/hwxuc3run http://vimeo.com/user16928011/hwxuc3overview • SOURCE: https://github.com/DataVirtualizationByExample/HortonworksUseCase3
  • 32. Benefits of JBoss Data Virtualization with Hortonworks HDP 2.1 • Creates virtual databases for controlling access to data in a data lake while giving lines of business the autonomy they seek • Combines new data in Hadoop with data in traditional data sources without moving or copying data • Gives access to a variety of BI and analytics tools • Provides caching for faster access to data • Provides consistent security policy across multiple data sources
  • 33. Thank you! Hortonworks and Red Hat JBoss Data Virtualization
  • 34. Next Steps... More about Red Hat & Hortonworks: http://hortonworks.com/partner/redhat. Download the Hortonworks Sandbox: Learn Hadoop, Build Your Analytic App, Try Hadoop 2. Contact us: [email protected]
  • 35. Don’t Forget to Register for our Next Webinar! September 17th, 10 AM PST. Red Hat JBoss Data Virtualization and Hortonworks Data Platform. http://info.hortonworks.com/RedHatSeries_Hortonworks.html