Accelerating Success with Rapid Data
Integration for the Modern Data
Architecture
John Kreisa, Hortonworks
Lawrence Schwartz, Attunity
Speakers
Lawrence Schwartz, Attunity
John Kreisa, Hortonworks
Customer Momentum
•  230+ customers (as of Q3 2014)
Hortonworks Data Platform
•  Completely open multi-tenant platform for any app & any
data.
•  A centralized architecture of consistent enterprise
services for resource management, security, operations,
and governance.
Partner for Customer Success
•  Open source community leadership focused on enterprise needs
•  Unrivaled world-class support
•  Founded in 2011
•  Original 24 architects, developers, and operators of Hadoop from Yahoo!
•  600+ employees
•  1000+ ecosystem partners
Hadoop for the Enterprise:
Implement a Modern Data Architecture with HDP
Traditional systems under pressure
Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to Scale
[Figure: business value of traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs), contrasting laggards with industry leaders. Data volume: 2.8 zettabytes in 2012, 40 zettabytes in 2020.]
Hadoop emerged as foundation of new data architecture
Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data
•  Built by Yahoo! to be the heartbeat of its ad & search business
•  Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises
Hadoop Advantages
✓  Manages new data paradigm
✓  Handles data at scale
✓  Cost effective
✓  Open source
[Diagram: applications on top of batch processing (MapReduce) and storage (HDFS)]
The Modern Data Architecture
[Diagram layers:]
Sources: OLTP, ERP, and CRM systems; documents and emails; web logs and click streams; social networks; machine-generated data; sensor data; geolocation data
Data system: repositories (RDBMS, EDW, MPP) alongside HDP (governance & integration, security, operations, data access, data management, YARN)
Applications: data marts, business analytics, visualization & dashboards
Operational tools: provision, manage & monitor
Dev & data tools: build & test
Infrastructure: on premise or in the cloud
Hadoop Driver: Cost Optimization
[Diagram layers:]
Sources: clickstream; web & social; geolocation; sensor & machine; server logs; unstructured data; existing systems (ERP, CRM, SCM)
Data systems: HDP 2.2 (cold data, deeper archive & new sources); enterprise data warehouse, MPP, and in-memory (hot data); ELT
Analytics: data marts, business analytics, visualization & dashboards
Archive Data off EDW
Move rarely used data to Hadoop as an active archive; store more data longer.
Offload costly ETL
Free your EDW to perform high-value functions like analytics & operations, not ETL.
Enrich the value of your EDW
Use Hadoop to refine new data sources, such as web and machine data, for new analytical context.
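The "archive data off EDW" idea can be sketched as a simple partitioning rule. This is a minimal illustration, not a method from the slides: it assumes each warehouse partition carries a last-accessed date, and the `Partition` type, the `split_hot_cold` function, and the 365-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Partition:
    """Hypothetical descriptor for one warehouse table partition."""
    name: str
    last_accessed: date


def split_hot_cold(partitions, today, cold_after_days=365):
    """Return (hot, cold); cold means not accessed within cold_after_days."""
    cutoff = today - timedelta(days=cold_after_days)
    hot = [p for p in partitions if p.last_accessed >= cutoff]
    cold = [p for p in partitions if p.last_accessed < cutoff]
    return hot, cold


partitions = [
    Partition("sales_2012", date(2013, 1, 15)),
    Partition("sales_2014_q3", date(2014, 11, 1)),
]
hot, cold = split_hot_cold(partitions, today=date(2014, 12, 1))

# Cold partitions are the candidates to move to the Hadoop archive.
print("keep in EDW:", [p.name for p in hot])
print("archive:", [p.name for p in cold])
```

In practice the threshold would come from query logs and storage cost, which this sketch does not model.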
The Modern Data Architecture & Attunity
[Diagram: the same Modern Data Architecture layers as the earlier slide (sources; repositories RDBMS, EDW, MPP; HDP; applications; operational tools; dev & data tools; infrastructure), with a Data Integration component added.]
Attunity Corporate Overview
Overview
•  Exchange (Ticker): NASDAQ (ATTU)
•  Headquarters: Burlington, MA
•  Customers: > 2000 in 60 countries
Making Any Data Available Anytime, Anywhere
Analytics / BI
Distribution / DR
Archiving / Testing
We Move the Data that Moves Our Customers’ Business
To Where the Data Needs to Be
[Diagram: sources (ERP, CRM, POS, legacy, logs, sensors, files) and targets (data warehouse, database, cloud, Hadoop, global offices)]
To Use Data, You Must Move it!
Data Needs to Be Moved to Be Useful
»  80% of the work that data scientists put into big data projects is spent on data integration and resolving data quality issues.
Source: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” by Steve Lohr, New York Times, August 17, 2014
Data Integration Remains a Major Challenge
1.  Long rollout
2.  Lots of personnel
3.  Mixed systems
4.  Hard to maintain
5.  Not real-time
Turning Data Into Value
More Data
Less Time
Less Cost
[Diagram: Data to Value]
The Attunity Solution for Big Data
•  Fully automated, end-to-end. No scripting
•  Fast, high performance integration
•  Optimized for a broad range of platforms
•  Single pane of glass monitoring
•  Real-time change data capture
Attunity’s Big Solutions for Big Data
Information availability solutions that deliver competitive advantage
Business Data (Oracle, SQL Server, Teradata, etc.)
Machine and File Data (logs, sensors, files, etc.)
Application Data (SAP, Salesforce, etc.)
Cloud Data (AWS RDS, Redshift, etc.)
Attunity Offerings
BUSINESS DATA
Attunity Replicate and Maestro
»  High-performance data replication software to accelerate and reduce the costs of distributing, sharing and ensuring the availability of data
APPLICATION DATA
Attunity Gold Client
»  Software for SAP that reduces storage requirements, improves the quality and availability of test data, restores development integrity, and helps ensure data security.
MACHINE AND FILE
Attunity RepliWeb, Replicate, and Maestro
»  Attunity Replicate, RepliWeb and Maestro offer highly scalable replication and synchronization for unstructured files, machine data and Hadoop
CLOUD DATA
Attunity CloudBeam
»  Attunity CloudBeam is a SaaS platform offering services for uploading and synchronizing Big Data to, from, and between cloud environments
‘Sqooping’ Big Data – Loading Data the Hard Way
»  Apache Sqoop: great tool, but not enough
»  Designed for transferring bulk data between Hadoop and databases
»  Not capable of CDC
»  Doesn't optimize network traffic
»  Script-based interface, importing data one table at a time
»  Limited number of standard database connectors
[Screenshot: Sqoop command line interface]
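To make the "script-based, one table at a time" point concrete, here is a hedged Python sketch that builds one `sqoop import` command per table. The flags used (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop 1.x import options; the JDBC URL, user name, paths, and table names are placeholders invented for illustration, and the commands are only printed, not executed.

```python
import shlex


def sqoop_import_command(jdbc_url, username, password_file, table,
                         target_root, mappers=4):
    """Build the argument list for one `sqoop import` run (a single table)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", f"{target_root}/{table}",
        "--num-mappers", str(mappers),
    ]


# Hypothetical source tables; each one needs its own import invocation.
tables = ["customers", "orders", "payments"]
commands = [
    sqoop_import_command(
        "jdbc:mysql://db.example.com/sales",  # placeholder JDBC URL
        "etl_user",                           # placeholder user
        "/user/etl/.db_password",             # placeholder HDFS password file
        table,
        "/data/raw/sales",                    # placeholder HDFS target root
    )
    for table in tables
]

for argv in commands:
    print(shlex.join(argv))
```

Each printed line is a full bulk import of one table; there is nothing in this loop that captures rows changed after the import, which is the gap the slide attributes to the lack of CDC.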
Attunity Replicate Architecture
»  Advanced Monitoring and Control
»  Click-to-Replicate Design
»  Fast Loading and Real-Time CDC
»  Broadest Platform Support
»  Non-intrusive Architecture
Move Any Data, Any Time, Anywhere.
Use Case: Cable Provider
Modern Data Architecture with Hadoop
The Journey to the Data Lake
Click-2-Replicate Design. Drag. Drop. Done.
[Diagram components: sources (databases; data feed sources, CSV); load modes (bulk load, change data; data refresh, data append); ODS; data lake; business units (finance, support, marketing, sales, engineering)]
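The difference between a bulk load and change data can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Attunity Replicate's implementation: rows are dictionaries keyed by a hypothetical `id` column, and the change-event shape (`op` plus `row`) is invented for illustration.

```python
def bulk_load(source_rows):
    """Full load: copy every source row into a new target keyed by primary key."""
    return {row["id"]: dict(row) for row in source_rows}


def apply_changes(target, changes):
    """Apply an ordered stream of change events to the target in place."""
    for change in changes:
        op, row = change["op"], change["row"]
        if op in ("insert", "update"):
            target[row["id"]] = dict(row)
        elif op == "delete":
            target.pop(row["id"], None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# Initial bulk load of the source table.
source = [{"id": 1, "plan": "basic"}, {"id": 2, "plan": "premium"}]
target = bulk_load(source)

# Later, only the captured changes are shipped, not the whole table.
changes = [
    {"op": "update", "row": {"id": 1, "plan": "premium"}},
    {"op": "insert", "row": {"id": 3, "plan": "basic"}},
    {"op": "delete", "row": {"id": 2}},
]
apply_changes(target, changes)
print(target)
```

The design point is that after the first load, the volume moved scales with the number of changes rather than with the size of the table.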
Use Case: Managed Health Care – Creating Golden Data Set
Click-2-Replicate Design. Drag. Drop. Done.
[Diagram components: sources (databases; data feed sources, CSV); load modes (bulk load, change data; data refresh, data append); ODS; ETL staging area with business transformation rules applied; consumers (ad-hoc analytics; BI reporting; visualization & analytics)]
Use Case: Financial Services Institution – Fraud Detection
[Diagram components: data feed sources; load modes (bulk load, change data, CDC; data refresh, data append); ODS (PostgreSQL); ETL staging area with business transformation rules applied; Attunity Maestro; EDW/data mart; consumers (ad-hoc analytics; BI reporting; visualization & analytics)]
Use Case: Sales Management Software Data Consolidation
[Diagram components: Attunity Maestro; three Maestro nodes; tiers labeled Headquarters (HQ), Regional Data Center, and Data From SaaS Customers; locations California, New York, and HQ; Replicate Servers; Customer 1 through Customer 5, …; data lake]
Who’s Our Lucky Winner?
Next Steps
Download the Hortonworks Attunity Paper
“The Modern Data Architecture and
Automating Data Transfer”
Hortonworks.com/partner/Attunity/
Learn Hadoop – Download the Sandbox
Hortonworks.com/sandbox/
Learn More about Attunity & Hortonworks
Attunity.com/hortonworks
Hortonworks.com/partner/Attunity/
Thank You!
HDP delivers a completely open data platform
Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized
architecture of core enterprise services, for any application and any data.
Completely Open
•  HDP incorporates every element required of an enterprise data platform: data storage, data access,
governance, security, operations
Hortonworks Data Platform 2.2
YARN: Data Operating System (Cluster Resource Management)
HDFS (Hadoop Distributed File System)
GOVERNANCE: Apache Falcon, Apache Sqoop, Apache Flume, Apache Kafka
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm
SECURITY: Apache Ranger, Apache Knox, Apache Falcon
OPERATIONS: Apache Ambari, Apache ZooKeeper, Apache Oozie
