Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Dell | Cloudera | Syncsort Data Warehouse Optimization: ETL Offload Reference Architecture
Dell, Cloudera, Syncsort, Intel
Panel moderator: Armando Acosta, Dell
• Subject Matter Expert for Dell Big Data Solutions
• Product Manager for the Dell Hadoop Solutions
• Works with customers to transform IT into better business outcomes
• Seventeen years in technology
Panel introductions
• Sean Anderson, Cloudera
• Brandon Draeger, Intel
• Mark Muncy, Syncsort
Organizations actively using data grow 50% faster
The share of organizations that understand the benefits of big data grew slightly, from 39% (2014) to 42% (2015).
There are challenges that must be addressed
• Older technology can’t keep up: the need to scale to all data and to unpredictable workloads makes effective data management and data integration key priorities.
• Data silos hinder decision-making: organizations need to analyze all data, regardless of type or where it resides, and apply it to their use cases.
• Determining the value: IT/business alignment on strategic business objectives and use cases is critical to achieving ROI from all data.
Address data challenges holistically, yet modularly
The basics of big data and analytics
Where data originates
• Databases
• Social media
• Sensor data (IoT)
• Devices
• LOB applications
• Cloud
• External sources
How data is moved and prepared for analysis
• Data integration, aggregation and transformation
Where data is analyzed
• Analytical engine
• Business intelligence
• In-memory computing
• Enterprise data warehouse
Sean Anderson, Cloudera
Product Marketing, IT Solutions at Cloudera
Sean is a tenured infrastructure scaling and cloud strategy consultant with a strong focus on strategic partnerships and innovative hybrid technology. He has been part of major shifts in technology, including the rise of cloud computing, open source standardization, and big data. Sean quickly became a go-to resource and speaker for data-specific workloads, focusing on technologies such as Hadoop, MongoDB, Redis, Elasticsearch, SQL, and data warehousing. At Rackspace Hosting, Sean helped build and launch open source cloud platforms around Hadoop, MongoDB, and Redis. Sean is currently marketing director for IT Solutions at Cloudera, the pioneer of Apache Hadoop.
Inefficient data workloads cost customers money
• Frequent ETL breakdowns
• Long reporting wait times
• Ad hoc access pressure on the EDW
• Extreme query complexity
Cloudera Enterprise
Making Hadoop Fast, Easy, and Secure
A new kind of data platform:
• One place for unlimited data
• Unified data access
Cloudera makes it:
• Fast for business
• Easy to manage
• Secure without compromise
Platform layers (diagram):
• Process, analyze, serve: batch, stream, SQL, search, SDK
• Unified services: resource management, security
• Store: filesystem, relational, NoSQL
• Integrate: batch, real-time
• Operations and data management span all layers
Cloudera Navigator Optimizer
Unlock Your Best Hadoop Strategy, Instantly
Active data optimization for Hadoop to save you time and money:
• Instant workload insights
• Intelligent optimization guidance
• Reduced Hadoop workload development effort
Brandon Draeger, Intel
Director of Marketing and Business Development for Big Data Solutions
Brandon is a Director of Marketing and Business Development for Big Data Solutions at Intel and manages the go-to-market (GTM) relationship between Intel and Cloudera and their shared partner ecosystem. Brandon has over 15 years of experience in a variety of enterprise technology disciplines and has held roles in engineering, product management, and strategy at Dell, Symantec, and Dorado Software.
Customers Are Struggling: Traditional Tools Aren’t Working
• Data integration and transformation workloads consume as much as 80% of EDW capacity.
• 70% of all data warehouses are performance- and capacity-constrained.
• The #1 challenge: organizations cite total cost of ownership (TCO) as the biggest obstacle to data integration tools.
Source for all three figures: Gartner, “The State of Data Warehousing in 2014,” June 19, 2014.
#1 Use Case for Hadoop: Data Warehouse Optimization (ETL Offload)
Customer challenge: Processing and storing ever-increasing data volumes with traditional enterprise data warehouses and related data integration technology, with their legacy pricing models, is taxing stagnant IT budgets.
Syncsort Customer Survey 2014:
• Among practitioners who have shifted one or more workloads from legacy data warehouses or mainframes to Hadoop, the most popular workloads being shifted are large-scale data transformations.
• 61%: customers who have implemented Hadoop
Data challenges for operational efficiency
Operational efficiency
• Connect: Unify all data from disparate tables and sources to reduce existing system load and data transformation costs.
• Analyze: Deliver streamlined business reporting, even with existing analytical tools.
• Act: Use better, faster reporting for improved data-driven decision-making.
Key use cases
• Data warehouse acceleration
• Log aggregation
• Data pipeline modernization
Mark Muncy, Syncsort
Technical Product Marketing Manager, Big Data
Mark Muncy leads Technical Product Marketing for Syncsort’s Big Data portfolio, working with technical and client-facing teams to deliver high-value solutions to the most data-intensive companies in the world. Mark brings to his current role over a decade of hands-on experience in data architecture and ETL development in the gaming, data services, and financial services industries.
Too Many Workloads in the EDW: Modernize the Data Pipeline with Hadoop
Traditional data pipeline
• Disparate data sources feed a data staging tool that extracts, loads, cleans, and parses data.
• The enterprise data warehouse + ETL runs the data transformation jobs alongside business reporting and queries, competing for query performance and capacity.
The results:
• Longer data transformation job times
• Not meeting SLAs for business reporting
• Slow ad hoc queries
• Too costly to scale
Modern data pipeline
• Disparate data sources feed Hadoop + ETL, which runs the data transformation jobs: clean, parse, transform.
• The enterprise data warehouse serves business reporting and queries, with its performance and capacity no longer consumed by transformation jobs.
The results:
• Reduced data transformation job times
• Improved SLAs for business reporting
• Fast ad hoc queries
• Scales economically
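To make the offload idea concrete, here is a minimal Python sketch under stated assumptions: an in-memory `sqlite3` database stands in for the EDW, a plain Python loop stands in for the Hadoop transformation job, and the event data and function names are hypothetical. It contrasts transforming inside the warehouse with transforming upstream and loading only the results.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical raw events: (customer_id, purchase amount).
Event = Tuple[str, float]
RAW_EVENTS: List[Event] = [
    ("c1", 10.0), ("c2", 5.0), ("c1", 2.5), ("c3", 7.0), ("c2", 1.0),
]


def transform_in_edw(events: List[Event]) -> Tuple[int, Dict[str, float]]:
    """Traditional pipeline: load every raw row into the EDW, then aggregate with SQL there."""
    edw = sqlite3.connect(":memory:")  # stand-in for the enterprise data warehouse
    edw.execute("CREATE TABLE raw_events (customer_id TEXT, amount REAL)")
    edw.executemany("INSERT INTO raw_events VALUES (?, ?)", events)
    stored = edw.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
    report = dict(edw.execute(
        "SELECT customer_id, SUM(amount) FROM raw_events GROUP BY customer_id"))
    edw.close()
    return stored, report


def transform_offloaded(events: List[Event]) -> Tuple[int, Dict[str, float]]:
    """Offloaded pipeline: aggregate upstream, then load only the results into the EDW."""
    totals: Dict[str, float] = defaultdict(float)
    for customer_id, amount in events:  # stand-in for the Hadoop transformation job
        totals[customer_id] += amount
    edw = sqlite3.connect(":memory:")
    edw.execute("CREATE TABLE customer_totals (customer_id TEXT PRIMARY KEY, total REAL)")
    edw.executemany("INSERT INTO customer_totals VALUES (?, ?)", totals.items())
    stored = edw.execute("SELECT COUNT(*) FROM customer_totals").fetchone()[0]
    report = dict(edw.execute("SELECT customer_id, total FROM customer_totals"))
    edw.close()
    return stored, report


if __name__ == "__main__":
    print(transform_in_edw(RAW_EVENTS))     # 5 rows stored in the EDW
    print(transform_offloaded(RAW_EVENTS))  # 3 rows stored in the EDW
```

Both paths produce the same report, but the offloaded path stores 3 summary rows in the warehouse instead of 5 raw rows and runs no transformation SQL there; at production volumes, that difference is the EDW capacity the slide describes as freed.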
Syncsort DMX-h: A Complete Solution for Hadoop
Connect, Transform, Optimize
• Smarter Architecture: the engine runs natively within MapReduce and Spark.
• Smarter Connectivity: connect streaming and batch data sources across the organization, including mainframe, NoSQL, and everything in between.
• Smarter Development: a GUI for developing and maintaining the Hadoop data pipeline.
• Smarter Productivity: Use-case Accelerators to fast-track development.
• Enterprise-Grade Solution: integrated support for Cloudera Navigator, Sentry, Kerberos, and LDAP.
Design Once, Deploy Anywhere
• Free users from the underlying complexities of Hadoop
• Intelligent Execution dynamically optimizes the job for any platform, on premises or in the cloud
• Future-proof your applications!
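The slides do not show how DMX-h separates job design from execution, so the following is only a generic Python sketch of the "design once, deploy anywhere" idea, not Syncsort's API: a job is defined once as an ordered list of record-level steps, and two hypothetical engines (`SerialEngine`, `PartitionedEngine`) run that same definition. The thread pool merely stands in for a cluster framework such as MapReduce or Spark.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]
# A step transforms one record, or returns None to drop it.
Step = Callable[[Record], Optional[Record]]


def run_steps(steps: List[Step], record: Record) -> Optional[Record]:
    """Apply each step in order, stopping early if a step drops the record."""
    for step in steps:
        result = step(record)
        if result is None:
            return None
        record = result
    return record


class SerialEngine:
    """Hypothetical engine that runs the job in a single thread."""

    def execute(self, steps: List[Step], records: Iterable[Record]) -> List[Record]:
        results = (run_steps(steps, r) for r in records)
        return [r for r in results if r is not None]


class PartitionedEngine:
    """Hypothetical engine that splits input into contiguous partitions and
    processes them concurrently, loosely mimicking a cluster framework."""

    def __init__(self, partitions: int = 4) -> None:
        self.partitions = partitions

    def execute(self, steps: List[Step], records: Iterable[Record]) -> List[Record]:
        data = list(records)
        size = max(1, -(-len(data) // self.partitions))  # ceiling division
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        with ThreadPoolExecutor(max_workers=self.partitions) as pool:
            # pool.map yields results in chunk order, so output order matches input order.
            parts = pool.map(lambda chunk: SerialEngine().execute(steps, chunk), chunks)
            return [r for part in parts for r in part]


# The job is designed once, with no reference to the engine that will run it.
def strip_fields(r: Record) -> Record:
    return {k: v.strip() for k, v in r.items()}


def drop_missing_amount(r: Record) -> Optional[Record]:
    return r if r["amount"] else None


def normalize_region(r: Record) -> Record:
    return {**r, "region": r["region"].upper()}


JOB: List[Step] = [strip_fields, drop_missing_amount, normalize_region]

SAMPLE_ROWS: List[Record] = [
    {"region": " east ", "amount": "10"},
    {"region": "west", "amount": " "},
    {"region": "north", "amount": "7"},
]

if __name__ == "__main__":
    for engine in (SerialEngine(), PartitionedEngine(partitions=2)):
        print(type(engine).__name__, engine.execute(JOB, SAMPLE_ROWS))
```

Because `JOB` never references an engine, switching execution targets is a one-line change at the call site; that separation is the general idea behind the slide's "Design Once, Deploy Anywhere" message.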
Operational efficiency architecture
Source → 1. Connect → 2. Analyze → 3. Act
Source
• Operational data sources
• Enterprise data warehouse
• Relational database management system
• Data mart
1. Connect (Dell | Cloudera | Syncsort | Intel)
• Extract, transform, and load: sort, aggregate, group, parse, clean, translate
2. Analyze (for example, Microsoft APS, SAP HANA)
• Enterprise data warehouse
• Relational database management system
• Data mart
• Business reporting and query
3. Act
• Price optimization
• Improved forecasting
• Uptime optimization
• Accelerated response
• Faster reporting
• Improved service levels
Supporting layers: infrastructure, management, services, security, Dell Financial Services
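The Connect stage lists six ETL operations. The following minimal Python sketch chains them in one plausible order (parse, clean, translate, then sort, group, aggregate) on hypothetical sales lines; the region codes, the `REGION_NAMES` lookup table, and all function names are assumptions for illustration, not part of the reference architecture.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, float]  # (date, region, amount)

REGION_NAMES = {"e": "East", "w": "West"}  # hypothetical code-to-name lookup


def parse(line: str) -> Optional[Row]:
    """Split a comma-separated line into typed fields; reject malformed lines."""
    parts = line.split(",")
    if len(parts) != 3:
        return None
    date, region, amount = parts
    try:
        return date.strip(), region, float(amount)
    except ValueError:
        return None


def clean(row: Row) -> Row:
    """Normalize whitespace and case in the region code."""
    date, region, amount = row
    return date, region.strip().lower(), amount


def translate(row: Row) -> Row:
    """Replace region codes with display names; unknown codes pass through."""
    date, region, amount = row
    return date, REGION_NAMES.get(region, region), amount


def sort_group_aggregate(rows: List[Row]) -> Dict[str, float]:
    """Sort by region so each key forms one contiguous run, then group and sum."""
    by_region = itemgetter(1)
    ordered = sorted(rows, key=by_region)
    return {region: sum(r[2] for r in grp)
            for region, grp in groupby(ordered, key=by_region)}


def run(lines: List[str]) -> Dict[str, float]:
    parsed = (parse(line) for line in lines)
    rows = [translate(clean(r)) for r in parsed if r is not None]
    return sort_group_aggregate(rows)


SAMPLE = [
    "2015-06-01, e ,120.5",
    "2015-06-01,w,80",
    "malformed line",
    "2015-06-02,E,30.25",
    "2015-06-02, W ,19.5",
]

if __name__ == "__main__":
    print(run(SAMPLE))  # {'East': 150.75, 'West': 99.5}
```

Sorting before grouping matters: `itertools.groupby` only merges adjacent records with equal keys, which is the same reason MapReduce sorts map output by key before the reduce phase.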
Redeploying talent / reducing staff costs
Save time and cost on Hadoop ETL jobs.
An entry-level employee using the Dell | Cloudera | Syncsort solution for Hadoop could save 76.3% over three years compared to a senior engineer (contractor) using a DIY, open source approach.
Total administrative costs over three years to design 4 ETL jobs per month:
• Expert cost (contractor): $559,298
• Expert cost (employee): $279,149
• Beginner cost: $132,326
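The 76.3% figure is not labeled with the expert cost it is measured against. A quick Python check of the three published costs (the dictionary keys and the `savings_pct` helper are illustrative names) shows that it matches the contractor comparison; against the salaried expert, the saving works out to about 52.6%.

```python
# Three-year administrative costs to design 4 ETL jobs per month, from the slide.
COSTS_3YR = {
    "expert_contractor": 559_298,
    "expert_employee": 279_149,
    "beginner": 132_326,
}


def savings_pct(baseline: float, alternative: float) -> float:
    """Percent of `baseline` saved by paying `alternative` instead."""
    return (1 - alternative / baseline) * 100


if __name__ == "__main__":
    for expert in ("expert_contractor", "expert_employee"):
        pct = savings_pct(COSTS_3YR[expert], COSTS_3YR["beginner"])
        print(f"beginner vs. {expert}: {pct:.1f}% saved")
    # beginner vs. expert_contractor: 76.3% saved
    # beginner vs. expert_employee: 52.6% saved
```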
Entry Level vs. Senior Engineer: Complete Hadoop jobs faster
Time to complete ETL jobs, comparing experienced engineers with new hires:
• Data validation and pre-processing: senior engineer 15 min, 45 sec; entry-level engineer 6 min, 15 sec (60.3% less time)
• Fact dimension load with type 2 SCD: senior engineer 36 min, 39 sec; entry-level engineer 30 min, 11 sec (17.6% less time)
• Vendor mainframe file integration: senior engineer 5 min, 51 sec; entry-level engineer 4 min, 48 sec (17.9% less time)
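The three "less time" percentages can be reproduced from the six published timings with a short Python check. The pairing of timings with job names below is inferred from the slide's layout (an assumption), but each pair yields exactly one of the published figures: 60.3%, 17.6%, and 17.9%. The helper names are illustrative.

```python
from typing import Dict, Tuple


def to_seconds(label: str) -> int:
    """Convert a slide label such as '30 min, 11 sec' to seconds."""
    minutes, seconds = (int(part.split()[0]) for part in label.split(","))
    return minutes * 60 + seconds


# (longer time, shorter time) per job; the job-to-pair mapping is inferred.
TIMINGS: Dict[str, Tuple[str, str]] = {
    "Data validation and pre-processing": ("15 min, 45 sec", "6 min, 15 sec"),
    "Fact dimension load with type 2 SCD": ("36 min, 39 sec", "30 min, 11 sec"),
    "Vendor mainframe file integration": ("5 min, 51 sec", "4 min, 48 sec"),
}


def reduction_pct(longer: str, shorter: str) -> float:
    """Percent less time taken by the shorter run relative to the longer one."""
    return (1 - to_seconds(shorter) / to_seconds(longer)) * 100


if __name__ == "__main__":
    for job, (longer, shorter) in TIMINGS.items():
        print(f"{job}: {reduction_pct(longer, shorter):.1f}% less time")
```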
Save 53.7% in time
Using the Dell | Cloudera | Syncsort solution for Hadoop, the entry-level technician developed and deployed Hadoop ETL jobs in 53.7% less time.
Reclaim days of valuable time
Days to develop and deploy the fact dimension load with type 2 SCD, data validation and pre-processing, and vendor mainframe file integration jobs: 8.3 days (senior engineer) vs. 3.8 days (entry-level technician)
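"Type 2 SCD" refers to a type 2 slowly changing dimension: instead of overwriting a changed attribute, the load expires the current dimension row and inserts a new version with its own validity dates, so fact rows can be tied to the version that was valid when the event occurred. The minimal Python sketch below illustrates that technique on a hypothetical customer/city dimension; it is not the job used in the study, and all names are illustrative.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: str
    city: str                  # tracked attribute: a change creates a new version
    valid_from: date
    valid_to: Optional[date]   # None while the version is current
    is_current: bool


def apply_scd2(dim: List[DimRow], incoming: Dict[str, str], as_of: date) -> List[DimRow]:
    """Apply a batch of customer_id -> city updates using type 2 SCD rules."""
    current = {row.customer_id: row for row in dim if row.is_current}
    result: List[DimRow] = []
    for row in dim:
        new_city = incoming.get(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            # Expire the old version rather than overwriting it, preserving history.
            result.append(replace(row, valid_to=as_of, is_current=False))
        else:
            result.append(row)
    for customer_id, city in incoming.items():
        old = current.get(customer_id)
        if old is None or old.city != city:
            # New key or changed attribute: add a fresh current version.
            result.append(DimRow(customer_id, city, as_of, None, True))
    return result


def version_on(dim: List[DimRow], customer_id: str, on: date) -> Optional[DimRow]:
    """Find the dimension version valid on a given date (used when loading facts)."""
    for row in dim:
        if (row.customer_id == customer_id and row.valid_from <= on
                and (row.valid_to is None or on < row.valid_to)):
            return row
    return None


if __name__ == "__main__":
    dim = [DimRow("c1", "Austin", date(2014, 1, 1), None, True)]
    dim = apply_scd2(dim, {"c1": "Dallas", "c2": "Boston"}, date(2015, 6, 1))
    for row in dim:
        print(row)
    print(version_on(dim, "c1", date(2015, 1, 1)).city)  # Austin
```

Re-applying an unchanged value leaves the table as it is, so a rerun of the same batch does not create duplicate versions.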
Panel Q&A
Listen to this Webcast On-Demand, Including Panel & Participant Q&A
http://bit.ly/1Rtk2OE
For additional information:
Dell.com/Hadoop
Hadoop@Dell.com
Thank you.
