© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Turning Petabytes of Data into Millions in Cost Recapture for the World’s Biggest Retailers
Case Study with PRGX Global
Jonathon Whitton | Director Data Services
Data is Transforming Business
• DRIVE CUSTOMER INSIGHTS
• IMPROVE PRODUCT & SERVICES EFFICIENCY
• LOWER BUSINESS RISKS
• MODERNIZE ARCHITECTURE
Cloudera Enterprise
Making Hadoop Fast, Easy, and Secure for the Modernized Architecture
[Diagram: Cloudera Enterprise platform layers]
PROCESS, ANALYZE, SERVE: BATCH, STREAM, SQL, SEARCH, SDK
UNIFIED SERVICES: RESOURCE MANAGEMENT, SECURITY
STORE: FILESYSTEM, RELATIONAL, NoSQL
INTEGRATE: BATCH, REAL-TIME
OPERATIONS
DATA MANAGEMENT
Hadoop is a new kind of data platform.
• One place for unlimited data
• Unified data access
Cloudera makes it:
• Fast for business
• Easy to manage
• Secure without compromise
Talend: A History of Delivering Innovation
• Our mission: enable the data driven enterprise
• The most advanced Big Data integration platform
• 10X faster development
• One solution for batch & streaming
• Deploy machine learning in minutes
[Timeline, 2006 to 2016 (Revenue Growth): Data Integration, Master Data Management, Data Quality, Big Data, Application Integration, Hadoop 2.0, Spark & Cloud, Data Preparation]
Milestones:
• 1st ETL open source tool
• 1st on YARN & Hadoop 2.0
• 1st to market for Spark
• 1st to deliver a DI and AI platform
Talend and Cloudera
[Diagram: Talend Data Fabric and Cloudera Enterprise]
Data Sources: Structured Data, Unstructured Data
Data Fabric (Connect, Ingest, Serve): APPLICATION INTEGRATION; CLOUD INTEGRATION; DATA INTEGRATION; BIG DATA INTEGRATION; MASTER DATA MANAGEMENT; DATA PREPARATION
CLOUDERA ENTERPRISE: OPERATIONS; DATA MANAGEMENT; UNIFIED SERVICES; PROCESS, ANALYZE, SERVE; STORE; INTEGRATE
Other diagram labels: Enterprise Data Warehouse, Archive, Load, Applications, Reporting, BI System, Modeling, Model
Polling Question
• What is your top requirement for modernizing your big data infrastructure?
• Capture and secure all data
• Operational simplicity
• Variety of tools to interact with data
• Integration and portability across multiple platforms
• Timely availability of data
• Other
About me
• Jonathon Whitton, Director Data Services at PRGX
• Started working at PRGX in 2000
• BA from Duke University
• MBA from Kennesaw State University
• LinkedIn.com/in/whitton
• Twitter: @JonathonWhitton
• Global leader in accounts payable recovery audit
• Serve 75% of the Top 20 Global Retailers, most for more than 10 years
• Nearly half of recovery audit revenue from outside the U.S., in more than 30 countries and across 5 continents
• ~1400 employees
What do we do for a living? … We are in search of errors!
Aggregate & Manipulate Client Data
• Receive over 2 million client files annually
• 2.3 petabytes of data “live” for auditing on average
• Data includes purchasing, payment, receiving, deals, point of sale, and emails
Mine Client Data for Overpayments
• Utilize proprietary data mining tools and techniques
• Fully document claims
Recover Overpayments from Vendors
• Handle majority of vendor communications
• Verify that deductions are taken or payments received
We are a data-driven business dealing with huge data sizes…
# FILES FOR STRUCTURED DATA
  MONTHLY 200,000   ANNUALLY 2,400,000
INBOUND FOR STRUCTURED DATA
  MONTHLY 40TB      ANNUALLY 480TB
  MONTHLY 200TB     ANNUALLY 2,375TB
# EMAILS UNSTRUCTURED DATA
  MONTHLY 12,500,000   ANNUALLY 150,000,000
… and huge data variety
TYPES OF FILES
EDI, XML, Flat file csv, Flat file delimited, database backups, spreadsheets, Pdfs, Tiff, Jpeg, Png, Prns, Emails, Microfiche, Proprietary formats
% BY FILES (FOR STRUCTURED DATA)
28% EBCDIC Flat
40% ASCII Flat
25% DB back-ups
7% Proprietary
CLIENT DATA ARRIVES
Daily, Weekly, Bi-weekly, Monthly, Quarterly, Semi-annually, Annually
CHALLENGES
Unexpected, Under-estimated, New schema, Wrong schema
METHOD RECEIVED
Tape, Hard drive, sFTP, Email, Other media – flash drives, DVDs
DELIVER TO AUDIT
Daily, Weekly, Bi-weekly, Monthly, Quarterly, Semi-annually, Annually
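Since 28% of structured files arrive as EBCDIC flat files, a minimal sketch of decoding one fixed-width EBCDIC record may help make the variety problem concrete. The deck does not state which code pages or record layouts PRGX receives, so the `cp037` code page, the `LAYOUT` field names and widths, and the `decode_record` helper are all assumptions for illustration. Real mainframe extracts often also contain packed-decimal fields, which this sketch does not handle.

```python
# Hypothetical fixed-width layout: (field name, width in bytes).
LAYOUT = [("store_id", 5), ("sku", 8), ("quantity", 4)]


def decode_record(raw: bytes, layout=LAYOUT, codepage: str = "cp037") -> dict:
    """Decode one fixed-width EBCDIC record into a dict of stripped strings."""
    expected = sum(width for _, width in layout)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    # cp037 is a single-byte EBCDIC code page shipped with Python,
    # so character offsets equal byte offsets after decoding.
    text = raw.decode(codepage)
    record, offset = {}, 0
    for name, width in layout:
        record[name] = text[offset:offset + width].strip()
        offset += width
    return record


# Build a sample record by encoding text to EBCDIC (assumed sample data).
sample = "00042SKU12345  12".encode("cp037")
print(decode_record(sample))
```

A wrong or unexpected schema, one of the challenges listed above, would surface here as a length mismatch rather than as silently shifted fields.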
Our Legacy solution was designed in the 1990s…
Tweaked and upgraded over the years but architecturally remained the same
It had all the problems that legacy systems have
• High lead times
• Re-runs take a lot of operations
• Highly labor intensive operations
Transformation process: significant re-engineering, zero downtime
Our solution had to be flexible and lightning fast!
HiPer: High Performance Computing
Our evaluation was on potential “architectural stacks”
• Functional Requirements
• Information Security Requirements and Privacy Protection
• Performance Requirements
• Technical Requirements
• Vendor Information
19 potential vendors
Filter 1: High level assessment focused on vendor’s overall company strength, ability for the solution to scale, technology feasibility in PRGX
Short list of 6 vendors
Filter 2: Based on willingness of vendor to work with us on a POC and their flexibility
5 POC vendors
Timeline of HiPer program
Q3 2013: Explore technology that can do Data Transformation much faster
Q4 2013: Ready with problem statement
Q1 2014: Level 1 screening of technology stack options
Q2 2014: Proof of Concept
Q3 2014: Selection of Hadoop (Cloudera), Talend stack
Q4 2014: Pilot Accounts; Planning, prioritization for 2015
Q1 2015: Production begins for top 15 accounts; unstructured analytics Proof of concept
By Q2 2016: Remove reliance on AS400; pilot accounts for unstructured analytics
By Q4 2016: 60% accounts in Production; majority of accounts for unstructured analytics
Our solution for structured data processing
[Diagram: Data Long Term Storage and Hadoop Cluster (Data Sets 1 to 5) feeding Audit Tool Sets 1 to 3]
Data Acquisition and Load
1. Talend and related automation
Data Transformation
1. HDFS: long term data storage
2. Batch processing: Apache Hive and Apache Spark
3. Investigative queries (QC): Apache Impala
4. Output to RDBMS: Apache Sqoop and Talend
Auditing
1. Continue and invest in legacy technology (MSSQL)
2. Explore use of Hadoop oriented BI tools
Infrastructure management
1. Cloudera Manager
Data Privacy, Security and Data Lineage
1. Cloudera Navigator
2. Kerberos
Our solution for unstructured data processing
[Diagram: source data (pst, nsf, …) converted to EML files, stored long term in HBase, indexed for search in Solr, and read by the Email Auditing Tool]
Data Acquisition and Load
1. In-house automation logic in .Net to convert files to EML files
2. Load to HBase via Apache Thrift
3. Apache Tika to read email data (headers, body, attachments)
Data Transformation
1. HBase: email storage and updates to individual records
2. Key-Value Store Indexer (Lily HBase Indexer from NGDATA) to update Solr in near real time
3. Solr: index to query emails
4. SolrNet: .Net API for querying Solr
5. Apache Thrift: update to HBase for audit findings
Auditing
1. Refreshed .Net application to read from Solr and update HBase
Infrastructure management
1. Cloudera Manager
Data Privacy, Security and Data Lineage
1. Kerberos
2. Multiple Solr indexes / HBase tables
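The extraction step above (reading headers, body, and attachments from each EML file) can be sketched as follows. This is not the deck's implementation: the pipeline described uses Apache Tika and .Net, whereas this sketch swaps in Python's standard `email` package purely to show what gets pulled out of a message before it is stored and indexed. The `summarize_eml` and `build_sample` helpers, the addresses, and the sample message are all hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def summarize_eml(raw: bytes) -> dict:
    """Extract headers, plain-text body, and attachment names from EML bytes."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "body": body.strip(),
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


def build_sample() -> bytes:
    """Create a small in-memory EML message (assumed sample data)."""
    msg = EmailMessage()
    msg["From"] = "buyer@example.com"
    msg["To"] = "vendor@example.com"
    msg["Subject"] = "Promotional allowance"
    msg.set_content("Agreed rebate is 2% on Q3 volume.")
    msg.add_attachment(
        b"deal,rate\nQ3,0.02\n",
        maintype="application",
        subtype="octet-stream",
        filename="deal.csv",
    )
    return msg.as_bytes()


print(summarize_eml(build_sample()))
```

In a pipeline like the one described, a dictionary of this shape would be the kind of record written to long term storage and then indexed for search.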
How does Cloudera Hadoop impact our business?
Structured data processing
[Diagram: Delivery Cycle]
X weeks: Load and Prepare Data, then Audit (limited time to process and audit data)
HiPer’s Impact, X/2 weeks: Load and Prepare Data, then Audit (more time for Auditing)
• Data delivery for structured and unstructured data is about 9-10 times faster than the original timelines, via Hive
• Starting to use Spark
How does Cloudera Hadoop impact our business?
Faster analysis in Hadoop
• Point of Sales (POS) data was aggregated
  – list of stores
  – number of transactions
  – sum of quantity
• Would have run for hours in our AS400 environment
• 2014 data would not have been readily available before; Hadoop’s lower cost of storage made it affordable for us to keep more data online
  – No need to go back to our archive
  – No need to request a restore
2015 data
13 months: 20141201 to 20151231
Row count: 2,048,666,122
Returned: 1,385 rows (1 per store)
Returned in: 90 seconds
2014 data
13 months: 20131201 to 20141231
Row count: 2,159,626,445
Returned: 1,365 rows (1 per store)
Returned in: 90 seconds
Field count in source POS data was 14
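The aggregation described above (one output row per store, with a transaction count and a summed quantity over a date range) can be sketched as below. The deck does not show the actual query or schema, so the `pos` table, its three columns, and the `aggregate_by_store` helper are assumptions, and SQLite stands in for the Hadoop SQL engine only so the example runs anywhere. The last lines work out the scan rate implied by the 2015 figures.

```python
import sqlite3


def aggregate_by_store(rows, start, end):
    """Return (store_id, transactions, total_quantity) per store in a date range."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical three-column schema; the source POS data had 14 fields.
    conn.execute(
        "CREATE TABLE pos (store_id INTEGER, sale_date TEXT, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO pos VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT store_id, COUNT(*) AS transactions, SUM(quantity) AS total_quantity
        FROM pos
        WHERE sale_date BETWEEN ? AND ?
        GROUP BY store_id
        ORDER BY store_id
        """,
        (start, end),
    ).fetchall()
    conn.close()
    return result


# Assumed sample rows; YYYYMMDD strings sort correctly as text.
SAMPLE = [
    (1, "20141201", 3),
    (1, "20150615", 2),
    (2, "20151231", 5),
    (2, "20131130", 9),  # outside the 2015 window, so excluded
]
print(aggregate_by_store(SAMPLE, "20141201", "20151231"))

# Scan rate implied by the 2015 run: rows scanned divided by elapsed seconds.
rows_per_second = 2_048_666_122 / 90
print(f"{rows_per_second / 1e6:.1f} million rows per second")
```

On the reported numbers, 2,048,666,122 rows in 90 seconds works out to roughly 22.8 million rows per second.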
How does Talend impact our development?
[Diagram: Development Cycle. Hive QL hand coding versus Talend Big Data, which is 40% faster; move on to the next item this much faster]
• Reduced the time to develop processes in Hadoop by at least 40%
• With over 200 distinct processes to get through this year, the time savings are very meaningful in achieving our goals
• No more hand coding Hive QL
• Talend has the logic de-coupled from the code that is executing on the Hadoop cluster
  – “Upgrade” to Spark via dropdown, then address the limited component changes that are clearly identified
  – “Upgrade” to the next big thing
Cloudera Hadoop and Talend impact the future of our business
Structured data processing
• Support for “machine-learning” techniques in the email and buyer pulls, significantly enhancing productivity for audits
Expanded focus
• Key enabler to mine large volumes of data from our customers
• Ability to re-run and experiment on large sets of data
• Framework for industry standard Master Data
Infrastructure
• Backup and archival solution for the future
• Begin standardization and reusability of processes; eliminate person dependency
Key success factors
• Create the business case…for multiple years, with quick wins
  – no IT programs succeed without business backing
• Plan the program – for multiple years
• Architect it right
  – Architect keeping your current process and tools in mind
  – Architect it for different uses
  – Architect it for time!
• Sell your program…you need champions everywhere
  – from the CEO/Board to your business, to marketing to even HR
• Find the right partner; develop the partnership
Polling Question
• Which Hadoop tools do you foresee using the most to help with your big data challenges?
• Data ingestion tools
• Search
• Impala
• Kudu
• Spark
Q&A
Call to Action
• Test Drive Talend Software
• Define a Big Data Project
• Learn about Hadoop with Cloudera
Why Cloudera and Talend?
• Scalability
• Performance
• Flexibility
• Stability
• Support
Thank you
