© Cloudera, Inc. All rights reserved.
Building a Modern Analytic
Database with Cloudera 5.8
Justin Erickson | Sr Director of Product | Cloudera
Andy Frey | CIO | Marketing Associates
Agenda
• Building a Modern Analytic Database with Hadoop
• Key Use Cases Enabled
• What’s New with Cloudera 5.8
• Marketing Associates Customer Case Study
• What’s Next?
Common Application Patterns
Operational Efficiency New Business Value
OPERATIONS
DATA MANAGEMENT
UNIFIED SERVICES
PROCESS, ANALYZE, SERVE
STORE
INTEGRATE
• Data Engineering & Science: Process data, develop & serve predictive models
• Analytic Database: ELT, reporting, exploratory business intelligence
• Operational Database: Build data-driven applications to deliver real-time insights.
Analytic Database
• Self-Service BI & Data: More data of all types is being tapped for analytics, across environments
• Real-Time Analysis: Open up new possibilities for real-time insights as data changes
• Converged Workloads: BI & analytics are critical but only tell part of the story. Get more value by sharing data across workloads
Key Use Cases
• EDW Optimization: Use your EDW more efficiently by offloading workloads to Hadoop
• Data Preparation: Fast, flexible ETL over large data volumes, so data is always ready for your business
• Self-Service BI & Exploration: Fastest time-to-insights with a modern analytic database designed with Hadoop’s flexibility and agility
Cloudera’s Analytic Database Solution
OPERATIONS
DATA MANAGEMENT
UNIFIED SERVICES
PROCESS, ANALYZE, SERVE
STORE
INTEGRATE
• Navigator Optimizer: Identify, offload, & optimize workloads to Hadoop
• Hue: Intelligent SQL editor
• Navigator: Audit, lineage, encryption, key management, & policy lifecycles
• BI Partners: Integration with the leading BI tools
• Impala: Interactive query engine for BI & SQL analytics
• Hive-on-Spark: Large-scale ETL & batch processing engine
ETL & Data Preparation
• Flexible & Scalable
  • Process larger data volumes, of any type
• Fastest Data Processing
  • Distributed processing and best-of-breed technologies for the fastest performance
• Minimize Data Movement
  • Prepared data immediately available for analytics with shared storage and metadata
Self-Service BI & Exploratory Analytics
• Self-Service Data Agility
  • No rigid data modeling encumbrances for agile acquisition
  • Iteratively analyze and flexibly model
• Self-Service Exploratory Analytics
  • Interactive responses for iterative exploration
  • Confidently handle all BI and SQL users
• Cost-Effective Scalability with Users/Data
  • Easily add nodes to handle more data and users
  • Leverage the full potential of available data
• Productively Use Existing Tools and Skills
  • Integration with all leading BI tools & compatible analytic SQL language
  • Metadata and lineage for easy data discovery
  • Intelligent SQL editor for greater developer productivity
Optimize the Enterprise Data Warehouse
• Decrease Storage Costs
  • Focus on high-value reporting data in the EDW
• Keep More/All Data Online
  • Unlimited scale keeps data accessible and out of archive
• Improve Performance
  • Eliminate contention and meet SLAs for routine reporting
• Get New Insights
  • Enable ad hoc and exploratory analytics
Siemens’ TCO Assessment (cost/TB)
What’s New in Cloudera 5.8
Advancements with Cloudera 5.8
Impala
• Cloud-Native:
  • Read/write directly from Amazon S3
• Performance:
  • >10x faster performance on secure clusters
Hue
• Data Discovery:
  • Preview, tag, search, pin tables in browser
• Query Design Assistance:
  • Autocomplete of tables, columns, syntax
  • Efficient troubleshooting
• Collaboration & Sharing:
  • Save & share queries with peers
  • Set permissions directly on results
Navigator Optimizer
• Now GA!
• Ease offloading path to Hadoop
• Active Data Optimization to enable peak performance for Hive and Impala
Self-Service Data Discovery & BI
at Marketing Associates
Andy Frey
About Me – Andy Frey
From Assembler to Ajax, Modem to Mobile, and Mainframe to Cloud, Andy Frey developed his deep knowledge as a technologist and CIO at leading national corporations such as GAB Robins, Compuware, J. Walter Thompson, Coolfire, and now Marketing Associates, providing Fortune 100 corporations with technologically advanced enterprise solutions.
Introducing Magnify and Marketing Associates
• Magnify Analytic Solutions, a wholly owned division of Detroit, Michigan-based Marketing Associates serving primarily Fortune 100 clients, uses technology-driven data analysis to offer clients a range of informed business services that increase profitability through its four lines of service: business intelligence, digital intelligence, credit risk management, and marketing analytics.
• Established in 1967, Marketing Associates is a full-service, technology-enabled marketing services company headquartered in Detroit, Michigan, with offices in Wilmington, Delaware, and Charlotte, North Carolina. MA offers private and public cloud hosting, custom web development, and data transformation among its IT-based services.
Offering Cloudera Hadoop IaaS and experienced Data Scientists
Different Challenges for Different Clients
The B2C Challenge
• Previously using expensive RDBMS systems to deliver B2C marketing contests and
product giveaways. Up to 150 in a year.
• Huge spikes in web event data posed challenges. 200,000 hits in first minute for
popular brands’ campaigns.
• Cost to license for biggest spike made projects unprofitable.
• Also needed to monitor and manipulate massive amounts of data in real time.
RDBMS could not respond adequately during massive data intake during
campaign run. “When has a campaign reached its limit? Has total supply of
product been allocated?”
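The two questions quoted above ("When has a campaign reached its limit? Has total supply of product been allocated?") can be pictured with a small sketch. This is only an illustration under stated assumptions, not how Marketing Associates implemented it: the `Campaign` class, its fields, and `hits_per_second` are hypothetical names, and a production system would track allocation in the cluster rather than in one process. The only figure taken from the slide is 200,000 hits in the first minute.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    """Hypothetical giveaway campaign with a fixed product supply."""
    name: str
    total_supply: int
    allocated: int = 0

    def try_allocate(self, quantity: int = 1) -> bool:
        """Allocate units if supply remains; return False once the limit is hit."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.allocated + quantity > self.total_supply:
            return False
        self.allocated += quantity
        return True

    @property
    def remaining(self) -> int:
        """Units of product not yet allocated."""
        return self.total_supply - self.allocated

    @property
    def limit_reached(self) -> bool:
        """True once the whole supply has been allocated."""
        return self.allocated >= self.total_supply


def hits_per_second(hits: int, seconds: int) -> float:
    """Average request rate over a window."""
    return hits / seconds


if __name__ == "__main__":
    # Illustrative supply of 3 units; the deck does not give campaign sizes.
    campaign = Campaign(name="demo", total_supply=3)
    outcomes = [campaign.try_allocate() for _ in range(5)]
    print(outcomes, campaign.remaining, campaign.limit_reached)
    # 200,000 hits in the first minute is the figure from the slide.
    print(round(hits_per_second(200_000, 60)), "hits/second on average")
```

At the slide's spike rate, this check would need to run a few thousand times per second on average, which is the kind of load the RDBMS reportedly could not sustain during intake.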
Different Challenges for Different Clients
The CRM Challenge
• Another project for a large client involved managing a repository of customer
data from multiple sources. The magnitude was vast, data was multi-structured
and new sources were being added on a regular basis.
• Initially executed using 4 relational databases; query times slowed and costs soared.
• Difficulty merging unstructured data from multiple sources using traditional RDBMS.
• Deployment of a prominent SQL RDBMS was estimated at $5 million (approx. 150 terabytes).
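As a rough worked example of the cost figure above, the sketch below derives cost per terabyte from the quoted $5 million and approx. 150 terabytes. The function names are hypothetical. The second calculation applies a 90% reduction purely for illustration; the deck reports "over 90% cost reduction" elsewhere but does not tie that figure to this estimate, so treat that pairing as an assumption.

```python
def cost_per_terabyte(total_cost_usd: float, terabytes: float) -> float:
    """Average cost per terabyte for a deployment estimate."""
    if terabytes <= 0:
        raise ValueError("terabytes must be positive")
    return total_cost_usd / terabytes


def cost_after_reduction(cost_usd: float, reduction_fraction: float) -> float:
    """Cost remaining after a fractional reduction (0.9 means 90% lower)."""
    if not 0 <= reduction_fraction <= 1:
        raise ValueError("reduction_fraction must be between 0 and 1")
    return cost_usd * (1 - reduction_fraction)


if __name__ == "__main__":
    # Figures from the slide: $5 million for approx. 150 terabytes.
    per_tb = cost_per_terabyte(5_000_000, 150)
    print(f"RDBMS estimate: about ${per_tb:,.0f} per TB")
    # Assumption: a 90% reduction applied to the same estimate.
    reduced = cost_after_reduction(5_000_000, 0.9)
    print(f"After a 90% reduction: about ${reduced:,.0f}")
```

The RDBMS estimate works out to roughly $33,000 per terabyte.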
Evaluation & Decision
Key criteria for modern analytic database:
• Handle huge spikes in web event data.
• Manage and manipulate massive data volumes in
real-time.
• Scalability and performance.
• Ability to transfer skills from the current SQL-based programming team.
• Reduce costs.
Considered various offerings:
• Considered SQL Server (discarded due to cost).
• Knew Hadoop could be the solution and started
looking at commercial implementations.
• Considered non-commercial & short-listed two Hadoop vendors: Cloudera & Hortonworks.
• Determined non-commercial too risky, too
burdensome – left it to the experts.
Decision
• Launched June 2014
• Why Hadoop: Cost, Tech Requirements, Data Size
• Why Cloudera: Most mature solution with better overall enterprise toolset; Cloudera Team
Solution
• Hadoop Platform:
  • Cloudera Enterprise
• Hadoop Components:
  • Apache Flume, Apache Sqoop, Apache Hive, MapReduce, Apache Impala (incubating), Hue, Cloudera Manager
• Third-Party BI & Analytic Tools:
  • D3.js, SAS, Tableau, R, Angoss
• Security Tools:
  • Kerberos, Apache Sentry, Cloudera Navigator
Solution: Self-Service Data Discovery & BI
• Self-service data discovery capabilities eliminate the need to distribute multiple Excel reports, instead allowing our clients to interact directly with Hadoop.
  • Security is enhanced because Excel reports no longer need to be distributed via email.
• Using Tableau to run Impala queries produces real-time reporting, a significant value add and convenience for our clients.
• Offers scalability and flexibility to accommodate diverse and growing client demands.
  • Allows us to scale our web event product giveaways.
  • Accommodates the addition of new data sources.
  • Easily add nodes to avoid potential performance bottlenecks.
Why We Chose Cloudera
• Cloudera Manager became a major differentiator.
• Made cluster management easy
• User friendly = reduced learning curve
• Chose Impala for its real-time query performance.
• Proven Cloudera innovation and zeal to maintain an enterprise class solution by
offering new tools and functions while maintaining/supporting the Apache
project.
• Cloudera appeared to be the prominent choice of large Hadoop installs in the
Fortune 500. Best IT Analyst rating.
• Impressed with Cloudera team before purchase.
Benefits & Impact
• All-inclusive Cloudera Enterprise costs less than the required relational database licenses alone: over 90% cost reduction.
  • Other benefits: cheaper hardware and easier to manage.
• Cloudera Navigator provides a single interface to locate and classify data, audit who is accessing what data, and protect the data with centralized key management.
  • Critical tool when handling PII and other sensitive data. Comprehensive audit trail allows for easy monitoring of PII data access.
  • Allows us to satisfy strict security compliance regulations with ease.
• Cloudera Professional Services are knowledgeable, responsive, and help establish best practices for our internal development team. They helped us get it right the second time.
Why we are glad we chose Cloudera: any time we had a crisis, they were there to help.
Lessons Learned
• First used non-Cloudera consulting: Big mistake – design incorrect for data collected. Work with Cloudera Professional Services to design it right the first time.
• Start small, if you can, and grow the solution.
  • Don’t need a big capital investment upfront.
  • Get value out of a small cluster (e.g. 3 nodes) and expand as needed.
• Install services to meet your current needs. Install additional services as your data needs change.
• Look at all Cloudera solutions, learn them, and use them.
• Training: Be generous, and conduct it in phases to keep new skills relevant as you build and deploy.
• What’s next?
  • Prebuilt analytical models as a platform.
  • Evaluate Navigator Optimizer to improve query performance and identify the best candidates for legacy application migration.
What’s Next for Cloudera’s Analytic
Database?
Analytic Database Roadmap
Faster, richer, more expressive SQL
• Hive-on-Spark GA
• Insert, update, delete via Kudu
• Performance improvements
• Nested JSON
Improved multitenancy
• Fewer OOM errors
• Graceful node decommission
• Admission control enhancements
• Improved YARN integration
Better SQL workbench
• Higher Hue concurrency
• SQL editor usability improvements
• Intelligent recommendations of tables, joins & more for Hue users
• Exposing tags & lineage through the Hue query experience
Deeper integration with BI tools
• Joint workload optimizations
• Support for nested types and s
• Data discovery functionality injected into the BI experience
Workload optimization
• Multi-platform workload profiling
• Recommendation of in-line materialized views
Confidential – Do not Redistribute
Next Steps
• Download Cloudera 5.8
  • cloudera.com/downloads
• Release Notes
  • cloudera.com/documentation/enterprise/release-notes/topics/rg_release_notes.html
• Learn more about Navigator Optimizer and BI in the Cloud
  • Register for Parts 2 & 3 of the Webinar Series!
  • cloudera.com/about-cloudera/events/webinars/5-8-webinar-series.html
Questions?

Building a Modern Analytic Database with Cloudera 5.8

  • 1. 1© Cloudera, Inc. All rights reserved. Building a Modern Analytic Database with Cloudera 5.8 Justin Erickson | Sr Director of Product | Cloudera Andy Frey | CIO | Marketing Associates
  • 2. 2© Cloudera, Inc. All rights reserved. Agenda • Building a Modern Analytic Database with Hadoop • Key Use Cases Enabled • What’s New with Cloudera 5.8 • Marketing Associates Customer Case Study • What’s Next?
  • 3. 3© Cloudera, Inc. All rights reserved. Common Application Patterns Operational Efficiency New Business Value OPERATIONS DATAMANAGEMENT UNIFIED SERVICES PROCESS,ANALYZE, SERVE STORE INTEGRATE Process data, develop & serve predictive models Data Engineering & Science ELT, reporting, exploratory business intelligence Analytic Database Build data-driven applications to deliver real-time insights. Operational Database
  • 4. 4© Cloudera, Inc. All rights reserved. Analytic Database More data of all types is being tapped for analytics, across environments Self-Service BI & Data Open up new possibilities for real-time insights as data changes Real-Time Analysis BI & analytics are critical but only tell part of the story. Get more value by sharing data across workloads Converged Workloads
  • 5. 5© Cloudera, Inc. All rights reserved. Key Use Cases EDW Optimization Data Preparation Self-Service BI & Exploration Use your EDW more efficiently by offloading workloads to Hadoop Fast, flexible ETL over large data volumes, so data is always ready for your business Fastest time-to-insights with a modern analytic database designed with Hadoop’s flexibility and agility
  • 6. 6© Cloudera, Inc. All rights reserved. Cloudera’s Analytic Database Solution OPERATIONS DATAMANAGEMENT UNIFIED SERVICES PROCESS,ANALYZE, SERVE STORE INTEGRATE Identify, offload, & optimize workloads to Hadoop Navigator Optimizer Intelligent SQL editor Hue Audit, lineage, encryption, key management, & policy lifecycles Navigator Integration with the leading BI tools BI Partners Interactive query engine for BI & SQL analytics Impala Large-scale ETL & batch processing engine Hive-on- Spark
  • 7. 7© Cloudera, Inc. All rights reserved. ETL & Data Preparation • Flexible & Scalable • Process larger data volumes, of any type • Fastest Data Processing • Distributed processing and best-of-breed technologies for the fastest performance • Minimize Data Movement • Prepared data immediately available for analytics with shared storage and metadata
  • 8. 8© Cloudera, Inc. All rights reserved. Self-Service BI & Exploratory Analytics • Self-Service Data Agility • No rigid data modeling encumbrances for agile acquisition • Iteratively analyze and flexibly model • Self-Service Exploratory Analytics • Interactive responses for iterative exploration • Confidently handle all BI and SQL users • Cost-Effective Scalability with Users/Data • Easily add nodes to handle more data and users • Leverage the full potential of available data • Productively Use Existing Tools and Skills • Integration with all leading BI tools & compatible analytic SQL language • Metadata and lineage for easy data discovery • Intelligent SQL editor for greater developer productivity
  • 9. 9© Cloudera, Inc. All rights reserved. Optimize the Enterprise Data Warehouse • Decrease Storage Costs • Focus on high-value reporting data in the EDW • Keep More/All Data Online • Unlimited scale keeps data accessible and out of archive • Improve Performance • Eliminate contention and meet SLAs for routine reporting • Get New Insights • Enable ad hoc and exploratory analytics Siemens’ TCO Assessment (cost/TB)
  • 10. 10© Cloudera, Inc. All rights reserved. What’s New in Cloudera 5.8
  • 11. 11© Cloudera, Inc. All rights reserved. Advancements with Cloudera 5.8 Impala Hue Navigator Optimizer • Cloud-Native: • Read/write directly from Amazon S3 • Performance: • >10x faster performance on secure clusters • Data Discovery: • Preview, tag, search, pin tables in browser • Query Design Assistance: • Autocomplete of tables, columns, syntax • Efficient troubleshooting • Collaboration & Sharing: • Save & share queries with peers • Set permissions directly on results • Now GA! • Ease offloading path to Hadoop • Active Data Optimization to enable peak performance for Hive and Impala
  • 12. 12© Cloudera, Inc. All rights reserved. Self-Service Data Discovery & BI at Marketing Associates Andy Frey
  • 13. 13© Cloudera, Inc. All rights reserved. About Me – Andy Frey From Assembler to Ajax, Modem to Mobile, and Mainframe to Cloud, Andy Frey, developed his deep knowledge as a technologist, and CIO at leading national corporations such as GAB Robins, Compuware, J. Walter Thompson, Coolfire and now Marketing Associates, providing Fortune 100 corporations with technologically advanced enterprise solutions.
  • 14. 14© Cloudera, Inc. All rights reserved. Introducing Magnify and Marketing Associates • Magnify Analytic Solutions — a wholly-owned division of Detroit, Michigan-based Marketing Associates serving primarily Fortune 100 clients — uses technology-driven data analysis to offer clients a range of informed business services that increase profitability through its four lines of service: business intelligence, digital intelligence, credit risk management, and marketing analytics. • Established in 1967 Marketing Associates is a full-service, technology enabled marketing services company headquartered in Detroit, Michigan with offices in Wilmington, Delaware and Charlotte, North Carolina. MA offers private and public cloud hosting, custom web development, and data transformation among its’ IT based services. Offering Cloudera Hadoop IaaS and experienced Data Scientists
  • 15. 15© Cloudera, Inc. All rights reserved. Different Challenges for Different Clients The B2C Challenge • Previously using expensive RDBMS systems to deliver B2C marketing contests and product giveaways. Up to 150 in a year. • Huge spikes in web event data posed challenges. 200,000 hits in first minute for popular brands’ campaigns. • Cost to license for biggest spike made projects unprofitable. • Also needed to monitor and manipulate massive amounts of data in real time. RDBMS could not respond adequately during massive data intake during campaign run. “When has a campaign reached its limit? Has total supply of product been allocated?”
  • 16. 16© Cloudera, Inc. All rights reserved. Different Challenges for Different Clients The CRM Challenge • Another project for a large client involved managing a repository of customer data from multiple sources. The magnitude was vast, data was multi-structured and new sources were being added on a regular basis. • Initially executed using 4 relational databases, query times slowed and costs soared. • Difficulty merging unstructured data from multiple sources using traditional RDBMS. • Deployment of prominent SQL RDBMS estimated @ $5 million cost (approx. 150 terabyte).
  • 19. Evaluation & Decision
      Key criteria for a modern analytic database:
      • Handle huge spikes in web event data.
      • Manage and manipulate massive data volumes in real time.
      • Scalability and performance.
      • Ability to transfer skills from the current SQL-based programming team.
      • Reduce costs.
      Considered various offerings:
      • Considered SQL Server (discarded due to cost).
      • Knew Hadoop could be the solution and started looking at commercial implementations.
      • Considered non-commercial Hadoop and shortlisted two Hadoop vendors: Cloudera and Hortonworks.
      • Determined non-commercial was too risky and too burdensome; left it to the experts.
      Decision:
      • Launched June 2014.
      • Why Hadoop: cost, tech requirements, data size.
      • Why Cloudera: most mature solution with better overall enterprise toolset; Cloudera team.
  • 20. Solution
      • Hadoop platform: Cloudera Enterprise
      • Hadoop components: Apache Flume, Apache Sqoop, Apache Hive, MapReduce, Apache Impala (incubating), Hue, Cloudera Manager
      • Third-party BI & analytic tools: D3.js, SAS, Tableau, R, Angoss
      • Security tools: Kerberos, Apache Sentry, Cloudera Navigator
  • 21. Solution: Self-Service Data Discovery & BI
      • Self-service data discovery capabilities eliminate the need to distribute multiple Excel reports, instead allowing our clients to interact directly with Hadoop.
      • Security is enhanced because Excel reports no longer need to be distributed via email.
      • Using Tableau to run Impala queries produces real-time reporting, a significant value add and convenience for our clients.
      • Offers scalability and flexibility to accommodate diverse and growing client demands.
      • Allows us to scale our web event product giveaways.
      • Accommodates the addition of new data sources.
      • Nodes are easily added to avoid potential performance bottlenecks.
  • 22. Why We Chose Cloudera
      • Cloudera Manager became a major differentiator.
      • Made cluster management easy.
      • User friendly = reduced learning curve.
      • Chose Impala for its real-time query performance.
      • Proven Cloudera innovation and zeal to maintain an enterprise-class solution by offering new tools and functions while maintaining and supporting the Apache project.
      • Cloudera appeared to be the prominent choice for large Hadoop installs in the Fortune 500. Best IT analyst rating.
      • Impressed with the Cloudera team before purchase.
  • 23. Benefits & Impact: Why We Are Glad We Chose Cloudera
      • All-inclusive Cloudera Enterprise costs less than the required relational database licenses alone: over 90% cost reduction.
      • Other benefits: cheaper hardware and easier to manage.
      • Cloudera Navigator provides a single interface to locate and classify data, audit who is accessing what data, and protect the data with centralized key management.
      • Critical tool when handling PII and other sensitive data. A comprehensive audit trail allows for easy monitoring of PII data access.
      • Allows us to satisfy strict security compliance regulations with ease.
      • Cloudera Professional Services are knowledgeable, responsive, and help establish best practices for our internal development team. They helped us get it right the second time. Any time we had a crisis, they were there to help.
  • 24. Lessons Learned
      • First used non-Cloudera consulting. Big mistake: the design was incorrect for the data collected. Work with Cloudera Professional Services to design it right the first time.
      • Start small, if you can, and grow the solution.
      • No need for a big capital investment up front.
      • Get value out of a small cluster (e.g., 3 nodes) and expand as needed.
      • Install services to meet your current needs. Install additional services as your data needs change.
      • Look at all Cloudera solutions, learn them, and use them.
      • Training: be generous, and conduct it in phases to keep new skills relevant as you build and deploy.
      • What’s next?
      • Prebuilt analytical models as a platform.
      • Evaluate Navigator Optimizer to improve query performance and identify the best candidates for legacy application migration.
  • 25. What’s Next for Cloudera’s Analytic Database?
  • 26. Analytic Database Roadmap
      Faster, richer, more expressive SQL
      • Hive-on-Spark GA
      • Insert, update, delete via Kudu
      • Performance improvements
      • Nested JSON
      Improved multitenancy
      • Fewer OOM errors
      • Graceful node decommission
      • Admission control enhancements
      • Improved YARN integration
      Better SQL workbench
      • Higher Hue concurrency
      • SQL editor usability improvements
      • Intelligent recommendations of tables, joins & more for Hue users
      • Exposing tags & lineage through the Hue query experience
      Deeper integration with BI tools
      • Joint workload optimizations
      • Support for nested types and s
      • Data discovery functionality injected into the BI experience
      Workload optimization
      • Multi-platform workload profiling
      • Recommendation of in-line materialized views
      Confidential – Do not Redistribute
  • 27. Next Steps
      • Download Cloudera 5.8: cloudera.com/downloads
      • Release notes: cloudera.com/documentation/enterprise/release-notes/topics/rg_release_notes.html
      • Learn more about Navigator Optimizer and BI in the cloud.
      • Register for Parts 2 & 3 of the webinar series: cloudera.com/about-cloudera/events/webinars/5-8-webinar-series.html
  • 28. Questions?