Customer Success Story
Cloudera & Xpand IT
Nuno Barreto
Associate Partner & Big Data Lead
nuno.barreto@xpand-it.com
Proprietary & Confidential www.xpand-it.com
THE PROBLEM
How is process Y progressing?
Who are the main cluster users/departments?
Which engines does each department use?
Do I need to plan on an upgrade?
How much is process X costing me?
Are there available time slots?
THE SOLUTION
TELEMETRY
ETL FLOW CONTROL
DATA PREPARATION
ARCHITECTURE
[Architecture diagram. Components: CORE AGENT(s), QUEUE, REAL-TIME, ONLINE DB, ANALYTICS REPO, ETL (PDI extension), ANALYTICS, ANALYTICS DB. Flow labels: start/stop jobs, log, flow control, status check, metadata access, data access, analytics data, analytical queries, operational data.]
THIS INVOLVES A NUMBER OF CONCEPTS
NEAR REAL-TIME
CLOUDERA INTEGRATION
LAMBDA ARCHITECTURE
STREAMING
NEAR REAL-TIME AND STREAMING
REAL-TIME & STREAMING
REMOTE AGENTS
FINE GRAINED CONTROL
ETL TOOL SPECIFIC
REAL TIME LOGGING
ASYNC EXECUTION
PDI EXTENSION POINTS
CAPTURE LOG START/END
CAPTURE CONNECTION TYPE
CAPTURE STEP LINEAGE DETAIL
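The start/end capture listed above can be sketched in plain Python. This is only an illustration, not the PDI extension itself: the `ExecutionEvent` and `ExecutionRecord` types, their field names, and `fold_events` are hypothetical, and step lineage is left out to keep the sketch short. It shows how paired start and end events could be folded into one record per run, which is the kind of data needed to answer "how is process Y progressing?".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class ExecutionEvent:
    """One event emitted by a (hypothetical) ETL hook."""
    run_id: str                            # identifies a single job run
    name: str                              # job or transformation name
    kind: str                              # "start" or "end"
    timestamp: datetime
    connection_type: Optional[str] = None  # e.g. "impala" (assumed label)


@dataclass
class ExecutionRecord:
    """State of one run, built from its start and end events."""
    run_id: str
    name: str
    started: datetime
    ended: Optional[datetime] = None
    connection_type: Optional[str] = None

    @property
    def duration_seconds(self) -> Optional[float]:
        if self.ended is None:
            return None  # the run has not reported an end event yet
        return (self.ended - self.started).total_seconds()


def fold_events(events: List[ExecutionEvent]) -> Dict[str, ExecutionRecord]:
    """Fold events into one record per run, tolerating out-of-order input."""
    records: Dict[str, ExecutionRecord] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "start":
            records[ev.run_id] = ExecutionRecord(
                ev.run_id, ev.name, ev.timestamp,
                connection_type=ev.connection_type,
            )
        elif ev.kind == "end" and ev.run_id in records:
            records[ev.run_id].ended = ev.timestamp
    return records


# Usage: one finished run and one run still in progress.
t0 = datetime(2017, 1, 1, 12, 0, 0)
demo = fold_events([
    ExecutionEvent("r1", "load_sales", "end", t0 + timedelta(seconds=90)),
    ExecutionEvent("r1", "load_sales", "start", t0, "impala"),
    ExecutionEvent("r2", "load_hr", "start", t0),
])
```

A run with no end event keeps `duration_seconds` as `None`, so "still running" and "finished" stay distinguishable.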
GATHERING EXECUTION DATA
USE KAFKA AS A LOG SINK
FAULT TOLERANT
REAL TIME
CONSISTENT
COLLECT LOG DATA IN (AS) REAL TIME (AS POSSIBLE)
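As a rough sketch of what using Kafka as a log sink implies for the producing side, the Python below turns a log event into a keyed record. Everything named here is an assumption for illustration: the event fields, the `etl-logs` topic name, and `InMemorySink`, which stands in for a real Kafka producer so the example runs without a broker. Keying by run id is a design choice, not something the slides state: Kafka preserves ordering within a partition, and records with the same key go to the same partition under the default partitioner, so all events of one run would be read back in order.

```python
import json
from typing import Dict, List, Tuple


def to_record(event: Dict[str, object]) -> Tuple[bytes, bytes]:
    """Serialize an event into a (key, value) pair of bytes."""
    key = str(event["run_id"]).encode("utf-8")
    value = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return key, value


class InMemorySink:
    """Stand-in for a Kafka producer: appends records to a list per topic."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[Tuple[bytes, bytes]]] = {}

    def send(self, topic: str, key: bytes, value: bytes) -> None:
        self.topics.setdefault(topic, []).append((key, value))


# Usage: publish one hypothetical "start" event.
sink = InMemorySink()
event = {
    "run_id": "r1",
    "name": "load_sales",
    "kind": "start",
    "ts": "2017-01-01T12:00:00Z",
}
sink.send("etl-logs", *to_record(event))
```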
SPARK AS KAFKA COLLECTOR
REAL TIME LOG PARSING
ETL TOOL ADAPTABLE
DATA DUMPS IN IMPALA AND HBASE
GENERATES NOTIFICATIONS
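The collector responsibilities above (parse, dump, notify) can be sketched as a per-batch function. This is plain Python standing in for the Spark job, not Spark code. The pipe-delimited line format, the field names, and the `ERROR` level are assumptions. The returned list stands in for rows appended to an Impala table, and the `online` dictionary stands in for an HBase table keyed by run id.

```python
from typing import Dict, List, Optional, Tuple

# Assumed log line layout: ts|run_id|name|level|message
FIELDS = ("ts", "run_id", "name", "level", "message")


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Parse one log line; return None when the line is malformed."""
    parts = line.rstrip("\n").split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def process_batch(
    lines: List[str], online: Dict[str, Dict[str, str]]
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Parse a batch of lines and route each event to its destinations."""
    analytics_rows: List[Dict[str, str]] = []  # append-only history
    notifications: List[str] = []              # messages to send out
    for line in lines:
        event = parse_line(line)
        if event is None:
            continue  # skip lines that do not match the assumed layout
        analytics_rows.append(event)
        online[event["run_id"]] = event        # latest state per run
        if event["level"] == "ERROR":
            notifications.append(
                f"{event['name']} ({event['run_id']}): {event['message']}"
            )
    return analytics_rows, notifications


# Usage: two valid lines and one malformed line.
online_state: Dict[str, Dict[str, str]] = {}
rows, notes = process_batch(
    [
        "2017-01-01T12:00:00Z|r1|load_sales|INFO|started",
        "garbage",
        "2017-01-01T12:00:05Z|r1|load_sales|ERROR|Connection refused|retrying",
    ],
    online_state,
)
```

Keeping the parser separate from the routing is one way to make the collector adaptable to another ETL tool: only `parse_line` would need to change.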
LAMBDA ARCHITECTURE
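In a Lambda architecture, a query is typically answered by combining a precomputed batch view with a real-time view that covers only the data that arrived after the last batch run. The sketch below illustrates that merge for a simple per-job run count. The view contents and the `merge_views` name are hypothetical, and mapping the batch view to the analytics DB and the real-time view to the online DB is an assumption about the diagram, not something the slides state.

```python
from typing import Dict


def merge_views(
    batch_view: Dict[str, int], realtime_view: Dict[str, int]
) -> Dict[str, int]:
    """Combine batch counts with counts seen since the last batch run."""
    merged = dict(batch_view)  # copy so the batch view is not modified
    for job, count in realtime_view.items():
        merged[job] = merged.get(job, 0) + count
    return merged


# Usage: runs per job up to the last batch, plus runs seen since then.
batch = {"load_sales": 120, "load_hr": 30}
recent = {"load_sales": 2, "load_stock": 1}
total = merge_views(batch, recent)
```

The merge is only correct if the two views do not overlap in time, so the real-time view has to be reset whenever a new batch view is published.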
DISCLAIMER
What you are about to see is a work in progress, so be gentle in case…
• the demo doesn’t work
• features don’t work as described
• the connection goes down
DEMO
REAL-TIME AND STREAMING
CLOUDERA INTEGRATION
HOW TO MANAGE ALL THESE COMPONENTS
LOTS OF MOVING PARTS
OPERATIONS
LOADS OF CONFIG FILES
THE ANSWER
EXTENSIBLE ARCHITECTURE
SEAMLESS INTEGRATION
MONITORING
CONFIGURATION MANAGEMENT
DEPENDENCIES MANAGEMENT
LOG CHECK
SETUP AND ADMIN
DEMO
CLOUDERA INTEGRATION
SUMMARY
NOT EVERYTHING WE DO IS THIS COMPLEX
HADOOP STACK CHOICE MATTERS
RE-USABLE DESIGN PATTERNS
QUESTIONS?
