1
Shankar Radhakrishnan
Impetus
Hybrid Data Platform
Cloud Environment Connected with
On-Premise Data Environment
2
About Me
• Director of Big Data Engineering with Impetus
• Focus on Enterprise data architecture, Data platform solution
deployment, High Performance & Optimization
• Believer that “Data is the most important digital asset”
4
Need For Hybrid Data Platform
• Mixed-workload scenarios on Hadoop
• Applications’ long-tail usage of data platforms
• More time spent on data preparation than on processing
• Time spent on data movement
• Geo-centric data processing and provisioning requirements
• Cost-effective solution options
• Untapped scale-up and scale-out capabilities of the Cloud
• Limitations with a physical data center/platform setup
5
Hybrid Data Platform
“A combination of on-premise physical data infrastructure with a
Cloud-based Big Data platform, used as one extended, complementary,
scalable data infrastructure”
6
Considerations
• Changes to current architecture
– Impact on on-premise infrastructure
– Impact on business processes
– Data availability and accessibility in the Cloud
• Impact on data exchange policy and procedures
– Data Characteristics – Data at rest & in-motion
– Geographical considerations
• Data Security
• Virtual Cloud Geo-Fencing, Cloud Boundaries
• Investment considerations
– Technology Choices, Maturity and Adoption
7
Hybrid Data Platform Architecture
[Architecture diagram; labels recovered from the slide]
• Data sources: Databases; Text Files, Binary Files; Sensitive Data; Other Data Sources
• Layers: Data Acquisition Layer, Data Integration Layer, Data Provisioning Layer
• Platform components: Smart Interface Layer, Security & Access Control (shown twice), On-Premise Hadoop Landing Zone, On-Premise Hadoop Data Lake, Hadoop On Cloud, Integration Check-point (On-Prem/Cloud), Application Interfaces
• Consumers: 3rd Parties, Analytics, Data Scientists, Business
• Data Governance Layer: User Management, Access Audit and Control, Metadata Management, Data Security Management, BAR Management, DR Management, Workload Management, Key Management, Master Data Management, Data Quality Management, Operations Management
8
Data Integration
[Data integration diagram; labels recovered from the slide]
• Endpoints: On-Premise Hadoop Data Lake, Hadoop On Cloud
• Integration Check-point (On-Prem/Cloud)
• Job/Task Profiler
• Data Upload Workflow Organizer
• Payload Organizer
• Profiles: User Profile, Network Profile, Data Profile
• Private, Secured Tunnel (shown twice)
• Transmission Channel
• Security Checks
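The slide names a Payload Organizer but does not say how it groups data. As a minimal sketch only, assuming the organizer packs files into payloads under a size cap taken from something like the Network Profile, a first-fit-decreasing grouping could look like this. `FileEntry`, `organize_payloads`, and `max_payload_mb` are hypothetical names introduced for illustration, not part of the deck.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileEntry:
    path: str      # hypothetical path in the on-premise data lake
    size_mb: int   # file size in megabytes


def organize_payloads(files: List[FileEntry], max_payload_mb: int) -> List[List[FileEntry]]:
    """Group files into payloads of at most max_payload_mb (first-fit decreasing).

    A file larger than the cap is placed in a payload of its own.
    """
    payloads: List[List[FileEntry]] = []
    remaining: List[int] = []  # free capacity left in each payload
    for entry in sorted(files, key=lambda e: e.size_mb, reverse=True):
        for i, free in enumerate(remaining):
            if entry.size_mb <= free:
                payloads[i].append(entry)
                remaining[i] -= entry.size_mb
                break
        else:
            # No existing payload has room, so open a new one.
            payloads.append([entry])
            remaining.append(max(max_payload_mb - entry.size_mb, 0))
    return payloads
```

Sorting largest-first tends to leave fewer half-empty payloads than packing in arrival order; a real organizer would presumably also weigh the user and data profiles, which this sketch ignores.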
9
Execution Workflow
[Execution workflow diagram; labels recovered from the slide]
• On-premise: On-Premise Hadoop Data Lake, Payload Organizer
• Transfer: Private, Secured Tunnel (shown twice); Transmission Channel; Security Checks; Payload Delivery
• Cloud landing: S3 (Data Landing)
• Cloud security services: Cloud HSM, Identity & Access Management, Key Management Service, Certificate Manager
• Cloud processing and delivery: Data Pipeline, SQS (Queue Service), SNS (Push Notification), Kinesis, EMR/MapReduce, Redshift (Data Warehouse), QuickSight
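The workflow lists Security Checks ahead of Payload Delivery without detailing them. One check that commonly sits at this step is an integrity check: record a digest before transmission and verify it on landing. The sketch below assumes SHA-256 and uses only the Python standard library; `payload_digest` and `verify_payload` are hypothetical helper names, and the deck does not state that this is the check the platform performs.

```python
import hashlib
import hmac


def payload_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a payload, computed before transmission."""
    return hashlib.sha256(data).hexdigest()


def verify_payload(data: bytes, expected_digest: str) -> bool:
    """Check a received payload against the digest recorded by the sender.

    compare_digest avoids leaking, through timing, where the strings differ.
    """
    return hmac.compare_digest(payload_digest(data), expected_digest)
```

In use, the sender would ship the digest alongside the payload (for example in a manifest), and the landing side would refuse delivery when `verify_payload` returns False.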
10
Data Exchange & Security
[Diagram labels: Data Center, Direct Connect, Secure Tunnel, VPC, Cloud HSM, Identity & Access Management, Key Management Service, Certificate Manager]
1. On premise Data Center hosts Hadoop Cluster and has connectivity established to the Cloud
2. Uses Direct Connect option to connect to the private Cloud setup
3. Uses secured VPN tunnel to the dedicated Cloud setup for data exchange
4. Hadoop on Cloud setup connected with data center, secured behind firewall and access restrictions
5. Role based access control, process execution privileges, Identity management
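Role-based access control, mentioned in the last callout above, can be illustrated with a small lookup. This is a sketch under assumptions: the role names, privilege names, and the `is_allowed` helper are all hypothetical, and a real deployment would delegate this decision to the cloud provider's identity service rather than an in-process table.

```python
from typing import Dict, Iterable, Set

# Hypothetical role-to-privilege mapping; every name here is illustrative.
ROLE_PRIVILEGES: Dict[str, Set[str]] = {
    "data_engineer": {"upload_payload", "run_job"},
    "data_scientist": {"run_job", "read_data"},
    "business_user": {"read_data"},
}


def is_allowed(user_roles: Iterable[str], action: str,
               role_privileges: Dict[str, Set[str]] = ROLE_PRIVILEGES) -> bool:
    """Return True if any of the user's roles grants the requested action.

    Unknown roles grant nothing, so access is denied by default.
    """
    return any(action in role_privileges.get(role, set()) for role in user_roles)
```

Denying by default means a misspelled or retired role fails closed rather than open.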
11
Benefits
• Comprehensive Solution Options
– Modular and complementary data management options
• Flexibility
– Meets dynamic business and technology demands
• Performance and Scalability
– Scale up and out
• Best of both worlds
– Play to platform’s strengths
• Economic$
– Hybrid model provides best of TCO and ROI
12
Case Study
• One of the world's largest producers of commodities, natural ores, and conventional and unconventional energy resources, with suppliers and consumers as end users of data analytics
• Need to build a Hybrid Data Analytics Environment covering areas such as Productivity, Supply Chain and Operations
• Data to be loaded in less than 20 minutes
• 95% of analytics queries to run in less than 5 seconds
• Highly available environment with both on-premise and Cloud connectivity
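The load-time and query-latency targets above can be checked mechanically against measurements. The sketch below is one way to do that, assuming load time is recorded in minutes and query latencies in seconds; `fraction_within` and `meets_targets` are hypothetical names, and the default thresholds simply restate the figures from the slide.

```python
from typing import Sequence


def fraction_within(latencies_s: Sequence[float], limit_s: float) -> float:
    """Fraction of queries that finished in strictly less than limit_s seconds."""
    if not latencies_s:
        raise ValueError("no query latencies recorded")
    return sum(1 for t in latencies_s if t < limit_s) / len(latencies_s)


def meets_targets(load_minutes: float, latencies_s: Sequence[float], *,
                  max_load_minutes: float = 20.0,
                  latency_limit_s: float = 5.0,
                  required_fraction: float = 0.95) -> bool:
    """Check both targets: load under 20 minutes, 95% of queries under 5 seconds."""
    load_ok = load_minutes < max_load_minutes
    queries_ok = fraction_within(latencies_s, latency_limit_s) >= required_fraction
    return load_ok and queries_ok
```

For example, a run with an 18-minute load where 19 of 20 queries finish in under 5 seconds satisfies both targets, while the same queries after a 25-minute load do not.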
13
Thank You !
@shankariyer www.linkedin.com/in/2shankar
