www.clairvoyantsoft.com
Migrating Big Data Workloads to the Cloud
By: Robert Sanders
| 2
Robert Sanders
Director of Big Data Engineering
https://www.linkedin.com/in/robert-sanders-cs/
Robert Sanders is the Director of Big Data Engineering at
Clairvoyant Insight. In his day job, Robert wears multiple
hats, moving between leading members of the Insight
development team and working directly with clients on
architecting and engineering large-scale data projects.
Robert has a deep background in enterprise systems,
initially working on full-stack implementations and later
focusing on building data management platforms.
| 3
About Clairvoyant
Background
● Boutique consulting firm centered on building data solutions and products
● All things web and data engineering, analytics, ML and user experience, brought together
● Supports the core Hadoop platform and data engineering pipelines, and provides administrative and DevOps expertise focused on Hadoop
| 4
Agenda
● “The Cloud” and “The Why”
● Cloud Migration Strategies
● Data Migration
● Application Migration
| 5
The Cloud
Rented Infrastructure and Services
| 6
Reasons to Move to the Cloud
● Ability to make use of Cloud Services
● Reduce Cost
● Reduce Time to Spin Up new Instances
● Scalability
| 7
General Migration Strategies - 6 R’s
● Repurchase (Drop and Shop)
● Rehost (Lift and Shift)
● Replatform (Lift, Tinker and Shift)
● Refactor/Re-Architect
● Retire
● Retain
| 8
Big Data Migration Strategies
1. Forklift
2. Hybrid
3. Cloud Native
| 9
Forklift
● Rehost option
| 10
Hybrid
● Rehost, Replatform and Refactor/Re-Architect options
| 11
Options with Blob Storage
| 12
Cloud Native
● Replatform and Refactor/Re-Architect options
| 13
Forklift vs. Hybrid vs. Cloud Native
| 14
Data Migration Strategies - DistCp
$ hadoop distcp 'hdfs://src_host/path/*' hdfs://dest_host/path/
$ hadoop distcp 'hdfs://src_host/path/*' s3a://dest_bucket/
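A long bulk copy competes with production traffic for the same network link. A minimal sketch, assuming a Hadoop client on the PATH with the S3A connector configured: the hypothetical `bulk_copy` function wraps the same DistCp copy and throttles it with DistCp's `-m` (maximum simultaneous map tasks) and `-bandwidth` (per-map cap in MB/s) options. The defaults of 10 maps at 10 MB/s are illustrative assumptions, not values from the talk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: throttled initial bulk copy with DistCp.
# Assumes `hadoop` is on PATH and s3a:// credentials are already configured.

bulk_copy() {
  local src="$1"               # e.g. 'hdfs://src_host/path/*' (quoted so DistCp, not the shell, expands the glob)
  local dest="$2"              # e.g. s3a://dest_bucket/
  local maps="${3:-10}"        # max simultaneous map tasks (illustrative default)
  local mb_per_map="${4:-10}"  # per-map bandwidth cap in MB/s (illustrative default)

  # Aggregate cap is roughly maps * mb_per_map MB/s: 10 * 10 = 100 MB/s, about 0.8 Gbps.
  hadoop distcp -m "$maps" -bandwidth "$mb_per_map" "$src" "$dest"
}
```

For example, `bulk_copy 'hdfs://src_host/path/*' s3a://dest_bucket/` runs the second command above capped at roughly 0.8 Gbps, leaving headroom on a shared link; raise the two limits off-hours if the link is otherwise idle.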
| 15
File Transfer Rate
Note: Estimates (assumes 25% network overhead)
Usable network bandwidth (columns) vs. data to transfer (rows):
Data        100 Mbps    1 Gbps      10 Gbps
1 TB        30 hrs      3 hrs       18 min
10 TB       12 days     30 hrs      3 hrs
100 TB      124 days    12 days     30 hrs
1 PB        3 years     124 days    12 days
10 PB       34 years    3 years     124 days
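The transfer-rate estimates follow from simple arithmetic: time = data size ÷ usable bandwidth. A small sketch that reproduces them, assuming (as the figures imply) decimal units, i.e. 1 TB = 10^12 bytes, and that 25% of the link is lost to overhead. `transfer_hours` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Estimate wall-clock hours to move a data set over a network link.
# Assumptions: decimal units (1 TB = 10^12 bytes = 8 * 10^12 bits)
# and 25% network overhead, so only 75% of the nominal link is usable.

transfer_hours() {
  local terabytes="$1"   # data to transfer, in TB
  local link_mbps="$2"   # nominal link speed, in Mbps
  awk -v tb="$terabytes" -v mbps="$link_mbps" 'BEGIN {
    bits = tb * 8e12                  # TB -> bits
    usable_bps = mbps * 1e6 * 0.75    # subtract 25% overhead
    printf "%.1f\n", bits / usable_bps / 3600
  }'
}
```

`transfer_hours 1 100` prints `29.6` (the 30 hrs estimate for 1 TB at 100 Mbps) and `transfer_hours 100 100` prints `2963.0`, about 123.5 days (the 124 days estimate for 100 TB).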
| 16
Data Migration Strategies - DistCp
● Azure ExpressRoute
| 17
Data Migration Strategies - External Appliance
● Use an external appliance
○ AWS Snowball (50 or 80 TB of data)
○ AWS Snowmobile (20+ PB of data)
● Load the data into S3
● Then copy deltas from on-prem to the cloud
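Data written on-prem after the appliance was loaded still has to reach the cloud. One way to copy those deltas is DistCp's `-update` mode, which skips files already present at the target. A minimal sketch, assuming `hadoop` is on the PATH with the S3A connector configured; `sync_deltas` is a hypothetical name. `-skipcrccheck` is included on the assumption that HDFS and S3 checksums are not directly comparable, so files are matched on size instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: after an appliance has seeded S3, copy only the files
# that are new or changed on-prem since the bulk load.
# Assumes `hadoop` is on PATH and s3a:// credentials are configured.

sync_deltas() {
  local src="$1"    # e.g. hdfs://src_host/path
  local dest="$2"   # e.g. s3a://dest_bucket/path
  # -update       : skip files that already exist at the target and match
  # -skipcrccheck : do not compare HDFS and S3 checksums (compare by size only)
  hadoop distcp -update -skipcrccheck "$src" "$dest"
}
```

Running it repeatedly shrinks the remaining delta until a short cut-over window is possible. With `-update`, DistCp copies the contents of the source directory into the target path, and files deleted on-prem are not removed from S3 unless `-delete` is also given.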
| 18
Data Migration Services
Amazon
● S3 Transfer Acceleration
● AWS Snowball (50 or 80 TB of data)
● AWS Snowmobile (20+ PB of data)
● AWS Direct Connect
● AWS DataSync
● AWS Transfer for SFTP
● Amazon Kinesis Firehose
● More
Azure
● Azure Data Factory
● Azure Database Migration Service
● Azure ExpressRoute
| 19
On-Prem Workloads
[Chart: utilization over a 24-hour day (hours 0-23) for Daily Reporting, Analytics, Real-Time and ML Training workloads, shaded from higher to lower utilization]
| 20
Cloud Native Workloads
| 21
Workload Migration to Cloud Native Services
● Batch
○ Amazon EMR (MapReduce, Hive, Pig, Spark), Amazon Redshift, AWS Glue
● Interactive
○ Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark, Hive)
● Stream
○ Amazon EMR (Spark Streaming, Flink, Storm), Amazon Kinesis Analytics
● AI
○ Amazon AI (Lex, Polly, ML, Rekognition), Amazon EMR (Spark ML), Deep Learning
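The on-prem utilization slide suggests clusters sized for peak hours sit underused the rest of the day. In the cloud, a batch job such as daily reporting can instead run on an EMR cluster that exists only for that job. A sketch using the AWS CLI's `aws emr create-cluster` with `--auto-terminate`, assuming the CLI is configured and the default EMR roles exist; the function name, cluster size, instance type, release label and script location are all hypothetical values to adapt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one Spark batch job on a transient Amazon EMR
# cluster that terminates itself when the step finishes.
# Assumes the AWS CLI is installed/configured and default EMR roles exist.

run_transient_spark_job() {
  local script_uri="$1"              # e.g. s3://dest_bucket/jobs/daily_report.py (hypothetical)
  local release="${2:-emr-6.15.0}"   # assumed release label; choose a current one
  aws emr create-cluster \
    --name "transient-batch" \
    --release-label "$release" \
    --applications Name=Spark \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --auto-terminate \
    --steps "Type=Spark,Name=BatchJob,ActionOnFailure=TERMINATE_CLUSTER,Args=[${script_uri}]"
}
```

Because the cluster shuts down once its step completes (and `ActionOnFailure=TERMINATE_CLUSTER` also shuts it down if the step fails), compute is paid for mainly while the job runs rather than around the clock.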
| 22
Thank You!
Questions?
robert.sanders@clairvoyantsoft.com

