SlideShare a Scribd company logo
13
Most read
16
Most read
18
Most read
Modernizing to a
cloud data
architecture
Guido Oswald, Solutions Architect, Databricks
Matt Graves, VP of Enterprise Data & Analytics,
GCI Communication Corp
Agenda
‱ Top reasons to modernize from Hadoop to Databricks
‱ Success stories, technical and business benefits
‱ Fast migrations with low costs & low risk
‱ Fireside Chat: Matt Graves
Digital transformation is
accelerating
E-Commerce
Wearables, medical IoT
Streaming
Mobile payments, food
service, grocery deliveries

Digital transformation is
accelerating
E-Commerce
Wearables, medical IoT
Streaming
Mobile payments, food
service, grocery deliveries

The data surge is placing
tremendous pressure on
traditional data and analytics
infrastructure
Digital transformation is accelerating
E-Commerce
Wearables, medical IoT
Streaming
Mobile payments, food
service, grocery deliveries

The data surge is placing
tremendous pressure on
traditional data and analytics
infrastructure
Source: Gartner cited by Battery Ventures - Open Cloud report
Cloud adoption is
accelerating by $100B
from 2021 - 2023
Today, most enterprises struggle with data
Siloed stacks increase data architecture complexity
Data Warehousing Data Engineering Streaming
Data Science & Machine
Learning
Extract
Transform
Streaming data sources
Streaming Data Engine
Analytics and BI
Data marts
Data warehouse
Structured data
Structured, semi-structured
and unstructured data
Structured, semi-structured
and unstructured data
Data Lake
Data prep
Data Lake
Machine
Learning
Data
Science
Amazon Redshift Teradata
Azure Synapse Google BigQuery
Snowflake IBM Db2
SAP Oracle Autonomous
Data Warehouse
Hadoop Apache Airflow
Amazon EMR Apache Spark
Google Dataproc Cloudera
Jupyter Amazon SageMaker
Azure ML Studio MatLAB
Domino Data Labs SAS
TensorFlow PyTorch
Apache Kafka Apache Spark
Apache Flink Amazon Kinesis
Azure Stream Analytics
Tibco Spotfire
Google Dataflow
Confluent
Disconnected systems and proprietary data formats make integration difficult
Data
Scientists
Data
Engineers
Data
Analysts
Data
Engineers
Siloed data teams decrease productivity
Load Real-time Database
Is your architecture enabling growth?
Legacy on-premise data and analytics architectures are not keeping up
Hadoop costs rising when
costs need to be cut
Innovation hinges on ML
and predictive insights
Business agility requires
real-time data
Hadoop is costly, complex and ineffective
Hadoop ecosystem is complex,
hard to manage, and prone to
failures
24/7 HDFS clusters that need
to built for peak usage and are
costly to upgrade
‱ RIGID AND INELASTIC
‱ DEVOPS INTENSIVE
No out-of-box support for
ML/AI and separate data and AI
environments
‱ LACKS AI CAPABILITIES
Low Productivity Cost Prohibitive Slow Innovation
X
Enterprises need a modern data and analytics
architecture
CRITICAL REQUIREMENTS
Cost-effective scale and performance in the cloud
Easy to manage and highly reliable for diverse data
Predictive and real-time insights to drive
innovation
Modernization delivers business value
Forrester TEI study finds 417% ROI for
companies switching to Databricks
47%
Cost-savings from retiring
legacy infrastructure
5%
Increase in revenue
25%
Data team productivity
increase
Source: Forrester TEI: The total economic impact of the Databricks Unified Analytics Platform
The Databricks Lakehouse Platform is one simple platform to unify all
your data, analytics, and AI workloads
Original creators of popular data and machine learning open-source projects
Global company with over 5,000 customers and more than 450 partners
Data
Warehouse
Lakehouse
One platform to unify all of
your data, analytics, and AI workloads
Data
Lake
Structured Semi-structured Unstructured Streaming
Lakehouse Platform
Data Engineering
BI & SQL
Analytics
Real-time Data
Applications
Data Science
& Machine
Learning
Data Management & Governance
Open Data Lake
SIMPLE OPEN COLLABORATIVE
From BI to AI
All your data,
analytics and
AI on one
Lakehouse
platform
Data Eng, ML
(Spark)
Scalable apps on Columnar store
(Hbase)
ETL, SQL
(Hive/ Impala)
Databricks jobs / Delta Lake / SparkSQL
(Highly tuned Spark engine: faster, less compute, one-stop-shop)
Batch Process
(MapReduce)
Real-time Event Processing
(Storm/ Spark)
Databricks Spark jobs
(orders of magnitude faster - but may need manual work)
Databricks Structured Streaming
(Spark Structured Streaming + Delta Lake: Streaming + Batch ingest)
Databricks jobs/ Delta Lake
(Highly tuned Spark engine: faster, less compute, one-stop-
shop)
Databricks Spark integrates w/ HBase on cloud
(Alternatively: use cloud data stores well integrated with Databricks)
Technology mapping: deliver better outcomes
Automation for most workload types
Data Migration
Metastore Migration
SQL Migration
Security
Scheduled Data pulls
Orchestration
HDFS
Hive Databases / Tables / Views
Impala Databases / Tables/ Views
HDFS
Hive Queries
Spark Queries
Sentry permissions /Ranger policies
HDFS access permissions
Sqoop statements
Oozie Jobs
Azure ADLS Gen 2, AWS S3, GCS
Databricks Tables
Databricks Tables
Spark Sql Databricks Notebooks
Spark Sql Databricks Notebooks
Databricks Notebooks
Databricks permissions
AWS IAM, ADLS ACLs
Databricks compatible PySpark code
Airflow DAGs & Databricks Jobs
55-66 % reduction in costs and 2-3x reduction in
timelines by using automation tools
Data Migration
Assessment & Design
Manual
Migration
Workloads Migration, Validation Cutover Operations
17- 20 Weeks
8 Weeks
Using
Automation
Accelerated Data & Workloads Migration,
Validation
Accelerated
Assessment &
Design
Cutover
Operations
* Typical implementation scenario ~ 4 PB of Data and 3000 jobs with mixed workloads considered
Same tool
used for pre-
migration
Assessment
Our partner ecosystem accelerates migrations
ISV Partners and Migration Tools
Security
Governance
Consulting & SI Partners
Databricks
Migration
SWAT team +
CS Packaged
Services
For Migration
Cloud
Modernization with Databricks - recap
Why - costs, productivity, innovation → business
impact
Your competitors and market leaders are doing it
NOW
Databricks experts and automation strategy can
help you migrate faster, with much lower cost and
risk
Visit databricks.com/migration to learn more
Fireside chat
Matt Graves
VP of Enterprise Data & Analytics
GCI Communication Corp
Backup
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

More Related Content

What's hot (20)

PDF
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
PDF
Introducing Databricks Delta
Databricks
 
PDF
Databricks Delta Lake and Its Benefits
Databricks
 
PDF
adb.pdf
AdityaMehta724216
 
PDF
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
PDF
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
PDF
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
PDF
Data Mesh
Piethein Strengholt
 
PDF
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
PDF
Intro to Delta Lake
Databricks
 
PDF
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
PDF
Building End-to-End Delta Pipelines on GCP
Databricks
 
PPTX
Building a modern data warehouse
James Serra
 
PDF
Data Catalog for Better Data Discovery and Governance
Denodo
 
PDF
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
HostedbyConfluent
 
PPTX
Modernize & Automate Analytics Data Pipelines
Carole Gunst
 
PPTX
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
PPT
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
PDF
Getting Started with Delta Lake on Databricks
Knoldus Inc.
 
PPTX
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Introducing Databricks Delta
Databricks
 
Databricks Delta Lake and Its Benefits
Databricks
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Data Mesh
Piethein Strengholt
 
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Intro to Delta Lake
Databricks
 
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Building End-to-End Delta Pipelines on GCP
Databricks
 
Building a modern data warehouse
James Serra
 
Data Catalog for Better Data Discovery and Governance
Denodo
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
HostedbyConfluent
 
Modernize & Automate Analytics Data Pipelines
Carole Gunst
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Getting Started with Delta Lake on Databricks
Knoldus Inc.
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 

Similar to Modernizing to a Cloud Data Architecture (20)

PDF
The Hidden Value of Hadoop Migration
Databricks
 
PPTX
Migration to Databricks - On-prem HDFS.pptx
Kshitija(KJ) Gupte
 
PPTX
Unlock Data-driven Insights in Databricks Using Location Intelligence
Precisely
 
PDF
Building a Turbo-fast Data Warehousing Platform with Databricks
Databricks
 
PPTX
Liberate Legacy Data Sources with Precisely and Databricks
Precisely
 
PPTX
Databricks on AWS.pptx
Wasm1953
 
PDF
Master Databricks with AccentFuture – Online Training
Accentfuture
 
PPTX
Accelerate Innovation with Databricks and Legacy Data
Precisely
 
PDF
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
HostedbyConfluent
 
PPTX
Data Engineering A Deep Dive into Databricks
Knoldus Inc.
 
PPTX
Introduction to Databricks - AccentFuture
Accentfuture
 
PPTX
Introduction_to_Databricks_power_point_presentation.pptx
xeranaw566
 
PDF
final-the-data-teams-guide-to-the-db-lakehouse-platform-rd-6-14-22.pdf
XIAOZEJIN1
 
PDF
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
HostedbyConfluent
 
PDF
Trivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis
 
PDF
Comparing Microsoft Big Data Platform Technologies
Jen Stirrup
 
PDF
Architecting Agile Data Applications for Scale
Databricks
 
PPTX
DataBricks fundamentals for fresh graduates
SanjeevaniClinicalRe
 
PDF
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
PPTX
TechEvent Databricks on Azure
Trivadis
 
The Hidden Value of Hadoop Migration
Databricks
 
Migration to Databricks - On-prem HDFS.pptx
Kshitija(KJ) Gupte
 
Unlock Data-driven Insights in Databricks Using Location Intelligence
Precisely
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Databricks
 
Liberate Legacy Data Sources with Precisely and Databricks
Precisely
 
Databricks on AWS.pptx
Wasm1953
 
Master Databricks with AccentFuture – Online Training
Accentfuture
 
Accelerate Innovation with Databricks and Legacy Data
Precisely
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
HostedbyConfluent
 
Data Engineering A Deep Dive into Databricks
Knoldus Inc.
 
Introduction to Databricks - AccentFuture
Accentfuture
 
Introduction_to_Databricks_power_point_presentation.pptx
xeranaw566
 
final-the-data-teams-guide-to-the-db-lakehouse-platform-rd-6-14-22.pdf
XIAOZEJIN1
 
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
HostedbyConfluent
 
Trivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis
 
Comparing Microsoft Big Data Platform Technologies
Jen Stirrup
 
Architecting Agile Data Applications for Scale
Databricks
 
DataBricks fundamentals for fresh graduates
SanjeevaniClinicalRe
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
TechEvent Databricks on Azure
Trivadis
 
Ad

More from Databricks (20)

PPTX
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
PPTX
Data Lakehouse Symposium | Day 2
Databricks
 
PDF
Democratizing Data Quality Through a Centralized Platform
Databricks
 
PDF
Learn to Use Databricks for Data Science
Databricks
 
PDF
Why APM Is Not the Same As ML Monitoring
Databricks
 
PDF
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
PDF
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
PDF
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
PDF
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
PDF
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
PDF
Sawtooth Windows for Feature Aggregations
Databricks
 
PDF
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
PDF
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
PDF
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
PDF
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
PDF
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
PDF
Machine Learning CI/CD for Email Attack Detection
Databricks
 
PDF
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Databricks
 
PDF
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Databricks
 
PDF
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 2
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Databricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Databricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Databricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 
Ad

Recently uploaded (20)

PDF
apidays Singapore 2025 - Surviving an interconnected world with API governanc...
apidays
 
PDF
apidays Singapore 2025 - Building a Federated Future, Alex Szomora (GSMA)
apidays
 
PDF
JavaScript - Good or Bad? Tips for Google Tag Manager
📊 Markus Baersch
 
PPTX
01_Nico Vincent_Sailpeak.pptx_AI_Barometer_2025
FinTech Belgium
 
PPTX
04_Tamás Marton_Intuitech .pptx_AI_Barometer_2025
FinTech Belgium
 
PPTX
apidays Singapore 2025 - The Quest for the Greenest LLM , Jean Philippe Ehre...
apidays
 
PDF
NIS2 Compliance for MSPs: Roadmap, Benefits & Cybersecurity Trends (2025 Guide)
GRC Kompas
 
PDF
apidays Singapore 2025 - From API Intelligence to API Governance by Harsha Ch...
apidays
 
PDF
Development and validation of the Japanese version of the Organizational Matt...
Yoga Tokuyoshi
 
PPTX
apidays Helsinki & North 2025 - Agentic AI: A Friend or Foe?, Merja Kajava (A...
apidays
 
PPT
tuberculosiship-2106031cyyfuftufufufivifviviv
AkshaiRam
 
PDF
apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability...
apidays
 
PDF
Driving Employee Engagement in a Hybrid World.pdf
Mia scott
 
PPTX
apidays Helsinki & North 2025 - Running a Successful API Program: Best Practi...
apidays
 
PDF
Using AI/ML for Space Biology Research
VICTOR MAESTRE RAMIREZ
 
PPTX
05_Jelle Baats_Tekst.pptx_AI_Barometer_Release_Event
FinTech Belgium
 
PPTX
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
PDF
Technical-Report-GPS_GIS_RS-for-MSF-finalv2.pdf
KPycho
 
PPTX
b6057ea5-8e8c-4415-90c0-ed8e9666ffcd.pptx
Anees487379
 
PDF
InformaticsPractices-MS - Google Docs.pdf
seshuashwin0829
 
apidays Singapore 2025 - Surviving an interconnected world with API governanc...
apidays
 
apidays Singapore 2025 - Building a Federated Future, Alex Szomora (GSMA)
apidays
 
JavaScript - Good or Bad? Tips for Google Tag Manager
📊 Markus Baersch
 
01_Nico Vincent_Sailpeak.pptx_AI_Barometer_2025
FinTech Belgium
 
04_Tamás Marton_Intuitech .pptx_AI_Barometer_2025
FinTech Belgium
 
apidays Singapore 2025 - The Quest for the Greenest LLM , Jean Philippe Ehre...
apidays
 
NIS2 Compliance for MSPs: Roadmap, Benefits & Cybersecurity Trends (2025 Guide)
GRC Kompas
 
apidays Singapore 2025 - From API Intelligence to API Governance by Harsha Ch...
apidays
 
Development and validation of the Japanese version of the Organizational Matt...
Yoga Tokuyoshi
 
apidays Helsinki & North 2025 - Agentic AI: A Friend or Foe?, Merja Kajava (A...
apidays
 
tuberculosiship-2106031cyyfuftufufufivifviviv
AkshaiRam
 
apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability...
apidays
 
Driving Employee Engagement in a Hybrid World.pdf
Mia scott
 
apidays Helsinki & North 2025 - Running a Successful API Program: Best Practi...
apidays
 
Using AI/ML for Space Biology Research
VICTOR MAESTRE RAMIREZ
 
05_Jelle Baats_Tekst.pptx_AI_Barometer_Release_Event
FinTech Belgium
 
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
Technical-Report-GPS_GIS_RS-for-MSF-finalv2.pdf
KPycho
 
b6057ea5-8e8c-4415-90c0-ed8e9666ffcd.pptx
Anees487379
 
InformaticsPractices-MS - Google Docs.pdf
seshuashwin0829
 

Modernizing to a Cloud Data Architecture

  • 1. Modernizing to a cloud data architecture Guido Oswald, Solutions Architect, Databricks Matt Graves, VP of Enterprise Data & Analytics, GCI Communication Corp
  • 2. Agenda ‱ Top reasons to modernize from Hadoop to Databricks ‱ Success stories, technical and business benefits ‱ Fast migrations with low costs & low risk ‱ Fireside Chat: Matt Graves
  • 3. Digital transformation is accelerating E-Commerce Wearables, medical IoT Streaming Mobile payments, food service, grocery deliveries

  • 4. Digital transformation is accelerating E-Commerce Wearables, medical IoT Streaming Mobile payments, food service, grocery deliveries
 The data surge is placing tremendous pressure on traditional data and analytics infrastructure
  • 5. Digital transformation is accelerating E-Commerce Wearables, medical IoT Streaming Mobile payments, food service, grocery deliveries
 The data surge is placing tremendous pressure on traditional data and analytics infrastructure Source: Gartner cited by Battery Ventures - Open Cloud report Cloud adoption is accelerating by $100B from 2021 - 2023
  • 6. Today, most enterprises struggle with data Siloed stacks increase data architecture complexity Data Warehousing Data Engineering Streaming Data Science & Machine Learning Extract Transform Streaming data sources Streaming Data Engine Analytics and BI Data marts Data warehouse Structured data Structured, semi-structured and unstructured data Structured, semi-structured and unstructured data Data Lake Data prep Data Lake Machine Learning Data Science Amazon Redshift Teradata Azure Synapse Google BigQuery Snowflake IBM Db2 SAP Oracle Autonomous Data Warehouse Hadoop Apache Airflow Amazon EMR Apache Spark Google Dataproc Cloudera Jupyter Amazon SageMaker Azure ML Studio MatLAB Domino Data Labs SAS TensorFlow PyTorch Apache Kafka Apache Spark Apache Flink Amazon Kinesis Azure Stream Analytics Tibco Spotfire Google Dataflow Confluent Disconnected systems and proprietary data formats make integration difficult Data Scientists Data Engineers Data Analysts Data Engineers Siloed data teams decrease productivity Load Real-time Database
  • 7. Is your architecture enabling growth? Legacy on-premise data and analytics architectures are not keeping up Hadoop costs rising when costs need to be cut Innovation hinges on ML and predictive insights Business agility requires real-time data
  • 8. Hadoop is costly, complex and ineffective Hadoop ecosystem is complex, hard to manage, and prone to failures 24/7 HDFS clusters that need to built for peak usage and are costly to upgrade ‱ RIGID AND INELASTIC ‱ DEVOPS INTENSIVE No out-of-box support for ML/AI and separate data and AI environments ‱ LACKS AI CAPABILITIES Low Productivity Cost Prohibitive Slow Innovation X
  • 9. Enterprises need a modern data and analytics architecture CRITICAL REQUIREMENTS Cost-effective scale and performance in the cloud Easy to manage and highly reliable for diverse data Predictive and real-time insights to drive innovation
  • 10. Modernization delivers business value Forrester TEI study finds 417% ROI for companies switching to Databricks 47% Cost-savings from retiring legacy infrastructure 5% Increase in revenue 25% Data team productivity increase Source: Forrester TEI: The total economic impact of the Databricks Unified Analytics Platform
  • 11. The Databricks Lakehouse Platform is one simple platform to unify all your data, analytics, and AI workloads Original creators of popular data and machine learning open-source projects Global company with over 5,000 customers and more than 450 partners
  • 12. Data Warehouse Lakehouse One platform to unify all of your data, analytics, and AI workloads Data Lake
  • 13. Structured Semi-structured Unstructured Streaming Lakehouse Platform Data Engineering BI & SQL Analytics Real-time Data Applications Data Science & Machine Learning Data Management & Governance Open Data Lake SIMPLE OPEN COLLABORATIVE From BI to AI All your data, analytics and AI on one Lakehouse platform
  • 14. Data Eng, ML (Spark) Scalable apps on Columnar store (Hbase) ETL, SQL (Hive/ Impala) Databricks jobs / Delta Lake / SparkSQL (Highly tuned Spark engine: faster, less compute, one-stop-shop) Batch Process (MapReduce) Real-time Event Processing (Storm/ Spark) Databricks Spark jobs (orders of magnitude faster - but may need manual work) Databricks Structured Streaming (Spark Structured Streaming + Delta Lake: Streaming + Batch ingest) Databricks jobs/ Delta Lake (Highly tuned Spark engine: faster, less compute, one-stop- shop) Databricks Spark integrates w/ HBase on cloud (Alternatively: use cloud data stores well integrated with Databricks) Technology mapping: deliver better outcomes
  • 15. Automation for most workload types Data Migration Metastore Migration SQL Migration Security Scheduled Data pulls Orchestration HDFS Hive Databases / Tables / Views Impala Databases / Tables/ Views HDFS Hive Queries Spark Queries Sentry permissions /Ranger policies HDFS access permissions Sqoop statements Oozie Jobs Azure ADLS Gen 2, AWS S3, GCS Databricks Tables Databricks Tables Spark Sql Databricks Notebooks Spark Sql Databricks Notebooks Databricks Notebooks Databricks permissions AWS IAM, ADLS ACLs Databricks compatible PySpark code Airflow DAGs & Databricks Jobs
  • 16. 55-66 % reduction in costs and 2-3x reduction in timelines by using automation tools Data Migration Assessment & Design Manual Migration Workloads Migration, Validation Cutover Operations 17- 20 Weeks 8 Weeks Using Automation Accelerated Data & Workloads Migration, Validation Accelerated Assessment & Design Cutover Operations * Typical implementation scenario ~ 4 PB of Data and 3000 jobs with mixed workloads considered Same tool used for pre- migration Assessment
  • 17. Our partner ecosystem accelerates migrations ISV Partners and Migration Tools Security Governance Consulting & SI Partners Databricks Migration SWAT team + CS Packaged Services For Migration Cloud
  • 18. Modernization with Databricks - recap Why - costs, productivity, innovation → business impact Your competitors and market leaders are doing it NOW Databricks experts and automation strategy can help you migrate faster, with much lower cost and risk
  • 20. Fireside chat Matt Graves VP of Enterprise Data & Analytics GCI Communication Corp
  • 22. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.