Copyright © 2019 Impetus Technologies, Inc.
You are prohibited from making a copy or modification of, or from redistributing, rebroadcasting, or
re-encoding of this content without the prior written consent of Impetus.
This presentation may include images from other products and services. These images are used for
illustrative purposes only. Unless explicitly stated there is no implied endorsement or sponsorship of
these products by Impetus. All copyrights and trademarks are property of their respective owners.
Build Spark-based ETL Workflows on Cloud in Minutes
Punit Shah
Senior Solutions Architect
Saurabh Dutta
Product Manager
Our Mission
Enabling a unified, clear and present view of
your business.
Poll
Our speakers
Punit Shah
Senior Solutions Architect
Saurabh Dutta
Product Manager
Agenda
Background of ETL and cloud
Features and benefits of cloud
Must-haves of a cloud ETL platform
Challenges with ETL on cloud
Demonstrate an ETL use case on cloud using StreamAnalytix
Almost three-quarters of businesses now operate partially
or fully in the cloud, and that number will exceed 90
percent by 2020.
– International Data Group, 2018
Source: https://www.idg.com/tools-for-marketers/2018-cloud-computing-survey/
One of the biggest shifts in the data integration market is
customers asking for hybrid deployment (cloud and on-
premises), with the expectation of multi-cloud and
cloud-to-cloud integration.
– Gartner, 2018
Source: https://www.gartner.com/en/documents/3883264
Traditional ETL
Extract: Get data from the source
Transform: Filter/map/enrich/combine/validate/sort
Load: Store data into a data warehouse or data mart
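The three stages can be sketched in a few lines of plain Python. This is only an illustration of the extract, transform, and load flow, not Spark code and not how StreamAnalytix implements it; the CSV data, the `orders` table, and all function names are assumptions made up for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read from files, a database, or an API.
RAW_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,not_a_number
3,carol,75.00
4,alice,-10.00
"""


def extract(text):
    """Extract: get rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate, filter, map, and sort the extracted rows."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])  # validate: amount must be numeric
        except ValueError:
            continue
        if amount <= 0:  # filter: drop non-positive amounts
            continue
        # map: convert types and normalize the customer name
        cleaned.append((int(row["order_id"]), row["customer"].title(), amount))
    return sorted(cleaned, key=lambda r: r[2], reverse=True)  # sort by amount


def load(rows, conn):
    """Load: store the rows in a table (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order and return what was loaded."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY amount DESC"
    ).fetchall()
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` loads only the two valid orders; the non-numeric and negative rows are dropped in the transform stage.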
Why move ETL to cloud?
Elasticity
On-premise cluster vs. elastic cluster
Cloud native
[Diagram: ETL connected to S3, RDBMS, Azure Blob store, Azure DW, Google cloud bucket, and Redshift]
IaaS - Infrastructure as a service
[Diagram: ETL Job 1, ETL Job 2, and ETL Job 3 running on cloud infrastructure that offers a high memory cluster, a high CPU cluster, and a GPU-based cluster]
Total cost of ownership
On-premise infrastructure vs. on-demand cloud infrastructure
However, are these capabilities enough?
StreamAnalytix combines the must-haves of a
modern cloud solution with the ability to
simplify and accelerate application
development.
The StreamAnalytix advantage
Graphical equivalent of an on-prem solution, in the cloud
Build, deploy, and monitor workloads in a single platform
Migrate existing ETL processes to cloud visually
Deploy in a hybrid or a multi-cloud environment
Get higher productivity
Demo
Upselling customers with targeted plans
Process last one month of CDR data
Enrich it with cell tower information
Apply data quality checks
Combine with customer billing database
Group customers based on certain attributes
Analyze customers’ change in call pattern
Store processed data
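The logical steps of this use case can be sketched in plain Python. This is a hedged illustration only: the demo itself builds the pipeline visually in StreamAnalytix on Spark, where each step would typically be a DataFrame filter, join, or aggregation. Every record, field name, and rule below (the sample CDRs, the tower-to-region lookup, the billing plans, and comparing the first and second half of the month as the "change in call pattern") is an assumption invented for the example.

```python
import json
from collections import defaultdict
from datetime import date

# Hypothetical sample data; real CDRs carry many more fields.
CDRS = [
    {"customer_id": "c1", "tower_id": "t1", "day": date(2019, 3, 2), "minutes": 30.0},
    {"customer_id": "c1", "tower_id": "t2", "day": date(2019, 3, 20), "minutes": 90.0},
    {"customer_id": "c2", "tower_id": "t1", "day": date(2019, 3, 5), "minutes": 10.0},
    {"customer_id": "c2", "tower_id": "t9", "day": date(2019, 3, 25), "minutes": 5.0},
    {"customer_id": "c3", "tower_id": "t2", "day": date(2019, 3, 7), "minutes": -4.0},
    {"customer_id": "c1", "tower_id": "t1", "day": date(2019, 2, 11), "minutes": 60.0},
]
TOWERS = {"t1": "North", "t2": "South"}  # cell tower lookup (assumed)
BILLING = {"c1": {"plan": "basic"}, "c2": {"plan": "premium"}}  # billing data (assumed)


def run_pipeline(cdrs, towers, billing, start, end):
    # 1. Process the last month of CDR data: keep records inside [start, end).
    window = [r for r in cdrs if start <= r["day"] < end]
    # 2. Enrich with cell tower information (region is None for unknown towers).
    enriched = [dict(r, region=towers.get(r["tower_id"])) for r in window]
    # 3. Apply data quality checks: known tower and positive call duration.
    valid = [r for r in enriched if r["region"] is not None and r["minutes"] > 0]
    # 4. Aggregate usage per customer for each half of the month.
    usage = defaultdict(lambda: {"first_half": 0.0, "second_half": 0.0})
    for r in valid:
        half = "first_half" if r["day"].day <= 15 else "second_half"
        usage[r["customer_id"]][half] += r["minutes"]
    # 5. Combine with billing data, group customers by plan, and
    # 6. analyze the change in call pattern (second half minus first half).
    groups = defaultdict(list)
    for customer_id, u in sorted(usage.items()):
        plan = billing.get(customer_id, {}).get("plan", "unknown")
        change = u["second_half"] - u["first_half"]
        groups[plan].append({"customer_id": customer_id, "change": change})
    return dict(groups)


def store(result):
    # 7. Store processed data: JSON text stands in for a real target system.
    return json.dumps(result, sort_keys=True)
```

With the sample data, `run_pipeline(CDRS, TOWERS, BILLING, date(2019, 3, 1), date(2019, 4, 1))` groups customer `c1` under the basic plan with rising usage, which is the kind of signal a targeted upsell plan would act on.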
A view into a typical ETL on cloud
development process
Challenges with the manual approach
Skilled resources
Time-to-market
Cluster utilization
Orchestration
Maintenance nightmare
Steps in the StreamAnalytix demo
Connecting to multiple sources
Data quality management
Data preparation
Enrichment
Standardization and unification
Storage
StreamAnalytix — A recap
Unified data 360 platform
Based on open source
Operationalizes ML workflows
Application lifecycle management
Enterprise grade
Poll results
Q&A
Thank you
Visit www.StreamAnalytix.com for a free download or
cloud-based trial or contact us at inquiry@streamanalytix.com
for more information or a PoC.

Editor's Notes

  • #11, #12, #13, #16: Punit to talk about how must-haves alone do not provide the time-to-market advantage, which is a big challenge in a resource-intensive process like ETL. We need features beyond these must-haves that can make the ETL process easier, faster, and far less cumbersome than current approaches (agility).
  • #18: Add a screenshot of the canvas and show bullets animating on the right side