Data Pipelines With StreamSets
Jowanza Joseph
@jowanza
Agenda
About Me
The Problem Space
Streaming
StreamSets
Demo
Questions
About Me
Software Engineer at One Click Retail
Scala / Spark / Mesos / Kubernetes
Author: Apache Spark Fieldbook
Cyclist
Husband and father
Retail Intelligence
Data Size
Real-Time
Operational Complexity
Batch Processing
What Are Data Pipelines?
What Problems Do They Solve?
Scalability
Complexity
Observability
Extendability
Lambda Architecture
Kappa Architecture
Goals
Data Provenance
Guaranteed Delivery
Configurable
Extendable
Multi-Protocol Support
DAG
Distributed
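One way to picture the "Configurable" and "DAG" goals above is a toy pipeline whose stages are plain functions wired into a directed acyclic graph. This is only a sketch under stated assumptions: the Pipeline class, its methods, and the stage names are hypothetical and written for illustration; they are not the StreamSets API and not the speaker's implementation.

```python
from collections import deque


class Pipeline:
    """Toy pipeline (hypothetical, not StreamSets): stages wired as a DAG."""

    def __init__(self):
        self.stages = {}  # stage name -> function(record) -> record
        self.edges = {}   # stage name -> list of downstream stage names

    def add_stage(self, name, fn):
        self.stages[name] = fn
        self.edges.setdefault(name, [])

    def connect(self, upstream, downstream):
        self.edges[upstream].append(downstream)

    def order(self):
        """Topological order via Kahn's algorithm; raises ValueError on a cycle."""
        indegree = {name: 0 for name in self.stages}
        for targets in self.edges.values():
            for target in targets:
                indegree[target] += 1
        ready = deque(name for name, degree in indegree.items() if degree == 0)
        result = []
        while ready:
            name = ready.popleft()
            result.append(name)
            for target in self.edges[name]:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)
        if len(result) != len(self.stages):
            raise ValueError("pipeline graph contains a cycle")
        return result

    def run(self, record, origin):
        """Push one record from `origin` through every stage reachable from it.

        Simplification: if a stage has several upstreams, the last one wins.
        """
        inputs = {origin: record}
        outputs = {}
        for name in self.order():
            if name not in inputs:
                continue  # stage is not reachable from the origin
            outputs[name] = self.stages[name](inputs[name])
            for target in self.edges[name]:
                inputs[target] = outputs[name]
        return outputs


# Usage: three hypothetical stages chained origin -> upper -> exclaim.
pipeline = Pipeline()
pipeline.add_stage("origin", lambda r: r)
pipeline.add_stage("upper", lambda r: r.upper())
pipeline.add_stage("exclaim", lambda r: r + "!")
pipeline.connect("origin", "upper")
pipeline.connect("upper", "exclaim")
print(pipeline.run("hi", "origin"))
```

Rejecting cycles up front is what makes the graph a DAG: every record has a finite path from origin to destination, which is also what makes its provenance traceable stage by stage.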
Based on Streams
Architecture
Running on Mesos
Analytics Data
Real-Time Data
Our Use Case
Demo
