Streaming is a Detail
Current 2023
Amy Chen & Florian Eiden
Introductions
Amy Chen
Staff Partner Engineer
Fun fact: I've made dbt soap 🧼
Florian Eiden
Staff Product Manager
Fun fact: I live on an 🏝
Pieces of the puzzle
- Transactional vs Analytical
- Analytics Engineering vs Data Engineering
- Personas
- ELT vs ETL
- Streaming in the ELT world (bring streaming to the database, or the other way around)
- Operational Analytics
- Why does it have to be Batch vs Streaming?
- Is streaming the biggest trend in analytics?
- What is the next big milestone for streaming?
- How do we solve CI/CD? Testing? Replayability?
- Is Flink a database now?
- Is pipeline the right way to bundle logic? What about DAGs?
- Logic: plumbing vs business logic
Streaming is a Detail
dbt is ELT
The dbt viewpoint: Build data like developers build applications
Work like engineers
dbt uses testing, version control, reusable code, and documentation to get to the right answer, faster.
Code Reigns
Pairing code-based transformation with your favorite git provider means flexibility without chaos.
Write reusable & referenceable logic with SQL + Jinja
Infer lineage for automated dependency management.
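The lineage idea on these two slides can be sketched in a few lines of Python. This is only an illustration, not how dbt itself is implemented: the model names and SQL in `MODELS` are hypothetical, and the regex only handles the simple single-argument `{{ ref('...') }}` form. It shows how dependencies can be read out of the code and turned into a build order.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models (name -> SQL + Jinja text); not taken from the talk.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "orders_enriched": """
        select o.*, c.region
        from {{ ref('stg_orders') }} o
        join {{ ref('stg_customers') }} c on o.customer_id = c.id
    """,
    "daily_revenue": """
        select order_date, sum(amount) as revenue
        from {{ ref('orders_enriched') }}
        group by 1
    """,
}

# Matches the simple form {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def infer_lineage(models):
    """Map each model to the set of models it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(models):
    """Return model names ordered so every model comes after its parents."""
    return list(TopologicalSorter(infer_lineage(models)).static_order())
```

Calling `build_order(MODELS)` places both staging models before `orders_enriched`, and `orders_enriched` before `daily_revenue`, without anyone declaring that order by hand.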
- SQL-friendly + version control safely expands participants
- Reusable code speeds development
- Built-in CI/CD increases pipeline reliability
- Dependency management speeds troubleshooting
- Visible lineage increases data understanding
- Testing and documentation increase data trust
[Figure labels: Data Engineers, Analysts, and Data Scientists; Collaborative Code; Dashboard A = Dashboard B; Automatic Documentation; Analysts and Business Users]
Leverage your existing cloud data platform, with out-of-the-box adapters to all major warehouses.
Benefit from partnerships across the Modern Data Stack.
[Figure labels: Data Quality; Orchestration; Data Ingestion; Other; Cloud Data Platform; Analysis & Visualization; Data Catalog & Active Metadata; Operational Analytics; Develop, Test & Document, Deploy]
MVs for everyone!
The questions on when to use MVs
- What are the costs associated with running the materialized view versus a batched incremental model? (This will vary depending on your data platform, as some will require different compute nodes.)
- Does your data platform support joins, aggregations, and window functions on MVs if you need them?
- What are the latency needs of your development environment? In production? (If not near real time, you can choose between a batch incremental model and an MV with a longer refresh schedule.)
- How often do your upstream dependencies update? If the answer is "not frequently", you may not need an MV.
- How large is your dataset? (It might be cheaper to use MVs for extremely large datasets.)
- How often do you need your query refreshed? What are your downstream dependencies and their stakeholders? (If near real time is important, MVs might be the right choice.)
- Do you have real-time machine learning models training on, or applications using, your transformed dataset?
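A few of the questions above can be written down as a small checklist function. This is a rough sketch under stated assumptions, not a rule from the talk: the `Workload` fields and the order of the checks are hypothetical, and cost and dataset size are deliberately left out because they depend on the data platform. The returned strings borrow the dbt materialization names `materialized_view` and `incremental`.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical answers to three of the questions above."""
    needs_near_real_time: bool
    upstream_updates_frequently: bool
    # True if the platform supports the joins, aggregations, and
    # window functions this model needs inside a materialized view.
    platform_supports_required_sql: bool


def suggest_materialization(w: Workload) -> str:
    """Return a starting suggestion; cost and data volume still need review."""
    if not w.platform_supports_required_sql:
        # The MV cannot express the logic, so batch is the only option.
        return "incremental"
    if not w.upstream_updates_frequently:
        # Infrequent upstream updates: an MV may not be needed.
        return "incremental"
    if w.needs_near_real_time:
        # Near-real-time needs are where MVs might be the right choice.
        return "materialized_view"
    # Otherwise: batch incremental, or an MV with a longer refresh schedule.
    return "either"
```

For example, `suggest_materialization(Workload(True, True, True))` returns `"materialized_view"`, while a workload whose upstream data rarely changes falls back to `"incremental"`.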

