Streaming is a Detail
Current 2023
Amy Chen & Florian Eiden
Introductions
Amy Chen
Staff Partner Engineer
Fun fact: I've made dbt soap 🧼
Florian Eiden
Staff Product Manager
Fun fact: I live on an 🏝
Pieces of the puzzle
- Transactional vs Analytical
- Analytics Engineering vs Data Engineering
- Personas
- ELT vs ETL
- Streaming in the ELT world (bring streaming to the database, or the other way around)
- Operational Analytics
- Why does it have to be Batch vs Streaming?
- Is streaming the biggest trend in analytics?
- What is the next big milestone for streaming?
- How do we solve CI/CD? Testing? Replayability?
- Is Flink a database now?
- Is pipeline the right way to bundle logic? What about DAGs?
- Logic: plumbing vs business logic
Streaming is a Detail
dbt is ELT
The dbt viewpoint: Build data like developers build applications
Work like engineers
dbt uses testing, version control, reusable code, and documentation to get to the right answer, faster.
Code Reigns
Pairing code-based transformation with your favorite git provider means flexibility without chaos.
Write reusable & referenceable logic with SQL + Jinja
Infer lineage for automated dependency management.
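To make the two statements above concrete, here is a minimal Python sketch of how lineage could be inferred from `ref()` calls and turned into a build order. It is an illustration only, not dbt's actual parser: dbt renders the Jinja itself, whereas this sketch uses a regular expression. The model names, SQL bodies, and the helper functions `infer_lineage` and `build_order` are all hypothetical.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical model names and SQL bodies, invented for illustration.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": (
        "select c.customer_id, count(o.order_id) as order_count "
        "from {{ ref('stg_customers') }} c "
        "left join {{ ref('stg_orders') }} o using (customer_id) "
        "group by 1"
    ),
    "top_customers": (
        "select * from {{ ref('customer_orders') }} where order_count > 10"
    ),
}

# Matches {{ ref('name') }} or {{ ref("name") }} with flexible whitespace.
# Assumption: a regex stands in for real Jinja rendering.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def infer_lineage(models):
    """Map each model name to the set of models it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(models):
    """Return model names so that every parent comes before its children."""
    return list(TopologicalSorter(infer_lineage(models)).static_order())


if __name__ == "__main__":
    for name, parents in infer_lineage(MODELS).items():
        print(name, "<-", sorted(parents))
    print(build_order(MODELS))
```

Because the dependencies come from the code itself, nobody has to maintain a separate list of which model runs before which.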
- SQL-friendly + version control safely expands participants
- Reusable code speeds development
- Built-in CI/CD increases pipeline reliability
- Dependency management speeds troubleshooting
- Visible lineage increases data understanding
- Testing and documentation increase data trust
[Diagram labels: Data Engineers, Analysts, and Data Scientists; Collaborative Code; Dashboard A = Dashboard B; Automatic Documentation; Analysts and Business Users]
Leverage your existing cloud data platform, with out-of-the-box adapters to all major warehouses.
Benefit from partnerships across the Modern Data Stack.
[Diagram labels: Data Quality; Orchestration; Data Ingestion; Other; Cloud Data Platform; Analysis & Visualization; Data Catalog & Active Metadata; Operational Analytics; Develop; Test & Document; Deploy]
MVs for everyone!
The questions on when to use MVs
- What are the costs associated with running the materialized view versus a batched incremental model? (This will vary depending on your data platform, as some will require different compute nodes.)
- Does your data platform support joins, aggregations, and window functions on MVs if you need them?
- What are the latency needs of your development environment? In production? (If not near real time, you can choose between a batch incremental model and an MV with a longer refresh schedule.)
- How often do your upstream dependencies update? If the answer is "not frequently", you may not need an MV.
- How large is your dataset? (It might be cheaper to use MVs for extremely large datasets.)
- How often do you need your query refreshed? What are your downstream dependencies and their stakeholders? (If near real time is important, MVs might be the right choice.)
- Do you have real-time machine learning models training or applications using your transformed dataset?
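Some of the questions above can be read as a rough decision procedure. The Python sketch below encodes only the three that have a yes/no answer (platform support, upstream update frequency, near-real-time need). The `ModelNeeds` fields, the `suggest_materialization` function, and the order of the checks are assumptions made for illustration. Cost and dataset size are left out on purpose because they vary by data platform.

```python
from dataclasses import dataclass


@dataclass
class ModelNeeds:
    # Hypothetical inputs for this sketch, answered per model.
    platform_supports_required_sql: bool  # joins, aggregations, window functions on MVs
    upstream_updates_frequently: bool
    needs_near_real_time: bool


def suggest_materialization(needs: ModelNeeds) -> str:
    """Return a starting suggestion, not a verdict.

    Cost and dataset size still have to be checked against your platform.
    """
    if not needs.platform_supports_required_sql:
        # The MV cannot express the logic, so fall back to batch.
        return "incremental"
    if not needs.upstream_updates_frequently:
        # Infrequent upstream updates: an MV may not be needed.
        return "incremental"
    if needs.needs_near_real_time:
        return "materialized_view"
    # Latency is relaxed: either option can work.
    return "incremental or materialized_view with a longer refresh schedule"


if __name__ == "__main__":
    dashboard = ModelNeeds(
        platform_supports_required_sql=True,
        upstream_updates_frequently=True,
        needs_near_real_time=True,
    )
    print(suggest_materialization(dashboard))
```

Treat the result as a conversation starter with stakeholders; the remaining questions on the slide (cost, dataset size, downstream consumers) can still overturn it.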
