Streaming Data Pipelines with
Apache Beam
Danny McCormick
Agenda
● Who am I
● What is Apache Beam
● Beam Basics
● Processing streaming data
● Demo
Who am I
Me!
What is Apache Beam
In the beginning, there was MapReduce
[Figure: MapReduce. A datastore feeds parallel Map workers, then a Shuffle step, then parallel Reduce workers, which write to a datastore.]
Then came Flume (and Spark, Flink, and many more)
[Figure: a pipeline graph connecting Datastore, Map, Group by Key (Reduce), and Combine nodes.]
From Flume came Beam
[Figure: a pipeline graph connecting Datastore, Map, Group by Key (Reduce), and Combine nodes.]
Unified Model for Batch and Streaming
● Batch processing is a special case of
stream processing
● Batch + Stream = Beam
Build your pipeline in whatever language(s) you want…
… with whatever execution engine you want
Cloud Dataflow
Apache Spark
Apache Flink
Apache Apex
Gearpump
Apache Samza
Apache Nemo (incubating)
IBM Streams
Beam Basics
Terms
● PCollection - distributed multi-element
dataset
● Transform - operation that takes N
PCollections and produces M PCollections
● Pipeline - directed acyclic graph of
Transforms and PCollections
Basic Beam Graph
[Figure: example pipeline graph in which Sources feed a chain of Transforms (including Map and Combine) that end in Sinks.]
Basic Beam Pipeline
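import apache_beam as beam

def add_one(element):
    # ReadFromText yields each line as a string, so convert it before adding.
    return int(element) + 1

with beam.Pipeline() as pipeline:
    # Parentheses let the chained | steps span several lines.
    (
        pipeline
        | beam.io.ReadFromText('gs://some/inputData.txt')
        | beam.Map(add_one)
        | beam.io.WriteToText('gs://some/outputData')
    )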
[Figure: Read text file -> Map Transform -> Write to text file]
How to use Beam to process huge
amounts of streaming data
We want to go from this:
[Figure not recoverable]
To this:
[Figure: timeline marked by day, Monday to Friday]
To this:
[Figure: timeline marked by hour, 8:00 to 14:00]
Streaming data might be:
● Late
● Incomplete
● Rate limited
● Infinite
You will need to make tradeoffs between:
● Cost
● Completeness
● Low Latency
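Example 1: Billing Pipeline
[Chart: Completeness, Low Latency, and Low Cost, each rated on a scale from Not Important to Important]
Example 2: Billing Estimator
[Chart: Completeness, Low Latency, and Low Cost, each rated on a scale from Not Important to Important]
Example 3: Fraud Detection
[Chart: Completeness, Low Latency, and Low Cost, each rated on a scale from Not Important to Important]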
Windows
[Figure: timeline from 8:00 to 14:00 split into windows, each labeled "Aggregate or output"]
Fixed Windows
[Figure: timeline from 8:00 to 14:00 split into equal, non-overlapping windows, each labeled "Aggregate or output"]
Sliding Windows
[Figure: timeline from 8:00 to 14:00 covered by overlapping windows, each labeled "Aggregate or output"]
Session Windows
[Figure: timeline from 8:00 to 14:00 with windows of varying length, each labeled "Aggregate or output"]
Global Window
[Figure: timeline from 8:00 to 14:00 covered by a single window]
Code
● items | beam.WindowInto(window.FixedWindows(60)) # 60s fixed windows
● items | beam.WindowInto(window.SlidingWindows(30, 5)) # 30s sliding window every 5s
● items | beam.WindowInto(window.Sessions(10 * 60)) # window breaks after 10 empty min
● items | beam.WindowInto(window.GlobalWindows()) # single global window
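To make the window types above concrete, here is a minimal plain-Python sketch of how a timestamp could be mapped to fixed, sliding, and session windows. It does not use Beam and is not Beam's implementation; the function names (fixed_window, sliding_windows, session_windows) are hypothetical, timestamps are assumed to be integer seconds, and windows are written as half-open (start, end) pairs.

```python
def fixed_window(timestamp, size):
    """Return the single fixed window [start, end) containing timestamp."""
    start = timestamp - timestamp % size
    return (start, start + size)


def sliding_windows(timestamp, size, period):
    """Return every sliding window [start, end) containing timestamp.

    A new window of length `size` starts every `period` seconds, so each
    element lands in size / period windows when size is a multiple of period.
    """
    windows = []
    start = timestamp - timestamp % period  # latest window start <= timestamp
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps, gap):
    """Group timestamps into sessions that close after `gap` idle seconds.

    Each element opens a proto-window [t, t + gap); overlapping
    proto-windows are merged into one session.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t < sessions[-1][1]:
            # Element falls inside the open session, so extend it.
            sessions[-1] = (sessions[-1][0], t + gap)
        else:
            sessions.append((t, t + gap))
    return sessions


if __name__ == "__main__":
    print(fixed_window(125, 60))               # (120, 180)
    print(len(sliding_windows(12, 30, 5)))     # 6
    print(session_windows([0, 100, 500, 1300, 1350], 600))
    # [(0, 1100), (1300, 1950)]
```

The sliding case shows why a 30s window emitted every 5s assigns each element to six windows, which is worth keeping in mind when estimating cost.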
Real Time vs Event Time - Expectation
[Plot: event time vs. processing time]
Real Time vs Event Time - Reality
[Plot: event time vs. processing time]
How do we know it’s safe to finish a window’s work?
[Plot: event time vs. processing time]
Processing Time? Lots of late data won’t be counted
[Plot: event time vs. processing time]
Beam’s Solution - Watermarks!
[Plot: event time vs. processing time]
Watermarks
● Beam’s notion of when data is
complete
● When a watermark passes the end of
a window, additional data is late
● Beam has several built-in watermark
estimators
Example: Timestamp observing estimation
[Plot sequence: event time vs. processing time, built up point by point and ending with points marked "Late Data*"]
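The idea behind timestamp-observing estimation can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Beam's estimator API: the class MaxTimestampWatermark and the function classify_events are hypothetical names, the watermark is assumed to be simply the largest event timestamp seen so far, and windows are fixed windows of window_size seconds.

```python
class MaxTimestampWatermark:
    """Hypothetical estimator: the watermark is the largest event time observed."""

    def __init__(self):
        self.watermark = float("-inf")

    def observe(self, event_time):
        # The watermark only moves forward, never backward.
        self.watermark = max(self.watermark, event_time)
        return self.watermark


def classify_events(event_times, window_size):
    """Label each event (given in arrival order) as 'on time' or 'late'.

    An event is late when the watermark had already passed the end of the
    event's fixed window before the event arrived.
    """
    estimator = MaxTimestampWatermark()
    labels = []
    for event_time in event_times:
        window_end = event_time - event_time % window_size + window_size
        is_late = estimator.watermark >= window_end
        labels.append((event_time, "late" if is_late else "on time"))
        estimator.observe(event_time)
    return labels


if __name__ == "__main__":
    # Event times in the order they arrive; 30 and 110 arrive out of order.
    for event_time, label in classify_events([5, 20, 65, 30, 70, 130, 110], 60):
        print(event_time, label)
```

Here 30 arrives after 65 has already pushed the watermark past 60, and 110 arrives after 130 has pushed it past 120, so both are labeled late while every other event is on time.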
Watermarks
● Handled at the source I/O level
● Most pipelines don’t need to
implement estimation, but do need to
be aware of it
Recall Tradeoffs
[Chart: Completeness, Low Latency, and Low Cost, each rated on a scale from Not Important to Important]
Triggers
● Beam’s mechanism for controlling
tradeoffs
● Describe when to emit aggregated
results of a single window
● Allow emitting early results or results
including late data
Types of Triggers
● Event Time Triggers
● Processing Time Triggers
● Data-Driven Triggers
● Composite Triggers
Set on windows
pcollection | WindowInto(
    FixedWindows(1 * 60),
    trigger=AfterProcessingTime(1 * 60),
    accumulation_mode=AccumulationMode.DISCARDING)
Example Triggers
● AfterProcessingTime(delay=1 * 60)
● AfterCount(1)
● AfterWatermark(
    early=AfterProcessingTime(delay=1 * 60),
    late=AfterCount(1))
● AfterAny(AfterCount(1),
    AfterProcessingTime(delay=1 * 60))
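To show how a composite trigger trades latency against the number of emitted panes, here is a rough plain-Python simulation of a repeated "count or processing-time delay, whichever comes first" trigger. It is a sketch under assumptions, not Beam code: simulate_after_any is a hypothetical name, the trigger is assumed to be wrapped in Repeating, the delay timer is assumed to start at the first element of each pane, and arrival times are processing-time seconds.

```python
def simulate_after_any(arrival_times, count, delay):
    """Return (fire_time, elements_in_pane) for each pane.

    A pane fires when `count` elements have arrived, or `delay` seconds
    after its first element arrived, whichever happens first.
    """
    panes = []
    pending = []
    for now in sorted(arrival_times):
        # The delay timer for the open pane may expire before this arrival.
        if pending and now >= pending[0] + delay:
            panes.append((pending[0] + delay, pending))
            pending = []
        pending.append(now)
        if len(pending) == count:
            panes.append((now, pending))
            pending = []
    if pending:
        # No more arrivals: the remaining pane fires when its timer expires.
        panes.append((pending[0] + delay, pending))
    return panes


if __name__ == "__main__":
    for fire_time, pane in simulate_after_any(
            [0, 10, 20, 100, 200, 205, 206], count=3, delay=60):
        print(fire_time, pane)
    # 20 [0, 10, 20]
    # 160 [100]
    # 206 [200, 205, 206]
```

Bursts fire on the count condition almost immediately, while a lone element waits for the processing-time delay, which bounds how stale a result can get.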
Accumulation Mode
● Describes how to handle data that
has already been emitted
● 2 types: Accumulating and
Discarding
Discarding Accumulation Mode
pcollection | WindowInto(
    FixedWindows(1 * 60),
    trigger=Repeating(AfterCount(3)),
    accumulation_mode=AccumulationMode.DISCARDING)
[5, 8, 3, 1, 2, 6, 9, 7] -> [5, 8, 3]
                            [1, 2, 6]
                            [9, 7]
Accumulating Accumulation Mode
pcollection | WindowInto(
    FixedWindows(1 * 60),
    trigger=Repeating(AfterCount(3)),
    accumulation_mode=AccumulationMode.ACCUMULATING)
[5, 8, 3, 1, 2, 6, 9, 7] -> [5, 8, 3]
                            [5, 8, 3, 1, 2, 6]
                            [5, 8, 3, 1, 2, 6, 9, 7]
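The difference between the two accumulation modes can be reproduced with a few lines of plain Python. This is only a sketch of the behavior shown on the slides, not Beam's implementation: emit_panes is a hypothetical name, the trigger is modeled as firing after every `count` new elements, and a final pane with the leftover elements is assumed to fire when the window closes.

```python
def emit_panes(elements, count, accumulating):
    """Return the panes emitted for one window.

    accumulating=False models discarding mode: each pane holds only the
    elements that arrived since the previous firing.
    accumulating=True models accumulating mode: each pane holds everything
    seen in the window so far.
    """
    panes = []
    buffer = []
    new_since_last_fire = 0
    for element in elements:
        buffer.append(element)
        new_since_last_fire += 1
        if new_since_last_fire == count:
            panes.append(list(buffer))
            new_since_last_fire = 0
            if not accumulating:
                buffer.clear()
    if new_since_last_fire:
        # Assumed final firing when the window closes.
        panes.append(list(buffer))
    return panes


if __name__ == "__main__":
    data = [5, 8, 3, 1, 2, 6, 9, 7]
    print(emit_panes(data, 3, accumulating=False))
    # [[5, 8, 3], [1, 2, 6], [9, 7]]
    print(emit_panes(data, 3, accumulating=True))
    # [[5, 8, 3], [5, 8, 3, 1, 2, 6], [5, 8, 3, 1, 2, 6, 9, 7]]
```

Discarding mode suits downstream steps that sum the panes themselves, while accumulating mode suits sinks that overwrite the previous result for the window.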
More!
● Pipeline State
● Timers
● Runner-initiated splits
● Self-checkpointing
● Bundle finalization
Demo
https://github.com/damccorm/ato-demo-2022
Come join our community!
Questions?
Slides - shorturl.at/GNU07


Editor's Notes

  • #5: My path: Studied at Vanderbilt, got Bachelor's + Master's. Joined Microsoft + worked on Azure DevOps - first started to fall in love with OSS here. Particularly shaped by experiences w/ big OSS repos (GulpJs, Prettier) - maintainers matter! Got to work on GitHub Actions, helped v2 GA, authored most first party actions (setup-node, toolkit). Joined Google to work on Apache Beam and Google’s execution engine, Dataflow. Currently Apache committer and the technical lead of Google’s Beam and Dataflow Machine Learning team. Neat to be part of a bigger community driven project, where decisions are made on the distribution list, not in company meetings. Full circle; I hope to be like those initial OSS maintainers who welcomed me into open source.
  • #24: Slide adapted from https://docs.google.com/presentation/d/1SHie3nwe-pqmjGum_QDznPr-B_zXCjJ2VBDGdafZme8/edit#slide=id.g12846a6162_0_2098
  • #25: Slide adapted from https://docs.google.com/presentation/d/1SHie3nwe-pqmjGum_QDznPr-B_zXCjJ2VBDGdafZme8/edit#slide=id.g12846a6162_0_2098
  • #26: Slide adapted from https://docs.google.com/presentation/d/1SHie3nwe-pqmjGum_QDznPr-B_zXCjJ2VBDGdafZme8/edit#slide=id.g12846a6162_0_2098
  • #32: Usually not used in streaming scenarios, unless you’re using specific triggering setups
  • #33: Call out it’s easy to change your aggregation strategy
  • #38: Lots of data will be considered late
  • #55: Slide adapted from https://docs.google.com/presentation/d/1SHie3nwe-pqmjGum_QDznPr-B_zXCjJ2VBDGdafZme8/edit#slide=id.g12846a6162_0_2098
  • #56: Set when you window
  • #57: Examples: Event time - AfterWatermark; Processing time - AfterProcessingTime (early firing); Data-driven - AfterCount
  • #71: Highlight areas of growth (ML, x-lang, performance, new SDKs)