Back-Pressure in Action
Intro & Agenda
Crawler Intro & Problem Statements
Crawler Architecture
Infrastructure: Akka Streams, Kafka, etc.
The Goodies
High-Level View
[Diagram labels: Crawl Jobs, Job DB, Validate, URL Cache, Download, Process; flows labeled URLs and Timestamps]
Requirements
Ever-expanding # of URLs
Can’t crawl all URLs at once
Control over concurrent web GETs
Efficient resource usage
Resilient under high burst
Scales horizontally & vertically
Sizing the Crawl Job
Let:
i = Number of crawl URLs in a job
n = Average number of links per page
d = The crawl depth (how many layers to follow links)
u = The max number of URLs to process
Then:
$u = i \cdot n^{d}$
[Chart: totalURLs vs depth (initialURLs = 1, outLinks = 5); logarithmic totalURLs axis]
[Chart: totalURLs vs initialURLs (depth = 5, outLinks = 5); logarithmic axes]
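The sizing formula can be checked with a few lines of Scala. This is only a sketch: the object and parameter names are made up for illustration, and BigInt is used because the result grows exponentially with depth.

```scala
object CrawlSizing {
  /** Sizing estimate from the slide: u = i * n^d.
    * initialUrls = i, avgOutLinks = n, depth = d (names are illustrative).
    */
  def maxUrls(initialUrls: BigInt, avgOutLinks: Int, depth: Int): BigInt =
    initialUrls * BigInt(avgOutLinks).pow(depth)
}
```

With 100 initial URLs, 5 links per page, and depth 5, `CrawlSizing.maxUrls(100, 5, 5)` gives 312500, which matches the totalCrawled figure on the Back-Pressure slide.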
The Reactive Manifesto
[Diagram: Responsive, Resilient, Elastic, Message Driven]
Why Does it Matter?
Responsive: responds in a deterministic, timely manner
Resilient: stays responsive in the face of failure – even cascading failures
Elastic: stays responsive under workload spikes
Message Driven: basic building block for responsive, resilient, and elastic systems
The Right Ingredients
• Kafka
  • Huge persistent buffer for the bursts
  • Load distribution to very large number of processing nodes
  • Enable horizontal scalability
• Akka Streams
  • High performance, highly efficient processing pipeline
  • Resilient with end-to-end back-pressure
  • Fully asynchronous – utilizes mapAsyncUnordered with Async HTTP client
• Async HTTP client
  • Non-blocking and consumes no threads in waiting
  • Integrates with Akka Streams for a high parallelism, low resource solution
[Diagram: Akka Stream, Async HTTP, and Reactive Kafka, labeled Efficient, Resilient, Scale]
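A minimal sketch of how mapAsyncUnordered caps concurrent downloads while letting fast responses overtake slow ones. It assumes Akka 2.6 or later, where an implicit ActorSystem also provides the materializer. `fakeDownload` is a hypothetical stand-in for the async HTTP client call; it is not the crawler's actual code.

```scala
import akka.actor.ActorSystem
import akka.pattern.after
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future
import scala.concurrent.duration._

object BoundedDownloads {
  // Hypothetical stand-in for a non-blocking HTTP GET. The delay is scheduled
  // on the system scheduler, so no thread is held while "waiting".
  def fakeDownload(url: String)(implicit system: ActorSystem): Future[(String, Int)] = {
    import system.dispatcher
    val latency = if (url.contains("slow")) 500.millis else 10.millis
    after(latency, system.scheduler)(Future.successful(url -> 200))
  }

  // At most `parallelism` downloads are in flight at once; results are emitted
  // in completion order. When all slots are busy, the stage stops pulling from
  // upstream, which is how back-pressure propagates.
  def crawl(urls: List[String], parallelism: Int)(
      implicit system: ActorSystem): Future[Seq[(String, Int)]] =
    Source(urls)
      .mapAsyncUnordered(parallelism)(url => fakeDownload(url))
      .runWith(Sink.seq)
}
```

Calling `BoundedDownloads.crawl(List("slow", "fast1", "fast2"), parallelism = 3)` returns the slow URL last, because it completes last. With `mapAsync` the input order would be preserved instead, so the slow URL would hold back the other two.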
Adding Kafka & Akka Streams
[Diagram labels: Crawl Jobs, Job DB, Validate, URL Cache, Download, Process, Akka Streams; flows labeled URLs and Timestamps]
Akka Streams, what???
High performance, pure async, stream processing
Conforms to Reactive Streams
Simple, yet powerful GraphDSL allows clear stream topology declaration
Central point to understand processing pipeline
Crawl Stream
Actual Stream Declaration in Code
prioritizeSource ~> crawlerFlow ~> bCast0 ~> result ~> bCast ~> outLinksFlow ~> outLinksSink
bCast ~> dataSinkFlow ~> kafkaDataSink
bCast ~> hdfsDataSink
bCast ~> graphFlow ~> merge ~> graphSink
bCast0 ~> maxPage ~> merge
bCast0 ~> retry ~> bCastRetry ~> retryFailed ~> merge
bCastRetry ~> errorSink
[Diagram stages: Prioritized Source, Crawl, Result, MaxPageReached, Retry, OutLinks, Data, Graph, CheckFail, CheckErr; sinks: OutLinks Sink, Kafka Data Sink, HDFS Data Sink, Graph Sink, Error Sink]
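The declaration above leaves out the stage definitions. As a much smaller, self-contained illustration of the same GraphDSL style, the sketch below broadcasts crawl outcomes to a data sink and an error sink. All names and types here (`Outcome`, `Page`, `Failed`, the flows) are hypothetical and far simpler than the real crawler. Akka 2.6 or later is assumed.

```scala
import akka.stream.ClosedShape
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Keep, RunnableGraph, Sink, Source}
import scala.concurrent.Future

object MiniCrawlGraph {
  // Hypothetical, simplified crawl outcomes.
  sealed trait Outcome
  final case class Page(url: String) extends Outcome
  final case class Failed(url: String) extends Outcome

  def graph(urls: List[String]): RunnableGraph[(Future[Seq[String]], Future[Seq[String]])] =
    RunnableGraph.fromGraph(
      GraphDSL.create(Sink.seq[String], Sink.seq[String])(Keep.both) {
        implicit builder => (dataSink, errorSink) =>
          import GraphDSL.Implicits._

          val source = Source(urls)
          // Stand-in for the download stage: anything not starting with "http" fails.
          val crawl = Flow[String].map[Outcome] { u =>
            if (u.startsWith("http")) Page(u) else Failed(u)
          }
          val bCast  = builder.add(Broadcast[Outcome](2))
          val pages  = Flow[Outcome].collect { case Page(u) => u }
          val errors = Flow[Outcome].collect { case Failed(u) => u }

          // The topology reads like the diagram: one line per path.
          source ~> crawl ~> bCast ~> pages  ~> dataSink
                             bCast ~> errors ~> errorSink
          ClosedShape
      })
}
```

Running `MiniCrawlGraph.graph(List("http://a", "bad", "http://b")).run()` with an implicit ActorSystem in scope yields two futures: the crawled URLs and the failed ones. Broadcast only emits when every output can accept an element, so a slow sink back-pressures the whole graph.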
Resulting Characteristics
Efficient
• Low thread count, controlled by Akka and pure non-blocking async HTTP
• High-latency URLs do not block low-latency URLs, thanks to mapAsyncUnordered
• Well-controlled download concurrency using mapAsyncUnordered
• Thread per concurrent crawl job
Resilient
• Processes only what can be processed – no resource overload
• Kafka as short-term, persistent queue
Scale
• Kafka feeds next batch of URLs to available node cluster
• Pull model – only processes that have capacity will get the load
• Kafka distributes work to large number of processing nodes in cluster
Back-Pressure
[Chart: Queue Size vs. Time (seconds)]
[Chart: URLs/sec vs. Time (seconds)]
initialURLs : 100
parallelism : 1000
processTime : 1 – 5 s
outLinks : 0 - 10
depth : 5
totalCrawled : 312500
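The throughput ceiling in this run can be estimated with Little's law: sustained throughput is concurrency divided by mean time per item. The sketch below assumes process time is spread evenly over 1 to 5 s, so the mean is 3 s. That distribution is an assumption, not something the slide states, and the names are illustrative.

```scala
object ThroughputBound {
  /** Little's law: sustained throughput = concurrency / mean time per item. */
  def maxUrlsPerSecond(parallelism: Int, meanProcessSeconds: Double): Double =
    parallelism / meanProcessSeconds
}
```

`ThroughputBound.maxUrlsPerSecond(1000, 3.0)` is roughly 333 URLs/sec, which fits within the 0 to 400 range of the URLs/sec chart. The queue can grow past 100,000 entries while the processing rate stays near this ceiling, which is the back-pressure effect the slide illustrates.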
Challenges
Training
• Developers not used to E2E stream definitions
• More familiar with deeply nested function calls
Maturity of Infrastructure
• Kafka 0.9 uses fetch as heartbeat
• Slow nodes cause timeout & rebalance
• Solved in 0.10
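The slide does not say which 0.10 change solved the problem. To the best of my knowledge, Kafka 0.10.0 added `max.poll.records`, and 0.10.1 moved consumer heartbeats to a background thread and added `max.poll.interval.ms`. The sketch below shows consumer settings of that kind. The values are illustrative only, not the crawler's actual configuration, and the object and method names are hypothetical.

```scala
import java.util.Properties

object CrawlConsumerConfig {
  // Illustrative consumer settings for slow, back-pressured processing.
  // Assumes a Kafka 0.10.1+ consumer; all values are examples only.
  def consumerProps(bootstrapServers: String, groupId: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", bootstrapServers)
    props.setProperty("group.id", groupId)
    props.setProperty("enable.auto.commit", "false")
    // Cap how many records a single poll() may return.
    props.setProperty("max.poll.records", "100")
    // Maximum allowed gap between poll() calls before the consumer is
    // considered failed and a rebalance is triggered.
    props.setProperty("max.poll.interval.ms", "300000")
    // Liveness timeout for heartbeats, independent of processing time.
    props.setProperty("session.timeout.ms", "10000")
    props
  }
}
```

Separating the poll interval from the session timeout lets a slow node take longer over a batch without being dropped from the consumer group.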
What it would have been…
Bloated, ineffective concurrency control
Lack of well-thought-out and visible processing pipeline
Clumsy code, hard to manage & understand
Low training cost, high project TCO
Dev / Support / Maintenance
Bottom Line
Standardized Reactive Platform
Efficiency & Resilience meets Standardization
• Monitoring
  • Need to collect metrics, consistently
• Logging
  • Correlation across services
  • Uniformity in logs
• Security
  • Need to apply standard security configuration
• Environment Resolution
  • Staging, production, etc.
Consistency in the face of Heterogeneity
squbs is not…
A framework on its own
A programming model – use Akka
Take all or none – Components/patterns can mostly be used independently
squbs: Akka for large scale deployments
Bootstrap
Lifecycle management
Loosely-coupled module system
Integration hooks for logging, monitoring, ops integration
JSON console
HttpClient with pluggable resolver and monitoring/logging hooks
Test tools and interfaces
Goodies:
- Activators for Scala & Java
- Programming patterns and helpers for Akka and Akka Stream Use cases…, and growing
PerpetualStream
• Provides a convenience trait to help write streams controlled by system lifecycle
• Minimal/no message losses
• Register PerpetualStream to make stream start/stop
• Provides customization hooks – especially for how to stop the stream
• Provides killSwitch (from Akka) to be embedded into stream
• Implementers - just provide your stream!
A non-stop stream; starts and stops with the system
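import akka.Done
import akka.stream.ClosedShape
import akka.stream.scaladsl.{GraphDSL, RunnableGraph, Sink, Source}
import org.squbs.stream.PerpetualStream
import scala.concurrent.Future

class MyStream extends PerpetualStream[Future[Done]] {
  // Endless counter that wraps around at Int.MaxValue
  def generator = Iterator.iterate(0) { p =>
    if (p == Int.MaxValue) 0 else p + 1
  }
  val source = Source.fromIterator(generator _)
  // Sink.ignore takes no type parameter and materializes a Future[Done]
  val ignoreSink = Sink.ignore
  override def streamGraph = RunnableGraph.fromGraph(
    GraphDSL.create(ignoreSink) { implicit builder =>
      sink =>
        import GraphDSL.Implicits._
        // killSwitch lets the system lifecycle shut the stream down
        source ~> killSwitch.flow[Int] ~> sink
        ClosedShape
    })
}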
PersistentBuffer/BroadcastBuffer
• Data & indexes in rotating memory-mapped files
• Off-heap rotating file buffer – very large buffers
• Restarts gracefully with no or minimal message loss
• Not as durable as a remote data store, but much faster
• Does not back-pressure upstream beyond data/index writes
• Similar usage to Buffer and Broadcast
• BroadcastBuffer – a FanOutShape decouples each output port making each downstream independent
• Useful if downstream stage blocked or unavailable
  • Kafka is unavailable/rebalancing but system cannot backpressure/deny incoming traffic
• Optional commit stage for at-least-once delivery semantics
• Implementation based on Chronicle Queue
A buffer of virtually unlimited size
Summary
• Kafka + Akka Streams + Async I/O = Ideal Architecture for High Bursts & High Efficiency
• Akka Streams
  • Clear view of stream topology
  • Back-pressure & Kafka allow buffering load bursts
• Standardization
  • Walk like a duck, quack like a duck, and manage it like a duck
• squbs: Have the cake, and eat it too, with goodies like
  • PerpetualStream
  • PersistentBuffer
  • BroadcastBuffer
Q&A – Feedback Appreciated
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka @PayPal
