Dean Wampler, Ph.D.
dean@lightbend.com
@deanwampler
Streaming Microservices
With Akka Streams and Kafka Streams
Big Data LDN 2018
New second edition!
lbnd.io/fast-data-book
Free as in 🍺
I lead the Lightbend Fast Data Platform project: streaming data and microservices.
lightbend.com/fast-data-platform
Streaming architectures (from the report)
[Architecture diagram: a Kafka cluster (brokers coordinated by a ZooKeeper cluster) is the data backplane. Data arrives from files, sockets, and REST. Processing spans Spark mini-batch and batch, plus low-latency engines: Flink, Kafka Streams, Akka Streams, and Beam. Persistence goes to S3, HDFS, local disks, SQL/NoSQL stores, and search. Microservices run on the Reactive Platform alongside Go, Node.js, and others. Everything runs on Kubernetes, Mesos, YARN, etc., in the cloud or on-premise. Numbered arrows (1-10) trace events through streams, storage, and microservices.]
[The same architecture diagram, repeated to highlight today's focus.]
Today's focus:
• Kafka - the data backplane
• Akka Streams and Kafka Streams - streaming microservices
Why Kafka?
[Diagram: the architecture zoomed in on the Kafka cluster as the hub connecting event sources, streams, and microservices.]
Kafka data is organized into topics. Topics are partitioned, replicated, and distributed.
[Diagram: Topic A with Partitions 1 and 2; Topic B with Partition 1.]
Logs, not queues! Unlike queues, consumers don't delete entries; Kafka manages their lifecycles. M producers write; N consumers start reading wherever they want.
[Diagram: Topic B, Partition 1, a log of offsets 0 through 15 running from earliest to latest. Producers 1 and 2 write at the head; Consumer 1 reads at offset 14, Consumer 2 at offset 10, Consumer 3 at offset 6.]
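To make the offset model concrete, here is a minimal sketch (mine, not from the talk) of a consumer that starts reading at an offset of its own choosing, using the standard Kafka client API from Scala; the topic name and offset are illustrative:

import java.time.Duration
import java.util.{Collections, Properties}
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "consumer-1")

val consumer = new KafkaConsumer[String, String](
  props, new StringDeserializer, new StringDeserializer)

// Like "Consumer 1 (at offset 14)" in the diagram: assign the partition
// explicitly and seek to an offset. Reading deletes nothing; other
// consumers read the same log at their own offsets.
val partition = new TopicPartition("topicB", 0)
consumer.assign(Collections.singletonList(partition))
consumer.seek(partition, 14L)

val records = consumer.poll(Duration.ofSeconds(1))
records.asScala.foreach(r => println(s"offset=${r.offset} value=${r.value}"))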
Kafka for Connectivity
Before: [Diagram: producers (Service 1, log and other files, internet services) connect point-to-point to consumers (Service 2, Service 3, other services), requiring N * M links.]
After: [Diagram: the same producers and consumers, now connected through Kafka, requiring only N + M links.]
Kafka for Connectivity
• Simplify dependencies
• Resilient against data loss
• M producers, N consumers
• Simplicity of one "API" for communication
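As a sketch of that one "API" (my example; the topic and server names are illustrative), every producer, whatever it carries, reduces to the same few calls:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")

val producer = new KafkaProducer[String, String](
  props, new StringSerializer, new StringSerializer)

// Whether the source is a service, a log file, or an internet feed,
// writing is one link to Kafka, not one link per downstream consumer.
producer.send(new ProducerRecord("service1-events", "key-1", "some event"))
producer.flush()
producer.close()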
Streaming Engines
[The architecture diagram again, highlighting the streaming engines.]
Spark and Flink are services to which you submit work; they operate at large scale with automatic data partitioning. Beam is similar: Google's project, which has been instrumental in defining streaming semantics.
Files
Sockets
REST
ZooKeeper Cluster
ZK
Mini-batch
Spark
Batch
Spark
…
Low Latency
Flink
Ka5a	Streams
Akka	Streams
Beam
Persistence
S3,	…
HDFS
DiskDiskDisk
SQL/
NoSQL
Search
1
5
6
KaFa Cluster
Broker
2
4
7
8
9
Beam
Spark	
Events
Streams
Storage
Microservices
Go Node.js …
They do a lot (Spark example)
[Diagram: a Spark driver submits work through a cluster manager to Spark executors on the cluster's nodes, each executor running many tasks.]

object MyApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    …
  }
}
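For the streaming flavor of "submit work to a service", here is a minimal Structured Streaming sketch (my addition; the topic and server names are illustrative) that reads from Kafka and maintains running counts:

import org.apache.spark.sql.SparkSession

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()
    import spark.implicits._

    // Subscribe to a Kafka topic; Spark manages partitions and offsets.
    val lines = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "raw-data")
      .load()
      .selectExpr("CAST(value AS STRING)")
      .as[String]

    val counts = lines.flatMap(_.split("\\s+")).groupBy("value").count()

    // Print updated counts continuously; a real job would write to
    // Kafka or a file sink instead of the console.
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}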
[The architecture diagram again, highlighting Spark.]
They do a lot (Spark example)
[Diagram: a logical dataflow of filter, flatMap, join, and map over four input partitions, scheduled across cluster nodes over time and grouped into two execution stages.]
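A rough sketch of what the diagram depicts (my example, not the talk's): Spark fuses narrow transformations into a stage and starts a new stage at each shuffle boundary:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("Stages").getOrCreate()
val sc = spark.sparkContext

val events = sc.textFile("events.txt")  // hypothetical inputs
val users  = sc.textFile("users.txt").map(line => (line.split(",")(0), line))

// filter, flatMap, and map are narrow: they fuse into one stage and
// run partition-by-partition with no data movement.
val counts = events
  .filter(_.nonEmpty)
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)  // shuffle: stage boundary

// join also shuffles, starting another stage before the final map.
val joined = counts.join(users).map { case (k, (n, u)) => s"$k,$n,$u" }

joined.saveAsTextFile("out")  // action: triggers the whole job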
[The architecture diagram again, highlighting Akka Streams and Kafka Streams.]
Akka Streams and Kafka Streams are libraries for "data-centric microservices": smaller scale than the engines above, but with great flexibility.
A Spectrum of Microservices
[Diagram: a data spectrum running from events (left) to records (right). On the left, event-driven μ-services behind an API gateway: browse, REST, accounts, orders, shopping cart, inventory. On the right, "record-centric" μ-services: storage feeding data, model training, model serving, and other logic.]
A Spectrum of Microservices: the event-driven side
[Zoom on the event-driven μ-services: browse, REST, accounts, orders, shopping cart, inventory, behind an API gateway.]
• Each datum has an identity
• Process each one uniquely
• Think sessions and state machines
[The spectrum diagram again.]
A Spectrum of Microservices: the record-centric side
[Zoom on the record-centric μ-services: storage, model training, model serving, other logic.]
• "Anonymous" records
• Process en masse
• Think SQL queries for analytics
[The spectrum diagram again.]
A Spectrum of Microservices
Akka emerged from the left-hand side of the spectrum, the world of highly Reactive microservices (https://www.reactivemanifesto.org/). Akka Streams pushes to the right, toward more data-centric processing.
Kafka Streams emerged from the right-hand side of the spectrum. It pushes to the left, supporting many event-processing scenarios.
Kafka Streams
• Important stream-processing semantics, e.g.,
• Windowing support (e.g., group by within a window)
[Diagram: events from Server 1 and Server 2 arrive over time (minutes 0, 1, 2, …) and are accumulated per key within each window before being propagated to the analysis step: collect data, then process.]
See my O'Reilly report for details.
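A minimal sketch of "group by within a window" in the Kafka Streams Scala DSL (my example; topic names are illustrative, and import paths and factory methods shift a bit across Kafka versions):

import java.time.Duration
import org.apache.kafka.streams.kstream.TimeWindows
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._

val builder = new StreamsBuilder

// Count events per server key within one-minute windows, as in the diagram.
builder.stream[String, String]("server-events")
  .groupByKey
  .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
  .count()
  .toStream
  .map((windowedKey, count) =>
    (s"${windowedKey.key}@${windowedKey.window.startTime}", count.toString))
  .to("server-event-counts")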
• Important stream-processing semantics, e.g.,
• Distinguish between event time and processing time
[The same diagram: it illustrates the gap between when an event occurs at a server (event time) and when it reaches the analysis step (processing time).]
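Windowing by event time requires knowing when each event happened. Kafka Streams lets you plug in a timestamp extractor; a hedged sketch (my code, with a hypothetical ServerEvent payload carrying its own timestamp):

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.streams.processor.TimestampExtractor

final case class ServerEvent(occurredAtMillis: Long, payload: String)

// Window by when the event happened at the server (event time),
// not by when the record reached the broker (processing time).
class EventTimeExtractor extends TimestampExtractor {
  override def extract(record: ConsumerRecord[AnyRef, AnyRef],
                       partitionTime: Long): Long =
    record.value() match {
      case e: ServerEvent => e.occurredAtMillis
      case _              => partitionTime  // fall back to stream time
    }
}

// Registered through configuration, e.g.:
// props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
//           classOf[EventTimeExtractor])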
Kafka Streams
• Java API
• Scala API, written by Lightbend
• SQL!!
Kafka Streams Example
[Diagram: the model-serving pipeline. Raw data flows into model serving; model training publishes model parameters; model serving emits scored records, which other logic turns into final records.]
val builder = new StreamsBuilderS  // the (then-new) Scala wrapper API
val data  = builder.stream[Array[Byte], Array[Byte]](rawDataTopic)
val model = builder.stream[Array[Byte], Array[Byte]](modelTopic)

val modelProcessor = new ModelProcessor
val scorer = new Scorer(modelProcessor)  // scorer.score(record) is used below

model.mapValues(bytes => Model.parseBytes(bytes))  // bytes => model record
  .filter((key, model) => model.valid)             // parsed successfully?
  .mapValues(model => ModelImpl.findModel(model))
  .process(() => modelProcessor, …)                // install the new model

data.mapValues(bytes => DataRecord.parseBytes(bytes))
  .filter((key, record) => record.valid)
  .mapValues(record => new ScoredRecord(scorer.score(record), record))
  .to(scoredRecordsTopic)

val streams = new KafkaStreams(builder.build, streamsConfiguration)
streams.start()
sys.addShutdownHook(streams.close())
What's Missing?
The rest of the microservice tools you need. Embed your Kafka Streams code in microservices written with Akka…
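One common pattern, sketched here under my own assumptions (illustrative route and port, Akka HTTP 10.1-era APIs, and `streams` being the KafkaStreams instance built earlier), is to wrap the topology in an Akka HTTP service so an orchestrator can probe its health:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import org.apache.kafka.streams.KafkaStreams

class HealthService(streams: KafkaStreams)(implicit system: ActorSystem) {
  implicit val materializer = ActorMaterializer()

  // Report the stream processor's state, e.g., for Kubernetes probes.
  val route =
    path("health") {
      get {
        if (streams.state().isRunning) complete(StatusCodes.OK)
        else complete(StatusCodes.ServiceUnavailable)
      }
    }

  def start(): Unit = Http().bindAndHandle(route, "0.0.0.0", 8080)
}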
Akka Streams
[Diagram: events flow from an event/data stream through consumers connected by bounded queues; back pressure propagates upstream across every link.]
• Back pressure for flow control
• http://www.reactive-streams.org/
[Diagram: the same stream-and-consumers pipeline, chained across components.]
Back pressure composes!
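A tiny demonstration of that composition (my example, in Akka 2.5-era style): a deliberately slow stage makes every upstream stage slow down automatically, with no explicit coordination:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

implicit val system = ActorSystem("BackPressureDemo")
implicit val materializer = ActorMaterializer()

// The source could emit as fast as the CPU allows, but demand is
// signaled upstream, so it produces only slightly ahead of the
// throttled stage (bounded by small internal buffers).
Source(1 to 100)
  .map { n => println(s"produced $n"); n }
  .throttle(1, 100.millis)  // simulate a slow downstream consumer
  .runWith(Sink.foreach(n => println(s"consumed $n")))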
Akka Streams
• Part of the Akka ecosystem
• Akka Actors, Akka Cluster, Akka HTTP, Akka Persistence, …
• Alpakka - a rich library of connectors
• Optimized for low overhead and latency
Akka Streams
• The “gist” - calculate factorials:
val source: Source[Int, NotUsed] = Source(1 to 10)
val factorials = source.scan(BigInt(1))((total, next) => total * next)
factorials.runWith(Sink.foreach(println))
Output: 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800
The same pipeline, viewed structurally: Source → Flow → Sink, i.e., a "Graph".
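To make that structure explicit, the same pipeline can be written as named stages and materialized with run() (a sketch in the same classic Akka style as the rest of the talk):

import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, RunnableGraph, Sink, Source}

implicit val system = ActorSystem("Factorials")
implicit val materializer = ActorMaterializer()

val source: Source[Int, NotUsed] = Source(1 to 10)
val factorialFlow: Flow[Int, BigInt, NotUsed] =
  Flow[Int].scan(BigInt(1))((total, next) => total * next)
val sink: Sink[BigInt, _] = Sink.foreach[BigInt](println)

// Nothing runs until the graph is materialized by run().
val graph: RunnableGraph[NotUsed] = source.via(factorialFlow).to(sink)
graph.run()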
[Diagram: the model-serving pipeline again, now inside an Akka Cluster, with Alpakka connecting to Kafka for the raw data and model parameters topics.]
implicit val system = ActorSystem("ModelServing")
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher

val modelProcessor = new ModelProcessor               // same as the Kafka Streams example
val scorer = new Scorer(modelProcessor)               // same as the Kafka Streams example
val modelScoringStage = new ModelScoringStage(scorer) // custom Akka Streams "stage"

val dataStream: Source[Record, Consumer.Control] =
  Consumer.atMostOnceSource(dataConsumerSettings,
      Subscriptions.topics(rawDataTopic))
    .map(input => DataRecord.parseBytes(input.value()))
    .collect { case Success(data) => data }

val modelStream: Source[ModelImpl, Consumer.Control] =
  Consumer.atMostOnceSource(modelConsumerSettings,
      Subscriptions.topics(modelTopic))
    .map(input => Model.parseBytes(input.value()))
    .collect { case Success(mod) => mod }
    .map(model => ModelImpl.findModel(model))     // returns a Try[ModelImpl]
    .collect { case Success(modImpl) => modImpl }
modelStream
  .map(modImpl => modelProcessor.setModel(modImpl)) // side effect: install the model
  .to(Sink.ignore).run()                            // no "sinking" required; just run

dataStream
  .viaMat(modelScoringStage)(Keep.right)
  .map(result => new ProducerRecord[Array[Byte], ScoredRecord](
    scoredRecordsTopic, result))
  .runWith(Producer.plainSink(producerSettings))
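The ModelScoringStage used above is a custom Akka Streams GraphStage. The talk doesn't show its body, but a minimal map-like version might look like this (my sketch; the Record, ScoredRecord, and Scorer types are placeholders for the talk's domain classes):

import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}

final case class Record(bytes: Array[Byte])
final case class ScoredRecord(score: Double, record: Record)
trait Scorer { def score(record: Record): Double }

class ModelScoringStage(scorer: Scorer)
    extends GraphStage[FlowShape[Record, ScoredRecord]] {

  val in  = Inlet[Record]("ModelScoringStage.in")
  val out = Outlet[ScoredRecord]("ModelScoringStage.out")
  override val shape = FlowShape.of(in, out)

  override def createLogic(attrs: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) {
      setHandler(in, new InHandler {
        override def onPush(): Unit = {
          val record = grab(in)  // take the element that was pushed
          push(out, ScoredRecord(scorer.score(record), record))
        }
      })
      setHandler(out, new OutHandler {
        override def onPull(): Unit = pull(in)  // propagate demand upstream
      })
    }
}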
Wrapping Up…
Free as in 🍺: new second edition!
lbnd.io/fast-data-book
dean.wampler@lightbend.com
lightbend.com/fast-data-platform
polyglotprogramming.com/talks
Questions?