Akka in Production
Evan Chan
Scala Days 2015
March 17, 2015
Who is this guy?
• Principal Engineer, Socrata, Inc.
• http://github.com/velvia
• Author of multiple open source Akka/Scala projects - Spark Job Server, ScalaStorm, etc.
• @evanfchan
A plug for a few projects…
• http://github.com/velvia/links - my stash of interesting Scala & big data projects
• http://github.com/velvia/filo - a new, extreme vector serialization library for fast analytics
• Talk to me later if you are interested in fast serialization or columnar/analytics databases
Who is Socrata?
We are a Seattle-based software startup.
We make data useful to everyone.
Open, Public Data
Consumers
Apps
Socrata is…
The most widely adopted Open Data platform
Scala at Socrata
• Started with old monolithic Java app
• Started writing new features in Scala - 2.8
• Today - 100% backend development in Scala, 2.10 / 2.11, many microservices
• Custom SBT plugins, macros, more
• socrata-http
• rojoma-json
Want Reactive?
event-driven, scalable, resilient and responsive
Agenda
• How does one get started with Akka?
• To be honest, Akka is what drew me into Scala
• Examples of Akka use cases
• Compared with other technologies
• Tips on using Akka in production
• Including back pressure, monitoring, VisualVM usage, etc.
Ingestion Architectures with Akka
Akka Stack
• Spray - high performance HTTP

• SLF4J / Logback

• Yammer Metrics

• spray-json

• Akka 2.x

• Scala 2.10
Ingesting 2 Billion Events / Day
[Diagram components: Consumer watches video, Nginx, Raw Log, Feeder, Kafka, Storm, New Stuff]
Livelogsd - Akka/Kafka file tailer
[Diagram components: Current File, Rotated File, Rotated File 2, two File Reader Actors, Kafka Feeder, Coordinator, Kafka]
Storm - with or without Akka?
[Diagram components: Kafka, Spout, Bolt, Actor, Actor]
• Actors talking to each other within a bolt for locality

• Don’t really need Actors in Storm

• In production, found Storm too complex to troubleshoot

• It’s 2am - what should I restart? Supervisor? Nimbus? ZK?
Akka Cluster-based Pipeline
[Diagram: repeated groups of Kafka Consumer, Spray endpoint, Cluster Router, and Processing Actors]
Lessons Learned
• Still too complex -- would we want to get paged for this system?

• Akka Cluster in 2.1 was not ready for production (newer 2.2.x version is stable)

• Mixture of actors and futures for HTTP requests became hard to grok

• Actors were much easier for most developers to understand
Simplified Ingestion Pipeline
[Diagram: two parallel pipelines, one per Kafka partition (1 and 2), each with a Kafka SimpleConsumer, a Converter Actor, and a Cassandra Writer Actor]
• Kafka used to partition messages

• Single process - super simple!

• No distribution of data

• Linear actor pipeline - very easy to understand
Stackable Actor Traits
Why Stackable Traits?
• Repeatedly adding monitoring, logging, metrics, and tracing code gets ugly and repetitive

• We want some standard behavior around actors -- but we need to wrap the actor Receive block:

    import akka.actor.Actor

    class SomeActor extends Actor {
      // The actor's own message handling
      def wrappedReceive: Receive = {
        case x => // handle x here
      }
      // Wrap every message with before/after behavior
      def receive = {
        case x =>
          println("Do something before...")
          wrappedReceive(x)
          println("Do something after...")
      }
    }
Start with a base trait...
    trait ActorStack extends Actor {
      /** Actor classes should implement this partial function for standard
       *  actor message handling
       */
      def wrappedReceive: Receive

      /** Stackable traits should override and call super.receive(x) for
       *  stacking functionality
       */
      def receive: Receive = {
        case x => if (wrappedReceive.isDefinedAt(x)) wrappedReceive(x) else unhandled(x)
        // or: wrappedReceive.applyOrElse(x, unhandled)
      }
    }
Instrumenting Traits...
    trait Instrument1 extends ActorStack {
      override def receive: Receive = {
        case x =>
          println("Do something before...")
          super.receive(x)
          println("Do something after...")
      }
    }

    trait Instrument2 extends ActorStack {
      override def receive: Receive = {
        case x =>
          println("Antes...")
          super.receive(x)
          println("Despues...")
      }
    }
Now just mix the Traits in....
    class DummyActor extends Actor with Instrument1 with Instrument2 {
      def wrappedReceive = {
        case "something" => println("Got something")
        case x => println("Got something else: " + x)
      }
    }

• Traits add instrumentation; Actors stay clean!

• Order of mixing in traits matters: the last trait mixed in (Instrument2) is the outermost wrapper. Sending "something" prints:

    Antes...
    Do something before...
    Got something
    Do something after...
    Despues...
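The effect of mix-in order can be checked without starting an actor system. The following is a minimal sketch in plain Scala, not the deck's code: `Handler`, `Before1`, `Before2` and the `trace` buffer are hypothetical stand-ins for `ActorStack`, `Instrument1`, `Instrument2` and `println`, and no Akka dependency is assumed.

```scala
import scala.collection.mutable.ListBuffer

object MixinOrderDemo {
  // Hypothetical stand-in for ActorStack: one overridable entry point.
  trait Handler {
    val trace: ListBuffer[String] = ListBuffer.empty[String]
    def handle(msg: Any): Unit = trace += s"handled $msg"
  }

  // Stackable traits: each wraps whatever comes earlier in the linearization.
  trait Before1 extends Handler {
    override def handle(msg: Any): Unit = {
      trace += "1-before"
      super.handle(msg)
      trace += "1-after"
    }
  }

  trait Before2 extends Handler {
    override def handle(msg: Any): Unit = {
      trace += "2-before"
      super.handle(msg)
      trace += "2-after"
    }
  }

  // The last trait mixed in becomes the outermost wrapper.
  class OneThenTwo extends Handler with Before1 with Before2
  class TwoThenOne extends Handler with Before2 with Before1

  // Sends one message and returns the recorded call order.
  def run(h: Handler): List[String] = {
    h.handle("x")
    h.trace.toList
  }

  def main(args: Array[String]): Unit = {
    println(run(new OneThenTwo).mkString(", "))
    println(run(new TwoThenOne).mkString(", "))
  }
}
```

Swapping the two `with` clauses swaps which trait runs first, so a timing trait that should also measure logging overhead would go last in the mix-in list.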
Productionizing Akka
On distributed systems:
“The only thing that
matters is visibility”
Akka Performance Metrics
• We define a trait that adds two metrics for every actor:

• frequency of messages handled (1min, 5min, 15min moving averages)

• time spent in receive block

• All metrics exposed via a Spray route /metricz

• Daemon polls /metricz and sends to metrics service

• Would like: mailbox size, but this is hard
Akka Performance Metrics
    import java.util.concurrent.TimeUnit
    import com.yammer.metrics.Metrics

    trait ActorMetrics extends ActorStack {
      // Timer includes a histogram of wrappedReceive() duration as well as moving avg of rate of invocation
      val metricReceiveTimer = Metrics.newTimer(getClass, "message-handler",
                                                TimeUnit.MILLISECONDS, TimeUnit.SECONDS)

      override def receive: Receive = {
        case x =>
          val context = metricReceiveTimer.time()
          try {
            super.receive(x)
          } finally {
            context.stop()
          }
      }
    }
Performance Metrics (cont’d)
VisualVM and Akka
• Bounded mailboxes = time spent enqueueing msgs
VisualVM and Akka
• My dream: a VisualVM plugin to visualize Actor utilization across threads
Tracing Akka Message Flows
• Stack trace is very useful for traditional apps, but for Akka apps, you get this:

    at akka.dispatch.Future$$anon$3.liftedTree1$1(Future.scala:195) ~[akka-actor-2.0.5.jar:2.0.5]
    at akka.dispatch.Future$$anon$3.run(Future.scala:194) ~[akka-actor-2.0.5.jar:2.0.5]
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:94) [akka-actor-2.0.5.jar:2.0.5]
    at akka.jsr166y.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1381) [akka-actor-2.0.5.jar:2.0.5]
    at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259) [akka-actor-2.0.5.jar:2.0.5]
    at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975) [akka-actor-2.0.5.jar:2.0.5]
    at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479) [akka-actor-2.0.5.jar:2.0.5]
    at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) [akka-actor-2.0.5.jar:2.0.5]

• What if you could get an Akka message trace?

    --> trAKKAr message trace <--
    akka://Ingest/user/Super --> akka://Ingest/user/K1: Initialize
    akka://Ingest/user/K1 --> akka://Ingest/user/Converter: Data
Tracing Akka Message Flows
• Trait sends an Edge(source, dest, messageInfo) to a local Collector actor

• Aggregate edges across nodes, graph and profit!

    trait TrakkarExtractor extends TrakkarBase with ActorStack {
      import TrakkarUtils._

      val messageIdExtractor: MessageIdExtractor = randomExtractor

      override def receive: Receive = {
        case x =>
          lastMsgId = (messageIdExtractor orElse randomExtractor)(x)
          Collector.sendEdge(sender, self, lastMsgId, x)
          super.receive(x)
      }
    }
Akka Service Discovery
• Akka remote - need to know remote nodes

• Akka cluster - need to know seed nodes

• Use ZooKeeper or etcd

• http://blog.eigengo.com/2014/12/13/akka-cluster-inventory/ - Akka cluster inventory extension

• Be careful - Akka is very picky about IP addresses. Beware of AWS, Docker, etc. Test, test, test.
Akka Instrumentation Libraries
• http://kamon.io

• Uses AspectJ to “weave” in instrumentation. Metrics, logging, tracing.

• Instruments Akka, Spray, Play

• Provides StatsD / Graphite and other backends

• https://github.com/levkhomich/akka-tracing

• Zipkin distributed tracing for Akka
Backpressure and Reliability
Intro to Backpressure
• Backpressure - ability to tell senders to slow down/stop

• Must look at entire system.

• Individual components (e.g. TCP) having flow control does not mean the entire system behaves well
Why not bounded mailboxes?
• By default, actor mailboxes are unbounded

• Using bounded mailboxes

• When mailbox is full, messages go to DeadLetters
• mailbox-push-timeout-time: how long to wait when mailbox is full

• Doesn’t work for distributed Akka systems!

• Real flow control: pull, push with acks, etc.

• Works anywhere, but more work
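The pull style of flow control mentioned above can be sketched in plain Scala. This is a hedged, single-threaded model and not an Akka API: `Producer`, `Consumer`, `request` and `batchSize` are hypothetical names chosen for illustration. The point is that the producer never hands out more than the consumer asked for.

```scala
import scala.collection.mutable

object PullFlowControlDemo {
  // Hypothetical producer: releases items only in response to an explicit request.
  final class Producer(items: Seq[Int]) {
    private val pending: mutable.Queue[Int] = mutable.Queue.empty[Int] ++= items

    /** Returns at most n items, fewer if the source is nearly exhausted. */
    def request(n: Int): List[Int] =
      List.fill(math.min(n, pending.size))(pending.dequeue())

    def remaining: Int = pending.size
  }

  // Hypothetical consumer: pulls a bounded batch, processes it, then pulls again.
  final class Consumer(batchSize: Int) {
    var processed: Vector[Int] = Vector.empty
    var maxInFlight: Int = 0

    def drain(producer: Producer): Unit = {
      var batch = producer.request(batchSize)
      while (batch.nonEmpty) {
        maxInFlight = math.max(maxInFlight, batch.size) // never exceeds batchSize
        processed = processed ++ batch
        batch = producer.request(batchSize)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val producer = new Producer(1 to 10)
    val consumer = new Consumer(batchSize = 3)
    consumer.drain(producer)
    println(s"processed=${consumer.processed.size} maxInFlight=${consumer.maxInFlight}")
  }
}
```

In an actor system the `request` call would be a message from the consumer to the producer, which is why this approach also works across nodes where a bounded mailbox does not help.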
Backpressure in Action
• A working back pressure system causes the rate of all actor components to be in sync.

• Witness this message flow rate graph of the start of event processing:
Akka Streams
• Very conservative (“pull based”)

• Consumer must first give permission to Publisher to send data

• How does it work for fan-in scenarios?
Backpressure for fan-in
• Multiple input streams go to a single resource (DB?)

• Streams may come and go

• Pressure comes from each stream and from # streams
[Diagram: Stream 1, Stream 2, Stream 3, Stream 4 feed a single Writer Actor, which writes to the DB]
Backpressure for fan-in
• Same simple model, can control number of clients

• High overhead: lots of streams to notify “Ready”
[Diagram: Stream 1, Stream 2, Writer Actor; messages exchanged: Register, Ready for data, Data]
At Least Once Delivery
What if you can’t drop messages on the floor?
At Least Once Delivery
• Let every message have a unique ID.

• Ack returns with unique ID to confirm message send.

• What happens if you don’t get an ack?
[Diagram: Actor A sends Msg 100, Msg 101, Msg 102 to Actor B; replies: Ack 100, Ack 101?]
At Least Once Delivery
• Resend unacked messages until confirmed == “at least once”
[Diagram: Actor A sends Msg 100, Msg 101, Msg 102 to Actor B; Ack 100 arrives, Ack 101 does not; after the ack timeout, Actor A resends Msg 101]
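The resend-until-acked bookkeeping can be sketched as follows. This is a minimal plain-Scala model under stated assumptions, not Akka Persistence and not the deck's implementation: `Sender`, `resendDue` and `ackTimeoutMs` are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```scala
object AtLeastOnceDemo {
  final case class Msg(id: Long, payload: String)

  // Hypothetical sender-side state: every sent message stays here until acked.
  final class Sender(ackTimeoutMs: Long) {
    private var unacked = Map.empty[Long, (Msg, Long)] // id -> (message, last send time)

    /** Records the message as in flight and returns it for transmission. */
    def send(msg: Msg, nowMs: Long): Msg = {
      unacked += (msg.id -> ((msg, nowMs)))
      msg
    }

    /** Called when an ack carrying this ID comes back. */
    def ack(id: Long): Unit = unacked -= id

    /** Messages whose ack is overdue; each is marked as re-sent at nowMs. */
    def resendDue(nowMs: Long): List[Msg] = {
      val due = unacked.values.collect {
        case (msg, sentAt) if nowMs - sentAt >= ackTimeoutMs => msg
      }.toList.sortBy(_.id)
      due.foreach(m => unacked += (m.id -> ((m, nowMs))))
      due
    }

    def outstanding: Set[Long] = unacked.keySet
  }

  def main(args: Array[String]): Unit = {
    val sender = new Sender(ackTimeoutMs = 100)
    sender.send(Msg(100, "a"), nowMs = 0)
    sender.send(Msg(101, "b"), nowMs = 0)
    sender.ack(100)
    println(sender.resendDue(nowMs = 100)) // only the unacked message is returned
  }
}
```

The `unacked` map is the "message history" cost the next slide refers to: it must hold every message until its ack arrives.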
At Least Once Delivery & Akka
• Resending messages requires keeping message history around

• Unless your source of messages is Kafka - then just replay from the last successful offset + 1

• Use Akka Persistence - has at-least-once semantics + persistence of messages for better durability

• Exactly Once = at least once + deduplication

• Akka Persistence has this too!
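The "at least once + deduplication" formula can be illustrated with a small receiver-side sketch. It is plain Scala with hypothetical names (`DedupReceiver`, `applied`), not the Akka Persistence API: the receiver always acks, but applies a payload only the first time it sees that message ID.

```scala
object DedupDemo {
  final case class Msg(id: Long, payload: String)

  // Hypothetical receiver: remembers processed IDs so a re-sent message has no second effect.
  final class DedupReceiver {
    private var seen = Set.empty[Long]
    var applied: Vector[String] = Vector.empty

    /** Returns the ID to ack; the payload is applied only on first sight of the ID. */
    def receive(msg: Msg): Long = {
      if (!seen.contains(msg.id)) {
        seen += msg.id
        applied = applied :+ msg.payload
      }
      msg.id
    }
  }

  def main(args: Array[String]): Unit = {
    val receiver = new DedupReceiver
    List(Msg(100, "a"), Msg(101, "b"), Msg(101, "b")).foreach(receiver.receive)
    println(receiver.applied)
  }
}
```

In this sketch the `seen` set grows without bound; a real system would need some way to prune it, for example by tracking the highest contiguous ID, which is an assumption beyond what the slides state.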
Backpressure and at-least-once
• How about a system that works for fan-in, and handles back pressure and at-least-once too?

• Let the client have an upper limit of unacked messages

• Server can reject new messages
[Diagram: Stream 1, Stream 2, Writer Actor; messages: Msg 100, Ack 100, Msg 101, Msg 200, Reject!]
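The two rules above (a client-side cap on unacked messages, and a server that may reject) can be modeled in a few lines. This is a hedged plain-Scala sketch with hypothetical names (`Writer`, `Client`, `capacity`, `maxUnacked`); here an `Ack` means only "accepted into the buffer", whereas a real writer would presumably ack after the durable write.

```scala
object WindowedBackpressureDemo {
  sealed trait Reply
  final case class Ack(id: Long) extends Reply
  final case class Reject(id: Long) extends Reply

  // Hypothetical writer: buffers at most `capacity` messages and rejects the rest.
  final class Writer(capacity: Int) {
    private var buffered = Vector.empty[Long]

    def offer(id: Long): Reply =
      if (buffered.size >= capacity) Reject(id)
      else { buffered = buffered :+ id; Ack(id) }

    /** Simulates writing the buffer out; returns how many messages were written. */
    def flush(): Int = { val n = buffered.size; buffered = Vector.empty; n }
  }

  // Hypothetical client: never has more than `maxUnacked` messages awaiting an ack.
  final class Client(maxUnacked: Int) {
    private var unacked = Set.empty[Long]

    /** Returns false when the window is full, so the caller must slow down. */
    def trySend(id: Long): Boolean =
      if (unacked.size < maxUnacked) { unacked += id; true } else false

    def onReply(reply: Reply): Unit = reply match {
      case Ack(id)   => unacked -= id
      case Reject(_) => () // stays unacked, so it is retried later (at-least-once)
    }

    def pending: Set[Long] = unacked
  }

  def main(args: Array[String]): Unit = {
    val writer = new Writer(capacity = 1)
    println(writer.offer(100)) // accepted
    println(writer.offer(200)) // buffer full, so rejected
  }
}
```

Because each client bounds its own window, total pressure on the writer is roughly the number of streams times `maxUnacked`, and the reject path covers the case where too many streams show up at once.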
Backpressure and Futures
• Use an actor to limit # of outstanding futures
    import scala.concurrent.Future
    import scala.util.Success

    class CommandThrottlingActor(mapper: CommandThrottlingActor.Mapper,
                                 maxFutures: Int) extends BaseActor {
      import CommandThrottlingActor._
      import context.dispatcher // for future callbacks

      // Commands with no mapping resolve to NoSuchCommand
      val mapperWithDefault = mapper orElse ({
        case x: Any => Future { NoSuchCommand }
      }: Mapper)
      var outstandingFutures = 0

      def receive: Receive = {
        case FutureCompleted => if (outstandingFutures > 0) outstandingFutures -= 1
        case c: Command =>
          if (outstandingFutures >= maxFutures) {
            sender ! TooManyOutstandingFutures
          } else {
            outstandingFutures += 1
            val originator = sender // sender is a function, don't call in the callback
            // onComplete (not onSuccess) so a failed future also frees its slot
            mapperWithDefault(c).onComplete { result =>
              self ! FutureCompleted
              result match {
                case Success(response: Response) => originator ! response
                case _ => // failure or unexpected result: nothing to send back
              }
            }
          }
      }
    }
Good Akka development practices
• Don't put things that can fail into Actor constructor

• Default supervision strategy stops an Actor which cannot initialize itself

• Instead use an Initialize message

• Put your messages in the Actor’s companion object

• Namespacing is nice
Couple more random hints
• Learn Akka Testkit.

• Master it! The most useful tool for testing Akka actors.

• Many examples in spark-jobserver repo

• gracefulStop()

• TestKit.shutdownActorSystem(system)
Thank you!!
• Queues don’t fix overload

• Stackable actor traits - see ActorStack in spark-jobserver repo
Extra slides
Putting it all together
Akka Visibility, Minimal Footprint
    trait InstrumentedActor extends Slf4jLogging with ActorMetrics with TrakkarExtractor

    object MyWorkerActor {
      case object Initialize
      case class DoSomeWork(desc: String)
    }

    class MyWorkerActor extends InstrumentedActor {
      def wrappedReceive = {
        case Initialize =>
        case DoSomeWork(desc) =>
      }
    }
Using Logback with Akka
• Pretty easy setup

• Include the Logback jar

• In your application.conf:

event-handlers = ["akka.event.slf4j.Slf4jEventHandler"]

• Use a custom logging trait, not ActorLogging

• ActorLogging does not allow adjustable logging levels

• Want the Actor path in your messages?

• org.slf4j.MDC.put("actorPath", self.path.toString)
Using Logback with Akka
    import org.slf4j.LoggerFactory

    trait Slf4jLogging extends Actor with ActorStack {
      val logger = LoggerFactory.getLogger(getClass)
      private[this] val myPath = self.path.toString

      logger.info("Starting actor " + getClass.getName)

      override def receive: Receive = {
        case x =>
          org.slf4j.MDC.put("akkaSource", myPath)
          super.receive(x)
      }
    }

Akka in Production - ScalaDays 2015

  • 1. Akka in Production Evan Chan Scala Days 2015 March 17, 2015
  • 2. Who is this guy? •Principal Engineer, Socrata, Inc. •http://github.com/velvia •Author of multiple open source Akka/Scala projects - Spark Job Server, ScalaStorm, etc. •@evanfchan
  • 3. A plug for a few projects… •http://github.com/velvia/links - my stash of interesting Scala & big data projects •http://github.com/velvia/filo - a new, extreme vector serialization library for fast analytics •Talk to me later if you are interested in fast serialization or columnar/analytics databases
  • 4. Who is Socrata? ! We are a Seattle-based software startup. ! We make data useful to everyone. Open, Public Data Consumers Apps
  • 5. Socrata is… The most widely adopted Open Data platform
  • 6. Scala at Socrata •Started with old monolithic Java app •Started writing new features in Scala - 2.8 •Today - 100% backend development in Scala, 2.10 / 2.11, many micro services •custom SBT plugins, macros, more •socrata-http •rojoma-json
  • 7. Want Reactive? event-driven, scalable, resilient and responsive
  • 9. Agenda • How does one get started with Akka? • To be honest, Akka is what drew me into Scala • Examples of Akka use cases • Compared with other technologies • Tips on using Akka in production • Including back pressure, monitoring, VisualVM usage, etc.
  • 11. Akka Stack • Spray - high performance HTTP • SLF4J / Logback • Yammer Metrics • spray-json • Akka 2.x • Scala 2.10
  • 12. Ingesting 2 Billion Events / Day Nginx Raw Log Feeder Kafka Storm New Stuff Consumer watches video
  • 13. Livelogsd - Akka/Kafka file tailer Current File Rotated File Rotated File 2 File Reader Actor File Reader Actor Kafka Feeder Coordinator Kafka
  • 14. Storm - with or without Akka? Kafka Spout Bolt Actor Actor • Actors talking to each other within a bolt for locality • Don’t really need Actors in Storm • In production, found Storm too complex to troubleshoot • It’s 2am - what should I restart? Supervisor? Nimbus? ZK?
  • 16. Lessons Learned • Still too complex -- would we want to get paged for this system? • Akka cluster in 2.1 was not ready for production (newer 2.2.x version is stable) • Mixture of actors and futures for HTTP requests became hard to grok • Actors were much easier for most developers to understand
  • 17. Simplified Ingestion Pipeline Kafka Partition 1 Kafka SimpleConsumer Converter Actor Cassandra Writer Actor Kafka Partition 2 Kafka SimpleConsumer Converter Actor Cassandra Writer Actor • Kafka used to partition messages • Single process - super simple! • No distribution of data • Linear actor pipeline - very easy to understand
  • 19. Why Stackable Traits? • Keep adding monitoring, logging, metrics, tracing code gets pretty ugly and repetitive • We want some standard behavior around actors -- but we need to wrap the actor Receive block: class someActor extends Actor {! def wrappedReceive: Receive = {! case x => blah! }! def receive = {! case x =>! println(“Do something before...”)! wrappedReceive(x)! println(“Do something after...”)! }! }
  • 20. Start with a base trait... trait ActorStack extends Actor {! /** Actor classes should implement this partialFunction for standard! * actor message handling! */! def wrappedReceive: Receive! ! /** Stackable traits should override and call super.receive(x) for! * stacking functionality! */! def receive: Receive = {! case x => if (wrappedReceive.isDefinedAt(x)) wrappedReceive(x) else unhandled(x)! // or: (wrappedReceive orElse unhandled)(x)! }! }!
  • 21. Instrumenting Traits... trait Instrument1 extends ActorStack {! override def receive: Receive = {! case x =>! println("Do something before...")! super.receive(x)! println("Do something after...")! }! } trait Instrument2 extends ActorStack {! override def receive: Receive = {! case x =>! println("Antes...")! super.receive(x)! println("Despues...")! }! }
  • 22. Now just mix the Traits in.... class DummyActor extends Actor with Instrument1 with Instrument2 {! def wrappedReceive = {! case "something" => println("Got something")! case x => println("Got something else: " + x)! }! } • Traits add instrumentation; Actors stay clean! • Order of mixing in traits matter Antes...! Do something before...! Got something! Do something after...! Despues...
  • 24. On distributed systems: “The only thing that matters is visibility”
  • 25. Akka Performance Metrics • We define a trait that adds two metrics for every actor: • frequency of messages handled (1min, 5min, 15min moving averages) • time spent in receive block • All metrics exposed via a Spray route /metricz • Daemon polls /metricz and sends to metrics service • Would like: mailbox size, but this is hard
  • 26. Akka Performance Metrics
      trait ActorMetrics extends ActorStack {
        // Timer includes a histogram of wrappedReceive() duration as well as moving avg of rate of invocation
        val metricReceiveTimer = Metrics.newTimer(getClass, "message-handler",
                                                  TimeUnit.MILLISECONDS, TimeUnit.SECONDS)

        override def receive: Receive = {
          case x =>
            val context = metricReceiveTimer.time()
            try {
              super.receive(x)
            } finally {
              context.stop()
            }
        }
      }
  • 29. VisualVM and Akka • Bounded mailboxes = time spent enqueueing msgs
  • 30. VisualVM and Akka • My dream: a VisualVM plugin to visualize Actor utilization across threads
  • 31. Tracing Akka Message Flows • Stack trace is very useful for traditional apps, but for Akka apps, you get this:
      at akka.dispatch.Future$$anon$3.liftedTree1$1(Future.scala:195) ~[akka-actor-2.0.5.jar:2.0.5]
      at akka.dispatch.Future$$anon$3.run(Future.scala:194) ~[akka-actor-2.0.5.jar:2.0.5]
      at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:94) [akka-actor-2.0.5.jar:2.0.5]
      at akka.jsr166y.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1381) [akka-actor-2.0.5.jar:2.0.5]
      at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259) [akka-actor-2.0.5.jar:2.0.5]
      at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975) [akka-actor-2.0.5.jar:2.0.5]
      at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479) [akka-actor-2.0.5.jar:2.0.5]
      at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) [akka-actor-2.0.5.jar:2.0.5]
    • What if you could get an Akka message trace?
      --> trAKKAr message trace <--
      akka://Ingest/user/Super --> akka://Ingest/user/K1: Initialize
      akka://Ingest/user/K1 --> akka://Ingest/user/Converter: Data
  • 33. Tracing Akka Message Flows • Trait sends an Edge(source, dest, messageInfo) to a local Collector actor • Aggregate edges across nodes, graph and profit!
      trait TrakkarExtractor extends TrakkarBase with ActorStack {
        import TrakkarUtils._

        val messageIdExtractor: MessageIdExtractor = randomExtractor

        override def receive: Receive = {
          case x =>
            lastMsgId = (messageIdExtractor orElse randomExtractor)(x)
            Collector.sendEdge(sender, self, lastMsgId, x)
            super.receive(x)
        }
      }
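The "aggregate edges" step might look roughly like the sketch below. It is not the trAKKAr implementation: `Edge` here is a hypothetical case class with plain string fields, and `EdgeAggregator` is an illustrative name. It only shows how edges collected from several nodes could be reduced to per-link message counts that are ready to graph.

```scala
// Hypothetical stand-in for the Edge message on the slide; all fields are plain strings here.
final case class Edge(source: String, dest: String, messageInfo: String)

object EdgeAggregator {
  /** Counts messages per (source, dest) pair across edges gathered from all nodes. */
  def edgeCounts(edges: Seq[Edge]): Map[(String, String), Int] =
    edges.groupBy(e => (e.source, e.dest)).map { case (pair, es) => pair -> es.size }

  /** Renders one line per graph edge, sorted by (source, dest) for stable output. */
  def render(counts: Map[(String, String), Int]): Seq[String] =
    counts.toSeq.sortBy(_._1).map { case ((src, dst), n) => s"$src --> $dst ($n msgs)" }
}
```

Feeding it the two actor paths from the trace on slide 31 would yield one line per sender/receiver pair, which maps directly onto the edges of a message-flow graph.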
  • 34. Akka Service Discovery • Akka remote - need to know remote nodes • Akka cluster - need to know seed nodes • Use ZooKeeper or etcd • http://blog.eigengo.com/2014/12/13/akka-cluster-inventory/ - Akka cluster inventory extension • Be careful - Akka is very picky about IP addresses. Beware of AWS, Docker, etc. Test, test, test.
  • 35. Akka Instrumentation Libraries • http://kamon.io • Uses AspectJ to “weave” in instrumentation. Metrics, logging, tracing. • Instruments Akka, Spray, Play • Provides statsD / graphite and other backends • https://github.com/levkhomich/akka-tracing • Zipkin distributed tracing for Akka
  • 37. Intro to Backpressure • Backpressure - ability to tell senders to slow down/stop • Must look at the entire system. • Flow control in individual components (e.g. TCP) does not mean the entire system behaves well
  • 38. Why not bounded mailboxes? • By default, actor mailboxes are unbounded • With bounded mailboxes: when the mailbox is full, messages go to DeadLetters; mailbox-push-timeout-time sets how long to wait when the mailbox is full • Doesn’t work for distributed Akka systems! • Real flow control (pull, push with acks, etc.) works anywhere, but is more work
  • 39. Backpressure in Action • A working back pressure system causes the rate of all actor components to be in sync. • Witness this message flow rate graph of the start of event processing:
  • 40. Akka Streams • Very conservative (“pull based”) • Consumer must first give permission to Publisher to send data • How does it work for fan-in scenarios?
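The "permission first" idea can be sketched without the Akka Streams API. The class below is a hypothetical, single-threaded model (`PullSource` is not a real Akka type): the source emits nothing until the consumer has requested some number of elements, and it never emits more than the outstanding demand.

```scala
/** Hypothetical model of a pull-based source: it emits only what the consumer asked for. */
final class PullSource[A](items: Iterator[A]) {
  private var demand = 0L

  /** Consumer grants permission for n more elements. */
  def request(n: Long): Unit = {
    require(n > 0, "demand must be positive")
    demand += n
  }

  /** Emits as many elements as current demand (and available data) allows. */
  def emit(): List[A] = {
    val out = List.newBuilder[A]
    while (demand > 0 && items.hasNext) {
      out += items.next()
      demand -= 1
    }
    out.result()
  }
}
```

With no `request` call, `emit()` returns an empty list; after `request(2)` it returns at most two elements. A slow consumer therefore bounds the producer simply by not asking.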
  • 41. Backpressure for fan-in • Multiple input streams go to a single resource (DB?) • Streams may come and go • Pressure comes from each stream and from the number of streams • Diagram: Stream 1, Stream 2, Stream 3, Stream 4 -> Writer Actor -> DB
  • 42. Backpressure for fan-in • Same simple model, can control number of clients • High overhead: lots of streams to notify “Ready” • Diagram: Stream 1 and Stream 2 each Register with the Writer Actor; the Writer Actor signals “Ready for data”; the stream then sends Data
  • 43. At Least Once Delivery What if you can’t drop messages on the floor?
  • 44. At Least Once Delivery • Let every message have a unique ID. • The ack carries the message's unique ID back to confirm delivery. • What happens if you don’t get an ack? • Diagram: Actor A sends Msg 100, Msg 101, Msg 102 to Actor B; Ack 100 comes back, Ack 101 is in doubt
  • 45. At Least Once Delivery • Resend unacked messages until confirmed == “at least once” • Diagram: Actor A sends Msg 100, Msg 101, Msg 102 to Actor B; Ack 100 comes back, Ack 101 does not; after the ack timeout, Actor A resends 101
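The resend loop can be modelled in a few lines. This is a sketch under stated assumptions, not Akka's at-least-once machinery: `Msg` and `Resender` are hypothetical names, time is passed in explicitly as milliseconds so the logic stays deterministic, and the caller is expected to poll `dueForResend` (for example from a scheduled tick).

```scala
final case class Msg(id: Long, payload: String)

/** Hypothetical sender-side bookkeeping: remembers every message until it is acked. */
final class Resender(ackTimeoutMillis: Long) {
  // id -> (message, time it was last sent)
  private var pending = Map.empty[Long, (Msg, Long)]

  /** Record that msg was sent at nowMillis. */
  def sent(msg: Msg, nowMillis: Long): Unit =
    pending = pending.updated(msg.id, (msg, nowMillis))

  /** An ack with this id arrived; stop tracking the message. */
  def acked(id: Long): Unit =
    pending = pending - id

  /** Messages whose ack has timed out; they are marked as re-sent at nowMillis. */
  def dueForResend(nowMillis: Long): List[Msg] = {
    val due = pending.values.collect {
      case (msg, sentAt) if nowMillis - sentAt >= ackTimeoutMillis => msg
    }.toList.sortBy(_.id)
    due.foreach(m => pending = pending.updated(m.id, (m, nowMillis)))
    due
  }

  def unacked: Set[Long] = pending.keySet
}
```

Because a message is resent whenever its ack is late, the receiver may see it more than once. That is the "at least once" guarantee, and it is also why the message history has to be kept until the ack arrives.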
  • 46. At Least Once Delivery & Akka • Resending messages requires keeping message history around • Unless your source of messages is Kafka - then just replay from the last successful offset + 1 • Use Akka Persistence - has at-least-once semantics + persistence of messages for better durability • Exactly Once = at least once + deduplication • Akka Persistence has this too!
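"At least once + deduplication" can be illustrated on the receiving side. The sketch below is a hypothetical in-memory model (`DedupReceiver` is not an Akka Persistence class): every delivery is acknowledged, but the side effect runs only the first time an ID is seen. It assumes the set of seen IDs fits in memory and survives for the life of the receiver; a real system would need to bound or persist it.

```scala
import scala.collection.mutable

/** Hypothetical receiver that processes each message id once but acks every delivery. */
final class DedupReceiver[A](process: A => Unit) {
  private val seen = mutable.Set.empty[Long]

  /** Returns the id to ack. Runs process only on the first delivery of that id. */
  def receive(id: Long, payload: A): Long = {
    if (seen.add(id)) process(payload) // add returns false for duplicates
    id
  }
}
```

A resent Msg 101 is thus acked again, so the sender stops retrying, without being applied twice.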
  • 47. Backpressure and at-least-once • How about a system that works for fan-in, and handles back pressure and at-least-once too? • Let the client have an upper limit of unacked messages • Server can reject new messages • Diagram: Stream 1, Stream 2 -> Writer Actor; messages shown: Msg 100, Ack 100, Msg 101, Msg 200, Reject!
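A minimal sketch of the two halves of this scheme, assuming synchronous, single-threaded calls in place of actor messages. `ClientWindow`, `BoundedWriter`, `Ack`, and `Reject` are hypothetical names for illustration; in an actor system these would be message types and actor state rather than method calls.

```scala
import scala.collection.mutable

sealed trait Reply
final case class Ack(id: Long) extends Reply
final case class Reject(id: Long) extends Reply

/** Client side: refuses to send once maxUnacked messages are awaiting a reply. */
final class ClientWindow(maxUnacked: Int) {
  private var unacked = Set.empty[Long]

  /** Returns true if the caller may send this id now. */
  def trySend(id: Long): Boolean =
    if (unacked.size >= maxUnacked) false
    else { unacked += id; true }

  /** Ack and Reject both free a slot; a rejected id must be retried by the caller. */
  def onReply(reply: Reply): Unit = reply match {
    case Ack(id)    => unacked -= id
    case Reject(id) => unacked -= id
  }
}

/** Server side: a writer with a bounded buffer that rejects new messages when full. */
final class BoundedWriter(capacity: Int) {
  private val buffer = mutable.Queue.empty[Long]

  /** None means accepted; Some(Reject) means the buffer is full. */
  def offer(id: Long): Option[Reject] =
    if (buffer.size >= capacity) Some(Reject(id))
    else { buffer.enqueue(id); None }

  /** Simulates finishing one write; the Ack is produced only after the write. */
  def completeNext(): Option[Ack] =
    if (buffer.isEmpty) None else Some(Ack(buffer.dequeue()))
}
```

The per-client window bounds pressure from each stream, while the server-side reject bounds pressure from the number of streams, so the writer never has to notify every stream that it is ready.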
  • 48. Backpressure and Futures • Use an actor to limit # of outstanding futures
      class CommandThrottlingActor(mapper: CommandThrottlingActor.Mapper, maxFutures: Int) extends BaseActor {
        import CommandThrottlingActor._
        import context.dispatcher // for future callbacks

        val mapperWithDefault = mapper orElse ({
          case x: Any => Future { NoSuchCommand }
        }: Mapper)
        var outstandingFutures = 0

        def receive: Receive = {
          case FutureCompleted =>
            if (outstandingFutures > 0) outstandingFutures -= 1
          case c: Command =>
            if (outstandingFutures >= maxFutures) {
              sender ! TooManyOutstandingFutures
            } else {
              outstandingFutures += 1
              val originator = sender // sender is a function, don't call in the callback
              // Note: onSuccess fires only for successful futures, so a failed future
              // never sends FutureCompleted and its slot is not released.
              mapperWithDefault(c).onSuccess {
                case response: Response =>
                  self ! FutureCompleted
                  originator ! response
              }
            }
        }
      }
  • 49. Good Akka development practices • Don't put things that can fail into Actor constructor • Default supervision strategy stops an Actor which cannot initialize itself • Instead use an Initialize message • Put your messages in the Actor’s companion object • Namespacing is nice
  • 50. Couple more random hints • Learn Akka Testkit. • Master it! The most useful tool for testing Akka actors. • Many examples in spark-jobserver repo • gracefulStop() • TestKit.shutdownActorSystem(system)
  • 51. Thank you!! • Queues don’t fix overload • Stackable actor traits - see ActorStack in spark-jobserver repo
  • 53. Putting it all together
  • 54. Akka Visibility, Minimal Footprint
      trait InstrumentedActor extends Slf4jLogging with ActorMetrics with TrakkarExtractor

      object MyWorkerActor {
        case object Initialize
        case class DoSomeWork(desc: String)
      }

      class MyWorkerActor extends InstrumentedActor {
        import MyWorkerActor._ // bring the companion's messages into scope

        def wrappedReceive = {
          case Initialize =>
          case DoSomeWork(desc) =>
        }
      }
  • 55. Using Logback with Akka • Pretty easy setup • Include the Logback jar • In your application.conf:
 event-handlers = ["akka.event.slf4j.Slf4jEventHandler"] • Use a custom logging trait, not ActorLogging • ActorLogging does not allow adjustable logging levels • Want the Actor path in your messages? • org.slf4j.MDC.put("actorPath", self.path.toString)
  • 56. Using Logback with Akka
      trait Slf4jLogging extends Actor with ActorStack {
        val logger = LoggerFactory.getLogger(getClass)
        private[this] val myPath = self.path.toString

        logger.info("Starting actor " + getClass.getName)

        override def receive: Receive = {
          case x =>
            org.slf4j.MDC.put("akkaSource", myPath)
            super.receive(x)
        }
      }