Adopting actors
An epic tail of loss and learning
Iain Hull
iain.hull@workday.com
@IainHull
http://workday.github.io
Workday
Growth
[Chart: 2013 to 2016]
Cloud Master
• Launch tasks
• Assign to agents
Service Growth, in millions of tasks per month
[Chart: axis from 0 to 20; labels: Print, Large, Small, Batch]
Why Akka?
Initial Observations
[Diagram labels: Parent, Config, Child, Snapshots, Changes]
Message flow: Ensure messages follow a consistent path
Creation: Assume actor is recovering from failure (state machine)
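The creation rule above can be sketched as a two-state machine. This is a minimal illustration in plain Python, not Akka code, and the names `Child`, `Snapshot` and `Change` are hypothetical. It assumes the parent sends snapshots and changes along one ordered path, so any change that arrives before the first snapshot is already reflected in that snapshot and can be dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Full state sent by the parent (hypothetical message type)."""
    values: dict


@dataclass
class Change:
    """Incremental update sent by the parent (hypothetical message type)."""
    key: str
    value: str


@dataclass
class Child:
    """Actor-like object that assumes it is recovering when it is created."""
    state: str = "recovering"
    values: dict = field(default_factory=dict)
    ignored: int = 0

    def receive(self, message):
        if self.state == "recovering":
            self._recovering(message)
        else:
            self._running(message)

    def _recovering(self, message):
        if isinstance(message, Snapshot):
            # The snapshot is the baseline; only now is it safe to run.
            self.values = dict(message.values)
            self.state = "running"
        else:
            # Assumption: ordered delivery, so the snapshot will cover this.
            self.ignored += 1

    def _running(self, message):
        if isinstance(message, Change):
            self.values[message.key] = message.value
        elif isinstance(message, Snapshot):
            self.values = dict(message.values)


child = Child()
child.receive(Change("pool", "large"))   # arrives before any snapshot
child.receive(Snapshot({"pool": "small", "agents": "5"}))
child.receive(Change("agents", "6"))
```

Because a fresh start and a restart after failure take the same path, there is only one start-up sequence to test.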
Anti-patterns
God Class
Movie Star
[Diagram: Pool, Agent State, four Agents, Queue]
Movie Star
Too much state
• Hard to reason about
• Too many messages in flight
• Hard to recover
• Bad concurrency
Split Brain
[Diagram: Pool, Agent State, four Agents]
Duplicate state
Single source of truth
• Synchronizing state is hard
• Failure causes
  – State out of sync
  – Causes more failure
Split Brain
[Diagram: Pool, Agent State, four Agents, Task]
Passing responsibility
Seems simple at first
• Do not always know who is in control
• Both actors updating the same row
• Creates race conditions
Can you let it crash?
[Diagram: Pool, Agent State, four Agents]
Lessons
Test for resilience
• Chaos Marmoset
• Unit test recovery
• Destructive system test
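One way to unit test recovery is to inject failures on purpose and check that state survives a restart. The sketch below is plain Python, not the Chaos Marmoset from the talk: the `ChaosMarmoset` wrapper, `CounterActor` and the dictionary used as a durable store are all hypothetical stand-ins. Failures are injected before delivery, and a seeded random generator keeps the run repeatable.

```python
import random


class ChaosError(Exception):
    """Failure injected on purpose by the test harness."""


class CounterActor:
    """Toy actor: counts tasks and persists the count in a store it owns."""

    def __init__(self, store):
        self.store = store
        # Recovery path: rebuild in-memory state from the durable store.
        self.count = store.get("count", 0)

    def receive(self, message):
        if message == "task":
            self.count += 1
            self.store["count"] = self.count


class ChaosMarmoset:
    """Wraps an actor and randomly fails deliveries (hypothetical harness)."""

    def __init__(self, make_actor, failure_rate, seed):
        self.make_actor = make_actor
        self.actor = make_actor()
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so test runs are repeatable
        self.restarts = 0

    def deliver(self, message):
        while True:
            try:
                if self.rng.random() < self.failure_rate:
                    raise ChaosError("injected failure")
                self.actor.receive(message)
                return
            except ChaosError:
                # Supervisor behaviour: discard the instance, restart, retry.
                self.actor = self.make_actor()
                self.restarts += 1


store = {}
harness = ChaosMarmoset(lambda: CounterActor(store), failure_rate=0.3, seed=42)
for _ in range(100):
    harness.deliver("task")
```

After the run, the restarted actor should hold the same count as the store, however many restarts were injected along the way.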
Stateless
Enterprise idioms do not apply
Sovereignty
One actor
• One row
• One shard
• One table
Otherwise failure hard to handle
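The sovereignty rule can be sketched as a registry that maps each row key to exactly one owner, with every write routed through that owner. This is a plain Python illustration under that assumption; `RowOwner` and `OwnerRegistry` are hypothetical names, not part of Akka or of the system described in the talk.

```python
class RowOwner:
    """The only component allowed to write its row (hypothetical name)."""

    def __init__(self, key):
        self.key = key
        self.row = {"status": "idle", "version": 0}

    def receive(self, update):
        # All writes for this row are serialised through this one method.
        self.row.update(update)
        self.row["version"] += 1


class OwnerRegistry:
    """Routes every message for a key to that key's single owner."""

    def __init__(self):
        self._owners = {}

    def owner_for(self, key):
        # Create the owner on first use; never create a second one.
        if key not in self._owners:
            self._owners[key] = RowOwner(key)
        return self._owners[key]

    def tell(self, key, update):
        self.owner_for(key).receive(update)


registry = OwnerRegistry()
registry.tell("agent-1", {"status": "busy"})
registry.tell("agent-1", {"status": "idle"})
registry.tell("agent-2", {"status": "busy"})
```

With a single writer per row, recovering from a failure means restarting one owner rather than reconciling several writers.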
Atomicity
Actors
• Atomic receive method
• State not shared
• Comms: async messages
• Not nestable
Mutex
• Atomic scope
• State is shared
• Comms: via mutable state
• Nestable (ACID)
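The actor column of the comparison above can be sketched with a mailbox and a single worker thread. This is a plain Python stand-in for an actor runtime, not Akka, and `CounterActor` is a hypothetical name. Senders only enqueue messages; the private count is touched by one thread, one message at a time, so no lock guards it.

```python
import queue
import threading


class CounterActor:
    """State is private; the only way in is an asynchronous message."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def tell(self, message):
        # Senders never touch _count; they only enqueue.
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:  # stop signal
                return
            self._receive(message)

    def _receive(self, message):
        # One message at a time: this method is the atomic unit.
        if message == "increment":
            self._count += 1

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()
        return self._count


def send_many(actor, n):
    for _ in range(n):
        actor.tell("increment")


actor = CounterActor()
senders = [threading.Thread(target=send_many, args=(actor, 1000)) for _ in range(4)]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()
total = actor.stop()
```

The atomic unit ends when `_receive` returns. Anything that spans two messages, such as a request and its reply, is not atomic and has to be designed for separately.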
Atomicity
Anything!!! Nothing
Actors Mutex
[Diagram: Pool, Agent State, four Agents]
Atomicity
Eventual consistency
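One way to read eventual consistency for agent assignments is as a reservation that is later confirmed or compensated. The sketch below is plain Python with hypothetical names (`Pool`, `on_confirmed`, `on_timeout`), not the design from the talk. It assumes the pool keeps a pending table for unconfirmed assignments and undoes a reservation when no reply arrives.

```python
from collections import deque


class Pool:
    """Tracks assignments that stay pending until the agent confirms."""

    def __init__(self, agents, tasks):
        self.idle = set(agents)
        self.queue = deque(tasks)
        self.pending = {}    # task -> agent, reserved but not yet confirmed
        self.confirmed = {}  # task -> agent

    def assign_next(self):
        if not self.queue or not self.idle:
            return None
        task = self.queue.popleft()
        agent = min(self.idle)  # deterministic pick for the example
        self.idle.remove(agent)
        self.pending[task] = agent
        return task, agent

    def on_confirmed(self, task):
        agent = self.pending.pop(task, None)
        if agent is not None:
            self.confirmed[task] = agent

    def on_timeout(self, task):
        # Compensating action: undo the reservation and requeue the task.
        agent = self.pending.pop(task, None)
        if agent is not None:
            self.idle.add(agent)
            self.queue.appendleft(task)


pool = Pool(agents=["a1", "a2"], tasks=["t1", "t2"])
first = pool.assign_next()
second = pool.assign_next()
pool.on_confirmed("t1")
pool.on_timeout("t2")  # no reply arrived, so compensate
retry = pool.assign_next()
```

At every step each agent sits in exactly one of `idle`, `pending` or `confirmed`. A confirmation that arrives after the timeout is ignored here; a real system would need a reconciliation step for that case.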
Lessons
- Atomicity and Consistency
- Actor modeling ≠ Object modeling
- Test for Resilience not robustness
- Refactor Early

Editor's Notes

  • #3: Principal Engineer, Workday's Grid Cloud Master team. Who is Workday?
  • #4: Finance and Human Capital Management – ERP Vendor – 100% in the cloud – all customers on a single version
  • #5: Fiscal 2016 total revenue of $1.16 billion, up 48% year over year. Over 5000 employees, over 500 of them in Dublin. 2016: Best Workplaces in Ireland, Great Place to Work Institute (#2 for large companies). 2016: 10 Best Large Workplaces in Tech, Fortune (#2).
  • #6: Provide an elastic grid for other services. Reliable execution of background tasks or jobs, from PDF printing to payroll. Cloud Master and Agents: schedule tasks and assign them to Agents.
  • #7: 5 pools of agents. Different types of task, memory size, execution speed.
  • #8: 5 data centers. Secure, reliable, safe. Isolated (fairness). Scalable, efficient.
  • #9: This talk is about the lessons I learned migrating a multithreaded Java server application to Akka. To support this growth we need to move to stateful services. Why?
  • #10: Actor model of concurrency: safer (no deadlocks), easier to reason about, easier to test, better distribution, easier scalability. Then Scala because of Akka: key selling point.
  • #12: Trying to avoid a two-way relationship (coupling, mutability). Static state should be immutable.
  • #13: Trying to avoid a two-way relationship (coupling, mutability). Static state should be immutable.
  • #14: Trying to avoid a two-way relationship (coupling, mutability). Static state should be immutable.
  • #16: Everyone knows about the God class – threading and mutexes make this worse
  • #17: Some are big (Marlon Brando), some are small (Robert Downey Junior, me). Even the small ones have an entourage.
  • #18: AgentPoolActor is responsible for the Agent actors, the queue of tasks, and their assignments. Decomposed into separate classes and traits, it is still one actor with an entourage.
  • #19: Also drives more bad decisions
  • #21: AgentPoolActor and AgentStateActor. External DB changes, sending notifications, message loss, recovery. Caused by the movie star. Thought the problem was that the stream of events was inconsistent; fix that. State inconsistent, failure, production outage.
  • #22: … Beauty of split brains
  • #24: AgentPoolActor takes a job from the queue and assigns it to an Agent. The Agent might fail and put it back. Either the Pool or the Agent might own the job, so we cannot reliably find the job, e.g. to cancel it.
  • #25: Who - When
  • #27: PoolActor has decided to assign a task to an agent. Async message to StateActor; PoolActor must ensure the agent is not reused before the reply. What if the reply times out? Crash: can I guarantee consistency? What happens to the job?
  • #29: Chaos Marmoset base actor overrides the unhandled method. Messages can cause failures or delays.
  • #30: Horizontal scalability by pushing all state into the database. Actors are about data; actors are stateful; impedance. Stateless services cannot update the same data as an actor.
  • #31: Autonomy – single responsibility If your actors write to the database
  • #34: We want agent assignments to be consistent
  • #35: Banking transactions ACID? No: suspense account, reconciliation, compensating transactions. Must handle failure cases.