Sidekiq scaling: Workers vs Processes
Who am I?
Maxence (@Shinomix)
- Software Engineer at Aircall
- Specialized in back-end and infrastructure
- Working in a Ruby environment
About Sidekiq
What is it? How does it work?
- Asynchronous job execution for Ruby or Rails applications, e.g. sending emails or computing big CSV files
- Every time you need to run long or postponable logic, you create a job (an entry stored in Redis) that is executed as soon as a worker is available
- Provides retry mechanisms if execution fails
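A minimal in-memory sketch can make this job lifecycle concrete: jobs are enqueued, executed when a worker gets to them, and retried on failure. It is not how Sidekiq is built (Sidekiq persists jobs in Redis and delays retries with a backoff); `TinyQueue`, `MAX_ATTEMPTS` and the handler names are hypothetical.

```ruby
# Minimal in-memory sketch of the job lifecycle: enqueue, execute, retry.
# Hypothetical names throughout; Sidekiq itself stores jobs in Redis and
# delays retries with a backoff instead of re-running them immediately.
Job = Struct.new(:name, :args, :attempts)

class TinyQueue
  MAX_ATTEMPTS = 3 # hypothetical retry limit

  attr_reader :dead

  def initialize
    @jobs = [] # pending jobs, processed in FIFO order
    @dead = [] # jobs that failed MAX_ATTEMPTS times
  end

  def enqueue(name, *args)
    @jobs << Job.new(name, args, 0)
  end

  # Runs jobs until none are pending. A failing job is pushed back to the
  # end of the queue until it reaches MAX_ATTEMPTS, then moved to @dead.
  def drain(handlers)
    results = []
    until @jobs.empty?
      job = @jobs.shift
      job.attempts += 1
      begin
        results << handlers.fetch(job.name).call(*job.args)
      rescue StandardError
        if job.attempts < MAX_ATTEMPTS
          @jobs << job
        else
          @dead << job
        end
      end
    end
    results
  end
end

queue = TinyQueue.new
queue.enqueue(:square, 4)
queue.enqueue(:flaky)
results = queue.drain(square: ->(x) { x * x }, flaky: -> { raise "API down" })
puts results.inspect                    # => [16]
puts queue.dead.map(&:attempts).inspect # => [3]
```

The failing job is retried twice before being given up on, while the successful one runs once.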
Pieces of vocabulary
- Job: Piece of code that can be scheduled to run in the future
- Concurrency: Number of jobs that one process can execute in parallel (in Sidekiq, the number of threads per process)
- Process: Running instance of a computer program
- Child process: Process created by, and dependent on, another (parent) process
- Thread: Subset of a process that executes one instruction at a time
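The vocabulary maps onto code roughly as follows. The hypothetical `run_process` helper plays the role of one process: it starts `concurrency` threads, and each thread executes one job at a time from a shared queue. It is an illustration only, not Sidekiq's own scheduler.

```ruby
# One process, `concurrency` threads: each thread repeatedly takes the next
# job from a shared queue and runs it, so at most `concurrency` jobs run at
# the same time. Hypothetical helper, not Sidekiq's own thread pool.
def run_process(jobs, concurrency:)
  pending = Queue.new
  jobs.each { |job| pending << job }
  done = Queue.new

  threads = Array.new(concurrency) do |thread_index|
    Thread.new do
      loop do
        job = begin
          pending.pop(true) # non-blocking: raises ThreadError when empty
        rescue ThreadError
          break # no job left: this thread stops
        end
        done << [thread_index, job.call]
      end
    end
  end
  threads.each(&:join)

  # Each entry is [index of the thread that ran the job, job result].
  Array.new(done.size) { done.pop }
end

jobs = (1..10).map { |n| -> { n * n } }
results = run_process(jobs, concurrency: 4)
puts results.map(&:last).sort.inspect       # squares of 1..10
puts results.map(&:first).uniq.sort.inspect # threads that picked up jobs
```

Which thread runs which job is not deterministic; only the set of results is.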
[Diagram: a Process running several Threads (its Concurrency), each Thread executing Jobs one after another]
Why are we here tonight?
- How to scale Sidekiq?
- Is there a magic recipe for all Sidekiq usages?
- Can we scale in a cost-sensitive context?
Goal: share the research I did to scale Sidekiq and handle millions of jobs per day
Enterprise version
- Multi-process system called sidekiqswarm: a binary that spawns and manages several child Sidekiq processes
- It is one of the features of the Enterprise plan. If you can afford it and need most of its features, use it. Otherwise, find an Upstart lover and DIY!
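For the DIY route, the core idea of a swarm-style launcher can be sketched with Ruby's `Process.fork`: a parent forks N children and waits for them. This is a hypothetical sketch, not sidekiqswarm: a real child would boot Sidekiq instead of running the placeholder block, and nothing here restarts a crashed child. `Process.fork` is not available on Windows.

```ruby
# Parent process that forks `count` child processes and waits for them.
# Hypothetical DIY sketch: in production each child would boot a Sidekiq
# instance; here it just runs the block it is given.
def spawn_children(count)
  pids = Array.new(count) do |index|
    Process.fork do
      status = 0
      begin
        yield index # child work (placeholder for booting Sidekiq)
      rescue StandardError
        status = 1
      end
      exit!(status) # leave immediately, skipping the parent's at_exit hooks
    end
  end

  # Reap every child, in spawn order, and collect its exit status.
  pids.map { |pid| Process.wait2(pid).last.exitstatus }
end

statuses = spawn_children(4) { |i| raise "child #{i} failed" if i == 3 }
puts statuses.inspect # => [0, 0, 0, 1]
```

Each child is a separate OS process, so the OS schedules them independently; that is the property the "Multiple processes" experiment relies on.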
Research time!
Research protocol
Sample applications
- Ruby MRI 2.5.3
- Rails 5.2 application with Sidekiq 5.2
- No database; ActiveRecord disabled
- Two jobs for two use cases: Computing (MathJob) and API (ApiJob)
- Controlled infrastructure setup: constant machine size and caching system
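The talk does not show the code of the two jobs. As an assumption, they can be pictured as the stand-ins below: `MathJob` burns CPU, while `ApiJob` mostly waits on the network (simulated with `sleep`). Only the class names come from the slides; the bodies are hypothetical, and in the real app each class would also `include Sidekiq::Worker`.

```ruby
# Hypothetical stand-ins for the two benchmark jobs (bodies are illustrative).
# In the real application each class would also `include Sidekiq::Worker`.
class MathJob
  # CPU-bound: counts the primes below `limit` by trial division.
  def perform(limit)
    (2...limit).count do |n|
      (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
    end
  end
end

class ApiJob
  # I/O-bound: `sleep` stands in for waiting on a remote HTTP API.
  def perform(latency_seconds)
    sleep(latency_seconds)
    :ok
  end
end

puts MathJob.new.perform(20)  # => 8 (2, 3, 5, 7, 11, 13, 17, 19)
puts ApiJob.new.perform(0.01) # => ok
```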
Explored verticals
- Machine count
- Concurrency: one or multiple threads per Sidekiq process
- Processes: one or multiple Sidekiq processes per machine
- Queues: shared between the jobs or dedicated
(For each experiment, 8 jobs in total always run in parallel)
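The "always 8 jobs in parallel" constraint can be checked with simple arithmetic: total parallelism = machines x processes per machine x concurrency. The configurations below are the four experiment setups from the slides; `Setup` and `SETUPS` are illustrative names.

```ruby
# Total jobs running in parallel = machines x processes/machine x threads.
Setup = Struct.new(:name, :machines, :processes_per_machine, :concurrency) do
  def parallel_jobs
    machines * processes_per_machine * concurrency
  end
end

SETUPS = [
  Setup.new("1 - Standard experiment", 2, 1, 4),
  Setup.new("2 - Multiple servers", 8, 1, 1),
  Setup.new("3 - Multiple processes", 2, 4, 1),
  Setup.new("4 - Multiple servers, dedicated queues", 8, 1, 1)
].freeze

SETUPS.each { |s| puts format("%-42s %d", s.name, s.parallel_jobs) }
```

Every setup yields 8, so the experiments only move the same 8 slots between threads, processes and machines.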
Experiments
1 – Standard experiment
Machines | Processes     | Concurrency | Queues
2        | 1 per machine | 4           | Shared
API (ApiJob): average execution time 10.6s
Computing (MathJob): average execution time 1.9s
Summary
          | Execution time | Queuing
API       | ---            | +
Computing | +              | +
Cost (whole setup): ++
2 – Multiple servers
Machines | Processes     | Concurrency | Queues
8        | 1 per machine | 1           | Shared
API (ApiJob): average execution time 6.7s (-36%)
API – Observations
- Lowering concurrency has a huge impact on execution time
- Sidekiq shares network resources between the different threads of a process
- On a standard setup with a concurrency of N, a job runs at 1/Nth of the speed it could reach
Vertical scaling of machines (to improve bandwidth) has little impact on execution time
Computing (MathJob): average execution time 1.8s (-5%)
Computing – Observations
- Dedicating a machine to each thread has only a small impact on execution time
- Sidekiq manages CPU between threads well enough that dedicated servers are not relevant
Vertical scaling of machines (to increase CPU) will improve execution time
Summary
          | Execution time | Queuing
API       | +++            | ++
Computing | +              | -
Cost (whole setup): --
3 – Multiple processes
Machines | Processes     | Concurrency | Queues
2        | 4 per machine | 1           | Shared
API (ApiJob): average execution time 6.8s (-37%)
API – Observations
- Execution time is as fast as with dedicated machines
- Network sharing only happens between the threads of one Sidekiq process; it is not an OS mechanism
- Machine bandwidth is the only limit to execution time, so vertical scaling has a big impact
Computing (MathJob): average execution time 5.8s (about 3x the 1.9s baseline, i.e. +205%)
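The slowdown in experiment 3 can be checked from the displayed averages: 5.8s against the 1.9s baseline is a factor of about 3, i.e. an increase of about 205%. The helper below is plain arithmetic on the slide values; percentages computed from rounded averages can differ by a point from figures computed on unrounded data.

```ruby
# Relative change between a baseline average and a new average, in percent.
def percent_change(baseline, value)
  (value - baseline) / baseline * 100.0
end

# Computing (MathJob) average execution times from the slides, in seconds.
baseline = 1.9        # experiment 1 - Standard
multi_servers = 1.8   # experiment 2 - Multiple servers
multi_processes = 5.8 # experiment 3 - Multiple processes

ratio = multi_processes / baseline
puts percent_change(baseline, multi_servers).round   # => -5
puts percent_change(baseline, multi_processes).round # => 205
puts ratio.round(2)                                  # => 3.05
```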
Computing – Observations
- Execution time is roughly multiplied by the number of processes on the machine
- The OS divides the CPU resources between the processes running on the machine
Summary
          | Execution time | Queuing
API       | +++            | +
Computing | ---            | +
Cost (whole setup): ++
4 – Multiple servers with dedicated queues
Machines | Processes     | Concurrency | Queues
8        | 1 per machine | 1           | Dedicated
API (ApiJob): average execution time 7.0s (-33%)
API – Observations
- Dedicating machines to one type of job has no impact on execution time
- But it increases the queuing time of ApiJob, because only half as many machines process these jobs
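Why halving the machines raises queuing time can be seen with a simplified model that ignores the other job type: all jobs are enqueued at once and each is taken, in FIFO order, by the first free worker. The batch of 8 ApiJobs lasting 7 seconds each is hypothetical, chosen only to make the effect visible.

```ruby
# Back-of-the-envelope model of queuing time: all jobs are enqueued at t = 0
# and each one is taken, in FIFO order, by the first worker that becomes free.
# Returns how long each job waited before starting.
def queue_waits(durations, workers:)
  free_at = Array.new(workers, 0.0) # time at which each worker is next free
  durations.map do |duration|
    index = free_at.index(free_at.min)
    start = free_at[index]
    free_at[index] = start + duration
    start # enqueued at t = 0, so the wait equals the start time
  end
end

api_jobs = Array.new(8, 7.0) # hypothetical batch: 8 ApiJobs of 7s each
shared = queue_waits(api_jobs, workers: 8)    # all 8 machines can take them
dedicated = queue_waits(api_jobs, workers: 4) # only 4 machines on the API queue
puts shared.sum / shared.size       # => 0.0
puts dedicated.sum / dedicated.size # => 3.5
```

With 4 workers, the second half of the batch waits for a full job duration, so the average wait goes from 0 to half a job.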
Computing (MathJob): average execution time 1.8s (-5%)
Computing – Observations
- Dedicating machines to one type of job has no impact on execution time
- Dedicating machines to MathJob decreases its queuing time, because these jobs execute faster
Summary
          | Execution time | Queuing
API       | +++            | -
Computing | +              | +++
Cost (whole setup): --
Conclusion
Summary
Setup                                      | API execution time | API queuing | Computing execution time | Computing queuing | Cost
1 – Standard experiment                    | ---                | +           | +                        | +                 | ++
2 – Multiple servers                       | +++                | ++          | +                        | -                 | --
3 – Multiple processes                     | +++                | +           | ---                      | +                 | ++
4 – Multiple servers with dedicated queues | +++                | -           | +                        | +++               | --
Setup for your needs
There is no perfect solution: it depends on whether you run API or Computing jobs
- You can mix different configurations to address your use case
If you don't have budget limitations, run many machines with a single process each
Multiple processes per machine (3/) offer the best value for money
- But multiple processes are more difficult to fine-tune on the infrastructure side
- They require strict machine monitoring to track the load
Going further
For API use cases, you can dedicate a process or a machine to specific jobs:
- Slow jobs you want to isolate from the rest of the system to gain performance
- Critical jobs you want to protect from the failures and retries of other jobs
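In Sidekiq, a job class chooses its queue with `sidekiq_options queue: "..."`, and a process chooses the queues it listens to with the `-q` command-line option. The routing that results can be sketched in plain Ruby; the job classes other than ApiJob and MathJob, the queue names and the process names below are all hypothetical.

```ruby
# Hypothetical routing table: the queue each job class is pushed to...
QUEUE_FOR = {
  "ApiJob" => "api",
  "MathJob" => "math",
  "SlowReportJob" => "slow",  # hypothetical slow job, isolated
  "BillingJob" => "critical"  # hypothetical critical job, protected
}.freeze

# ...and the queues each (hypothetical) process listens to.
PROCESS_QUEUES = {
  "worker-critical" => %w[critical],
  "worker-slow" => %w[slow],
  "worker-default" => %w[api math]
}.freeze

# Returns the names of the processes allowed to run a given job class.
def processes_for(job_class)
  queue = QUEUE_FOR.fetch(job_class, "default")
  PROCESS_QUEUES.select { |_name, queues| queues.include?(queue) }.keys
end

puts processes_for("BillingJob").inspect    # => ["worker-critical"]
puts processes_for("SlowReportJob").inspect # => ["worker-slow"]
puts processes_for("ApiJob").inspect        # => ["worker-default"]
```

Slow or failing jobs in one queue can only occupy the threads of the processes listening to it. Note that a job routed to a queue no process listens to (here `default`) would never run, which is worth checking when splitting queues.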
Thank you!

Paris.rb – 07/19 – Sidekiq scaling, workers vs processes
