Paris.rb – 07/19 – Sidekiq scaling, workers vs processes

Sidekiq scaling:
Workers vs Processes

Who am I?
2
- Software Engineer at Aircall
- Specialized on back-end and infrastructure
- Working in a Ruby environment
Maxence
@Shinomix

About Sidekiq
3
- Asynchronous execution for Ruby or Rails applications
i.e. send emails, compute big CSV files, ...
- Every time you need to execute long/postponable logic, it creates a job (an entry in a
temporary cache) which is executed whenever possible
- Provides retry mechanisms if execution fails
What is it? How does it work?

About Sidekiq
4
- Job: Piece of code that can be scheduled to run in the future
Pieces of vocabulary
- Concurrency: Amount of jobs that can be executed in parallel, by one process
- Process: Instance of a computer program
- Child process: Process created and dependent of another – parent – one
- Thread: Subset of a process that executes one command at a time

About Sidekiq
5
Pieces of vocabulary
Job
Job
Job
Job
Thread
Process
Concurrency
Job Job

About Sidekiq
6
Why are we here tonight?
- How to scale Sidekiq?
- Is there a magic recipe for all Sidekiq usages?
- Can we scale in a cost-sensitive context?
Share with you the researches I did to scale Sidekiq and handle millions of jobs per day

About Sidekiq
7
Enterprise version
- Multi-processes system called sidekiq-swarm, a binary that spawns and
manages several child processes of Sidekiq
- One of the feature of the Enterprise plan. If you can afford it and have needs for
most of the features, use it. Else, find an upstart lover and DIY!

Research protocol
9
Sample applications
- Ruby MRI 2.5.3
- Rails 5.2 application with Sidekiq 5.2
- No database and ActiveRecord deactivated
- Two jobs for two use cases: Computing (MathJob) and API (ApiJob)
- Regulated infrastructure setup: constant machine size and caching system

Research protocol
10
Sample applications

Research protocol
11
Explored verticals
- Machines count
- Concurrency: one or multiple threads per Sidekiq
- Processes: one or multiple Sidekiq per machine
- Queues: shared between the jobs or dedicated
(Always 8 jobs total running in parallel for each experiment)

1 – Standard experiment
13
Machines Processes Concurrency Queues
2 1 / machine 4 Shared

14
API
Average:
10.6s

15
Computing
Average:
1.9s

16
Summary
1 – Standard Experiment
Execution time Queuing Cost
API --- + ++
Computing + +

2 – Multiple servers
17

18
API
Average:
6.7s
(-36%)

19
API – Observations
- Low concurrency has a huge impact on execution time
- Sidekiq shares network resources between the different threads of a process
- On a standard setup, the job executes with 1/Nth
of the speed it could
Vertical scaling of machines (to improve bandwidth) has few impact on execution time

20
Computing
Average:
1.8s (-5%)

21
Computing – Observations
- Having dedicated machine for a thread has a small impact on execution time
- Sidekiq has a smart CPU management between the threads that makes
dedicated servers not relevant
Vertical scaling of machines (to increase CPU) will improve execution time

22
Summary
Execution
time
Queuing Cost
API +++ ++ --
Computing + -

3 – Multiple processes
23

24
API
Average:
6.8s
(-37%)

25
- Execution time is as fast as with dedicated machines
- A Sidekiq process only shares network between its threads, it is not an OS
mechanism
- Machine bandwidth is the only limit to execution time, vertical scaling has big
impacts

26
Computing
Average:
5.8s
(+300%)

27
- Execution time is multiplied by the number of processes on the machine
- The OS divides the CPU resources between the processes running on the
machine

28
Summary
Execution
time
Queuing Cost
API +++ + ++
Computing --- +

4 – Multiple servers with dedicated
queues
29
8 1 / machine 1 Dedicated

4 – Multiple servers with dedica...
30
API
Average:
7.0s
(-33%)

31
- Having machines dedicated to one type of job has no impact on execution time
- But it increases the queuing time of ApiJob because there are twice less
machines to process them

32
Computing
Average:
1.8s (-5%)

33
- Having machines dedicated to one type of job has no impact on execution time
- Dedicated machines to MathJob decreases the queuing time because they
execute faster

34
Summary
4 – Multiple servers with dedicated queues
Execution
time
Queuing Cost
API +++ - --
Computing + +++

Summary
36
1 – Standard Experiment
API --- + ++
Computing + +
API +++ ++ --
Computing + -
API +++ + ++
Computing --- +
4 – Multiple servers with dedicated queues
API +++ - --
Computing + +++

Setup for your needs
37
No perfect solution, it depends if you execute API or Computing jobs
- You can mix different configurations to address your use case
If you don't have budget limitations, run many machines with a single process (3/)
- Best "rapport qualité / prix"
Multiple processes are more difficult to fine tune on infrastructure side
- It requires strict machine monitoring to track the load

Going further
38
For API use cases, you can dedicate a process or a machine to specific jobs
- Slow jobs you want to isolate from the system to gain performances
- Critical jobs you want to protect from potential failures and retries on others

Paris.rb – 07/19 – Sidekiq scaling, workers vs processes

More Related Content

What's hot (20)

Similar to Paris.rb – 07/19 – Sidekiq scaling, workers vs processes (20)

Recently uploaded (20)

Paris.rb – 07/19 – Sidekiq scaling, workers vs processes