LoadInterlockAwareMonitor deadlock when clearing cache (multiple databases, test) #45994

@kuahyeow

Description

With multiple databases, we observe hangs within system tests.

See also https://gitlab.com/gitlab-org/gitlab/-/issues/371468 (GitLab is at Rails 6.1, Ruby 2.7.5)

Steps to reproduce

Here's a minimal reproduction with Rails 7.0.3.1

  1. Check out https://gitlab.com/tkuah/multi_database_system_test
  2. bundle install
  3. bundle exec rails db:migrate
  4. bundle exec rails test test/system/home_pages_test.rb

The conditions are:

  1. Multiple databases
  2. System test where there's an async request which overlaps with the main thread

Expected behavior

It does not hang, and the test completes

Actual behavior

It hangs. It looks like it's hanging when clearing the query cache (see https://gitlab.com/gitlab-org/gitlab/-/issues/371468#note_1087203883 and https://gitlab.com/gitlab-org/gitlab/-/issues/371468#note_1088779499).

A backtrace dump taken while the test was hanging is attached: sigdump-68442.log

My suspicion is that:

  1. lock_thread is true for system tests
  2. Therefore both the main thread and the Puma thread attempt to clear the query cache for the same set of connections

So, with two connections (primary and animal), the sequence is:

  1. The Puma thread takes the LoadInterlockAwareMonitor lock for the primary connection via Post.transaction
  2. The main thread takes the LoadInterlockAwareMonitor lock for the animal connection via the implicit transaction that create opens
  3. Due to 2, the main thread attempts to clear the query cache for the primary connection. It waits for 1's lock
  4. Due to 1, and because of lock_thread, the Puma thread clears the query cache for the primary connection. It takes the LoadInterlockAwareMonitor lock for the primary connection, which succeeds because the thread already holds it
  5. Due to 1, the Puma thread now attempts to clear the query cache for the animal connection. It waits for 2's lock

Each thread is now waiting on a lock held by the other, so neither can proceed.

See clear_query_caches_for_current_thread method
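
The suspected lock ordering can be sketched with plain Ruby Monitor objects. This is only an illustration, not Rails code: it assumes LoadInterlockAwareMonitor behaves like a re-entrant Monitor for this purpose, and the method name, the two lock variables and the queues used to force the interleaving are all hypothetical. It uses try_enter instead of a blocking enter so that the script reports the conflict rather than hanging.

```ruby
require "monitor"

# Hypothetical stand-in for the scenario above: one Monitor per connection.
def simulate_lock_order_inversion
  primary_lock = Monitor.new # stands in for the primary connection's lock
  animal_lock  = Monitor.new # stands in for the animal connection's lock

  # Queues are used only to force the interleaving described in the steps.
  puma_holds_primary = Queue.new
  main_holds_animal  = Queue.new
  puma_probed        = Queue.new
  main_probed        = Queue.new

  puma = Thread.new do
    primary_lock.synchronize do            # step 1: Post.transaction
      puma_holds_primary << true
      main_holds_animal.pop                # wait until step 2 has happened

      reentered = primary_lock.try_enter   # step 4: re-entrant, succeeds
      primary_lock.exit if reentered

      got_animal = animal_lock.try_enter   # step 5: held by the main thread
      animal_lock.exit if got_animal

      puma_probed << true
      main_probed.pop                      # keep holding until main has probed
      { puma_reenters_primary: reentered, puma_gets_animal: got_animal }
    end
  end

  main = Thread.new do
    animal_lock.synchronize do             # step 2: implicit transaction in create
      main_holds_animal << true
      puma_holds_primary.pop               # wait until step 1 has happened

      got_primary = primary_lock.try_enter # step 3: held by the Puma thread
      primary_lock.exit if got_primary

      main_probed << true
      puma_probed.pop                      # keep holding until Puma has probed
      { main_gets_primary: got_primary }
    end
  end

  puma.value.merge(main.value)
end
```

Calling simulate_lock_order_inversion returns a hash in which only puma_reenters_primary is true. With blocking enter calls in place of try_enter, both threads would wait forever, which matches the observed hang.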

System configuration

Rails version: 7.0.3.1

Ruby version: ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [arm64-darwin21]
