Conversation

@flash1293
Contributor

@flash1293 flash1293 commented Aug 5, 2025

Fixes #230499

The lock manager runs `setupLockManagerIndex` via lodash `once`, so it only runs on the first call rather than on every call. However, if that first call to `setupLockManagerIndex` errors out (e.g. because Elasticsearch isn't ready yet), every subsequent call returns the cached rejected promise and fails as well, leaving all lock managers in that node instance broken (since `once` keeps its state at module scope).
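
For illustration, a minimal, hypothetical reproduction of the failure mode (not the actual Kibana code): lodash `once` caches whatever the first invocation returns, including a rejected promise.

```ts
import { once } from 'lodash';

// Hypothetical repro: `once` memoizes the first return value,
// even when that value is a rejected promise.
const setup = once(async () => {
  throw new Error('Elasticsearch not ready');
});

setup().catch((e) => console.log('first call:', e.message));
// Every later call returns the same cached rejected promise;
// the setup function is never retried.
setup().catch((e) => console.log('second call:', e.message));
```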

This leads to issues like the following: the timeout exception is logged by the streams plugin, but the call stack originates from the slo plugin's setup routine, because both received the same cached rejected promise:

```
[2025-08-01T18:36:25.080+00:00][ERROR][plugins.streams] TimeoutError: Request timed out
    at KibanaTransport._request (/usr/share/kibana/node_modules/@elastic/transport/lib/Transport.js:564:50)
    at processTicksAndRejections (node:internal/process/task_queues:105:5)
    at runNextTicks (node:internal/process/task_queues:69:3)
    at listOnTimeout (node:internal/timers:549:9)
    at processTimers (node:internal/timers:523:7)
    at /usr/share/kibana/node_modules/@elastic/transport/lib/Transport.js:631:32
    at KibanaTransport.request (/usr/share/kibana/node_modules/@elastic/transport/lib/Transport.js:627:20)
    at KibanaTransport.request (/usr/share/kibana/node_modules/@kbn/core-elasticsearch-client-server-internal/src/create_transport.js:60:16)
    at Cluster.putComponentTemplate (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/api/api/cluster.js:600:16)
    at ensureTemplatesAndIndexCreated (/usr/share/kibana/node_modules/@kbn/lock-manager/src/setup_lock_manager_index.js:56:3)
    at setupLockManagerIndex (/usr/share/kibana/node_modules/@kbn/lock-manager/src/setup_lock_manager_index.js:110:3)
    at LockManager.acquire (/usr/share/kibana/node_modules/@kbn/lock-manager/src/lock_manager_client.js:53:5)
    at withLock (/usr/share/kibana/node_modules/@kbn/lock-manager/src/lock_manager_client.js:242:20)
    at /usr/share/kibana/node_modules/@kbn/slo-plugin/server/plugin.js:176:7
```

This PR fixes the problem by dropping `once` and keeping the state manually: the promise is cached only if it resolves successfully, and errors are passed through so the next call retries the setup.
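
As a rough sketch of that memoize-on-success pattern (the helper name and signature are illustrative, not the actual Kibana implementation):

```ts
type SetupFn = () => Promise<void>;

// Wraps a setup function so its promise is cached only while pending or
// after it resolves; a rejection clears the cache so the next call retries.
function onceOnSuccess(fn: SetupFn): SetupFn {
  let cached: Promise<void> | undefined;
  return () => {
    if (!cached) {
      cached = fn().catch((error) => {
        cached = undefined; // forget the failed attempt
        throw error; // still surface the error to this caller
      });
    }
    return cached;
  };
}

// e.g. const setupOnce = onceOnSuccess(() => setupLockManagerIndex(esClient));
```

Concurrent callers still share one in-flight promise, so a failure is reported to everyone who was waiting on that attempt, but the next caller after the rejection settles triggers a fresh setup.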

@flash1293 flash1293 requested a review from a team as a code owner August 5, 2025 09:23
@flash1293 flash1293 added the release_note:fix, Team:obs-knowledge (Observability Experience Knowledge team), backport:version (Backport to applied version labels), v9.2.0, and v9.1.1 labels Aug 5, 2025
@elasticmachine
Contributor

Pinging @elastic/obs-knowledge-team (Team:obs-knowledge)

Contributor

@SrdjanLL SrdjanLL left a comment


LGTM, just left a thought on how this new behaviour may impact downstream.

```diff
-let runSetupIndexAssetOnce = once(setupLockManagerIndex);
-export function runSetupIndexAssetEveryTime() {
-  runSetupIndexAssetOnce = setupLockManagerIndex;
+export function rerunSetupIndexAsset() {
```
Contributor


[Observation] Do you think there are any risks of exposing this functionality to downstream dependencies of the lock manager? I guess it was here before, back when failures were cached, but now that errors pass through, is this exposing the "reset" functionality downstream?

Contributor Author


It shouldn't change anything; it's just the new version of `runSetupIndexAssetEveryTime`, but with a new name.
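
For context, a hypothetical reconstruction of the renamed helper, combining the diff excerpt above with the `onceOnSuccess` sketch from the PR description; the names mirror the diff, but the body is an assumption, not the actual Kibana source:

```ts
// Hypothetical reconstruction; reuses the onceOnSuccess() helper
// sketched in the PR description above.
let runSetupIndexAssetOnce = onceOnSuccess(setupLockManagerIndex);

// Discards any cached setup state so the next caller re-runs the
// setup from scratch, e.g. between test cases.
export function rerunSetupIndexAsset() {
  runSetupIndexAssetOnce = onceOnSuccess(setupLockManagerIndex);
}
```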

@flash1293 flash1293 merged commit b4f8488 into elastic:main Aug 5, 2025
12 checks passed

@elasticmachine
Contributor

💚 Build Succeeded

Metrics [docs]: ✅ unchanged

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Aug 5, 2025
(cherry picked from commit b4f8488)
@kibanamachine
Contributor

💚 All backports created successfully

Status Branch Result
9.1

Note: Successful backport PRs will be merged automatically after passing CI.

Questions?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Aug 5, 2025
# Backport

This will backport the following commits from `main` to `9.1`:
- [Lock manager: Fix setup bug (#230519)](#230519)

<!--- Backport version: 9.6.6 -->

### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Co-authored-by: Joe Reuter <[email protected]>
delanni pushed a commit to delanni/kibana that referenced this pull request Aug 5, 2025
@flash1293
Contributor Author

💚 All backports created successfully

Status Branch Result
8.19

Note: Successful backport PRs will be merged automatically after passing CI.

Questions?

Please refer to the Backport tool documentation

flash1293 added a commit to flash1293/kibana that referenced this pull request Aug 6, 2025
(cherry picked from commit b4f8488)
flash1293 added a commit that referenced this pull request Aug 6, 2025
# Backport

This will backport the following commits from `main` to `8.19`:
- [Lock manager: Fix setup bug (#230519)](#230519)

<!--- Backport version: 10.0.1 -->

### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)

@wildemat wildemat mentioned this pull request Aug 7, 2025
10 tasks
@mistic mistic added the v8.19.2 label Aug 7, 2025
NicholasPeretti pushed a commit to NicholasPeretti/kibana that referenced this pull request Aug 18, 2025

Labels

backport:version (Backport to applied version labels), release_note:fix, Team:obs-knowledge (Observability Experience Knowledge team), v8.19.2, v9.1.2, v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lock Manager can enter 'deadlock' state when the initial setup fails

6 participants