op-supervisor,op-node: Exhaust L1 stalled loop #16284

@sebastianst

During the interop rehearsal we came across a strange stalled loop between one op-supervisor and an op-node (which was driving a reth node), where no derivation progress was made and the node was stuck at the same L1 block. We only have info logs:

op-supervisor:

t=2025-06-04T19:01:19+0000 lvl=info msg="Node completed syncing" syncnode=syncnode-420120010-2 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-1-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120010 l2=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040 l1=0xba0963e26a505b363882a50aaa2692f69ff38c6526b8425d05b0aa3e6f0d2148:8473340
t=2025-06-04T19:01:19+0000 lvl=info msg="Node completed syncing" syncnode=syncnode-420120010-2 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-1-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120010 l2=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040 l1=0xba0963e26a505b363882a50aaa2692f69ff38c6526b8425d05b0aa3e6f0d2148:8473340
t=2025-06-04T19:01:19+0000 lvl=info msg="Node completed syncing" syncnode=syncnode-420120010-2 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-1-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120010 l2=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040 l1=0xba0963e26a505b363882a50aaa2692f69ff38c6526b8425d05b0aa3e6f0d2148:8473340
t=2025-06-04T19:01:19+0000 lvl=info msg="Node completed syncing" syncnode=syncnode-420120010-2 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-1-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120010 l2=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040 l1=0xba0963e26a505b363882a50aaa2692f69ff38c6526b8425d05b0aa3e6f0d2148:8473340
t=2025-06-04T19:01:19+0000 lvl=info msg="Node completed syncing" syncnode=syncnode-420120010-2 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-1-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120010 l2=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040 l1=0xba0963e26a505b363882a50aaa2692f69ff38c6526b8425d05b0aa3e6f0d2148:8473340
t=2025-06-04T19:01:19+0000 lvl=info msg="Node completed syncing" syncnode=syncnode-420120010-2 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-1-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120010 l2=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040 l1=0xba0963e26a505b363882a50aaa2692f69ff38c6526b8425d05b0aa3e6f0d2148:8473340
t=2025-06-04T19:01:19+0000 lvl=info msg="Node completed syncing" syncnode=syncnode-420120010-2 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-1-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120010 l2=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040 l1=0xba0963e26a505b363882a50aaa2692f69ff38c6526b8425d05b0aa3e6f0d2148:8473340
t=2025-06-04T19:01:19+0000 lvl=info msg="Node completed syncing" syncnode=syncnode-420120009-1 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-0-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120009 l2=0x3ea79e419b5337c92873c21b867e09f5f50cac541c9f63b65c87c2770339e811:39170 l1=0x6ba8958660cb1fb727db53f17d85b73874337bb9723ee482b6154d0617db72d9:8477038
t=2025-06-04T19:01:19+0000 lvl=debug msg="Next L1 block is not yet available" syncnode=syncnode-420120009-1 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-0-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120009 l1Block=0x6ba8958660cb1fb727db53f17d85b73874337bb9723ee482b6154d0617db72d9:8477038 err="not found"
t=2025-06-04T19:01:19+0000 lvl=info msg="Node completed syncing" syncnode=syncnode-420120010-2 endpoint=RPCSyncSource(ws://op-node.rehearsal-0-bn-1-opn-reth-f-snapsync-2.svc.cluster.local:9645) chain=420120010 l2=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040 l1=0xba0963e26a505b363882a50aaa2692f69ff38c6526b8425d05b0aa3e6f0d2148:8473340

I also included one "Node completed syncing" log line from the other node it was managing (chain 420120009), which was making progress just fine.

op-node:

t=2025-06-04T19:00:16+0000 lvl=warn msg="Received signal for L1 block, but needed different block" current=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341 next=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341
t=2025-06-04T19:00:16+0000 lvl=info msg="Exhausted L1 data" mode=managed chainId=420120010 derivedFrom=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341 derived=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040
t=2025-06-04T19:00:16+0000 lvl=info msg="Received next L1 block" mode=managed chainId=420120010 nextL1=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341
t=2025-06-04T19:00:16+0000 lvl=warn msg="Received signal for L1 block, but needed different block" current=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341 next=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341
t=2025-06-04T19:00:16+0000 lvl=info msg="Exhausted L1 data" mode=managed chainId=420120010 derivedFrom=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341 derived=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040
t=2025-06-04T19:00:16+0000 lvl=info msg="Received next L1 block" mode=managed chainId=420120010 nextL1=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341
t=2025-06-04T19:00:16+0000 lvl=warn msg="Received signal for L1 block, but needed different block" current=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341 next=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341
t=2025-06-04T19:00:16+0000 lvl=info msg="Exhausted L1 data" mode=managed chainId=420120010 derivedFrom=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341 derived=0x1ae011189eb33430aecdfb56537d8926da93d18f87437a6213f4154ac349bb65:34040
t=2025-06-04T19:00:16+0000 lvl=info msg="Received next L1 block" mode=managed chainId=420120010 nextL1=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341
t=2025-06-04T19:00:16+0000 lvl=warn msg="Received signal for L1 block, but needed different block" current=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341 next=0x449328840338733e94a21b0980513cfd061e1e1a52ee673f3fe864882f27c0ed:8473341

These messages repeat in a loop, stuck at the same L1 block.
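One detail in the op-node excerpt above is that every "Received signal for L1 block, but needed different block" warning carries identical `current` and `next` values. Whether that is the root cause is not established here, but it may be a useful signature to search for in longer logs. The following is a minimal sketch of such a check, assuming the logs are in the logfmt layout shown above; the helper names `parse_logfmt` and `find_stall` and the `min_repeats` threshold are hypothetical and not part of op-node or op-supervisor.

```python
import re
from collections import Counter

# Matches logfmt pairs such as key=value or key="quoted value".
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

# Message text copied from the op-node log excerpt in this issue.
STALL_MSG = "Received signal for L1 block, but needed different block"


def parse_logfmt(line):
    """Parse one logfmt line into a dict, stripping surrounding quotes."""
    return {key: value.strip('"') for key, value in PAIR_RE.findall(line)}


def find_stall(lines, min_repeats=3):
    """Return (block, count) for the L1 block that most often appears in
    warnings where current == next, or None if it repeats fewer than
    min_repeats times. min_repeats is an arbitrary illustrative threshold."""
    repeats = Counter()
    for line in lines:
        rec = parse_logfmt(line)
        if rec.get("lvl") != "warn" or rec.get("msg") != STALL_MSG:
            continue
        current = rec.get("current")
        if current is not None and current == rec.get("next"):
            repeats[current] += 1
    if not repeats:
        return None
    block, count = repeats.most_common(1)[0]
    return (block, count) if count >= min_repeats else None
```

To try it, pass any iterable of log lines, for example an open file handle for a saved op-node log. A non-`None` result only says the same block was signalled repeatedly; it does not explain why.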

Restarting the op-node fixed the problem initially: it did a reset and then synced fine again. However, after a few minutes it became very slow again.

Metadata

Labels

A-op-node (Area: op-node), A-op-supervisor (Area: op-supervisor), H-interop (Hardfork: change planned for interop upgrade)

Projects

Status

Done
