Skip to content

op-node: Handle duplicate outgoing supervisor events #16441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 17, 2025

Conversation

sebastianst
Copy link
Member

Description

  • Removes emitting StepReqEvent inside SyncDeriver.onIncomingP2PBlock - this was causing an avalanche of exhaust l1 events
  • Deduplicate sending the same supervisor events from a managed node with a TTL until the same message is resend.
    • Complete duplication is also dangerous, as a supervisor may be restarting and miss state.

Tests

Added unit test of new eventTimestamp type and unit test of event handling behavior of ManagedNode.

Additional context

As long as we haven't fully verified that the node isn't sending a burst of duplicate events in some edge cases, it is safer to deduplicate sending events to the supervisor in this way.

Metadata

Fixes #16284

@sebastianst sebastianst requested review from a team as code owners June 16, 2025 14:11
@sebastianst sebastianst requested a review from protolambda June 16, 2025 14:11
Copy link
Contributor

@axelKingsley axelKingsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering out loud: would this timestamp TTL also be useful for reducing things like UpdateCrossUnsafeEvent during deep sync operations?

Copy link

codecov bot commented Jun 16, 2025

Codecov Report

Attention: Patch coverage is 75.32468% with 19 lines in your changes missing coverage. Please review.

Project coverage is 45.99%. Comparing base (7adb589) to head (3bf2116).
Report is 22 commits behind head on develop.

Files with missing lines Patch % Lines
op-node/rollup/interop/managed/system.go 69.35% 17 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #16441      +/-   ##
===========================================
- Coverage    46.91%   45.99%   -0.92%     
===========================================
  Files         1394     1340      -54     
  Lines       112800   108561    -4239     
===========================================
- Hits         52915    49938    -2977     
+ Misses       56150    55028    -1122     
+ Partials      3735     3595     -140     
Flag Coverage Δ
cannon-go-tests-64 ?
contracts-bedrock-tests 96.24% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
op-node/rollup/driver/state.go 0.00% <ø> (ø)
op-node/rollup/interop/managed/event_timestamp.go 100.00% <100.00%> (ø)
op-node/rollup/interop/managed/system.go 28.80% <69.35%> (+15.68%) ⬆️

... and 71 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sebastianst
Copy link
Member Author

sebastianst commented Jun 16, 2025

Wondering out loud: would this timestamp TTL also be useful for reducing things like UpdateCrossUnsafeEvent during deep sync operations?

@axelKingsley yes we could totally leverage the new eventTimestamp type and accompanying pattern to deduplicate a few other events hammering the system. Good idea for a follow-up.

edit: although the cleaner fix imo is to make the code more robust and not send duplicates in the first place. I would like for us to switch some events/triggers to be range-based instead of single-block based.

Copy link
Contributor

@protolambda protolambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes make sense, although I am still slightly concerned with the L1 traversal itself.

@sebastianst sebastianst force-pushed the seb/dedup-sv-events branch 2 times, most recently from 4ce2563 to b1c859d Compare June 16, 2025 18:45
@sebastianst sebastianst requested a review from a team as a code owner June 16, 2025 18:45
@sebastianst sebastianst requested a review from bitwiseguy June 16, 2025 18:45
@sebastianst
Copy link
Member Author

sebastianst commented Jun 16, 2025

The changes make sense, although I am still slightly concerned with the L1 traversal itself.

My initial implementation had a bug where the derived event was deduplicated based solely on the L1 block ID, which of course doesn't make sense for the derived event, which is also emitted for every new L2 block. This event is now deduplicated based on a combined identifier of the L1 derivedFrom and L2 derived block id. Not sure if this is related to your concerns. Edit: I could actually revert that change. It's enough to deduplicate based on L1 ref for the L1 traversal trigger and L2 ref for the local unsafe trigger.

Copy link
Contributor

@teddyknox teddyknox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamping so that we can optimistically merge and trigger post-merge CI.

@sebastianst sebastianst force-pushed the seb/dedup-sv-events branch from b1c859d to e7ce3ac Compare June 16, 2025 19:54
@sebastianst sebastianst force-pushed the seb/dedup-sv-events branch from e7ce3ac to 3bf2116 Compare June 16, 2025 20:29
@sebastianst sebastianst added this pull request to the merge queue Jun 17, 2025
Merged via the queue into develop with commit 712a605 Jun 17, 2025
62 checks passed
@sebastianst sebastianst deleted the seb/dedup-sv-events branch June 17, 2025 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

op-supervisor,op-node: Exhaust L1 stalled loop
4 participants