Enable multi-command parsing for replicated clients #3597

Merged
enjoy-binbin merged 5 commits into valkey-io:unstable from enjoy-binbin:replicaed_pipeline
Jun 13, 2026

Conversation

@enjoy-binbin

@enjoy-binbin enjoy-binbin commented Apr 30, 2026

Member

Enable multi-command parsing for replicated clients

Multi-command parsing in parseMultibulkBuffer (introduced in #2092) was
disabled for replicated clients because the per-command replication
offset relied on c->qb_pos as the right boundary of the just-applied
command. The replication stream is effectively one large pipeline, so
supporting multi-command parsing for it can speed up command processing
on the replica.

Decouple parsing position from application position by introducing a
single per-client field that tracks the currently processed command's
right boundary in querybuf:

  • client.qb_applied: right boundary of the current command in
    querybuf. Set by the parsers (for c->argv, = c->qb_pos) and
    advanced incrementally by consumeCommandQueue() using each queued
    command's input_bytes (qb_applied += p->input_bytes).

parsedCommand.input_bytes already records the number of querybuf bytes
consumed to parse one command (multibulk header + each bulk-length
line + arg bytes + per-arg CRLFs); see parseMultibulk for more details.
It is a relative quantity, independent of where querybuf has been trimmed
since parsing, so it is exactly what is needed to advance qb_applied across
multiple parsed commands, and no extra per-command snapshot field is required.
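
As a rough illustration of the byte accounting described above, the following sketch computes the wire size of one RESP multibulk command from its arguments. The helper names (decimal_digits, resp_command_bytes) are hypothetical and not taken from the Valkey source, and strlen is used only for brevity, so this version is not binary-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of decimal digits needed to print n. */
static size_t decimal_digits(size_t n) {
    size_t digits = 1;
    while (n >= 10) {
        n /= 10;
        digits++;
    }
    return digits;
}

/* Bytes one RESP multibulk command occupies in the input buffer:
 *   "*<argc>\r\n", then per argument "$<len>\r\n" + payload + "\r\n".
 * Hypothetical helper; arguments are NUL-terminated strings here. */
static size_t resp_command_bytes(int argc, const char **argv) {
    assert(argc > 0);
    size_t total = 1 + decimal_digits((size_t)argc) + 2; /* multibulk header */
    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        total += 1 + decimal_digits(len) + 2; /* bulk-length line */
        total += len + 2;                     /* argument bytes + CRLF */
    }
    return total;
}
```

For SET foo bar this gives 4 bytes of header plus 9 bytes per argument, 31 bytes in total, which is the kind of value a queue entry's input_bytes would carry.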

commandProcessed() now uses c->qb_applied instead of c->qb_pos to
advance reploff by exactly the bytes of the just-applied command.
beforeNextClient shifts only c->qb_applied (a single variable) when
the replicated client's querybuf is trimmed; pending queue entries
carry only relative input_bytes and therefore need no fix-up.
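
The interplay between parsing ahead, applying, and trimming can be sketched with a small model. This is a simplified stand-in under stated assumptions, not the actual Valkey code: the struct and the model_* functions are hypothetical, the field names only mirror those mentioned in this description, and the offset arithmetic is an assumption about how reploff is derived from the buffer state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for replicated client state. */
typedef struct {
    size_t qb_len;          /* bytes currently held in querybuf */
    size_t qb_pos;          /* parse position; may run ahead of execution */
    size_t qb_applied;      /* right boundary of the command being executed */
    size_t repl_applied;    /* querybuf bytes already applied, not yet trimmed */
    long long read_reploff; /* replication offset at the end of querybuf */
    long long reploff;      /* replication offset of the last applied command */
} model_client;

/* Pop one queued command: only its relative size is needed. */
static void model_consume(model_client *c, size_t input_bytes) {
    c->qb_applied += input_bytes;
    assert(c->qb_applied <= c->qb_pos);
}

/* After executing a command, advance reploff to its right boundary. */
static void model_command_processed(model_client *c) {
    long long prev = c->reploff;
    c->reploff = c->read_reploff - (long long)c->qb_len + (long long)c->qb_applied;
    assert(c->reploff >= prev); /* forward progress only */
    c->repl_applied += (size_t)(c->reploff - prev);
}

/* Trim applied bytes off the front of querybuf and shift both positions. */
static void model_trim(model_client *c) {
    assert(c->qb_applied >= c->repl_applied);
    c->qb_len -= c->repl_applied;
    c->qb_pos -= c->repl_applied;
    c->qb_applied -= c->repl_applied;
    c->repl_applied = 0;
}
```

In this model a trim between two applied commands changes only the client's own positions; the next model_consume call still works with the unchanged relative input_bytes of the queued command.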

qb_applied is populated unconditionally on the first-command path so
that a client transitioning to replicated mid-command (e.g.
SYNCSLOTS ESTABLISH installs slot_migration_job inside its own handler)
still has a valid value when commandProcessed() runs.

Multi-command parsing in parseMultibulkBuffer (introduced in valkey-io#2092) was
disabled for replicated clients because the per-command replication
offset relied on c->qb_pos as the right boundary of the just-applied
command. The replication stream is effectively one large pipeline, so
supporting multi-command parsing for it can speed up command processing
on the replica.

Decouple parsing position from application position by recording, for
every parsed command, the qb_pos snapshot taken right after that
command finished parsing:
  - parsedCommand.qb_end_pos: snapshot stored in the queue entry, set
    by parseMultibulkBuffer when a command is pushed into cmd_queue.
  - client.qb_applied: snapshot of the command currently being
    processed, set by the parsers (for c->argv) and by
    consumeCommandQueue (when popping a queued command).

commandProcessed() now uses c->qb_applied instead of c->qb_pos to
advance reploff by exactly the bytes of the just-applied command.
beforeNextClient shifts qb_end_pos of pending queue entries when the
replicated client's querybuf is trimmed, keeping subsequent reploff
updates consistent.
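
The fix-up that this snapshot-based variant needs on every trim can be sketched as follows. The snapshot_entry type and shift_snapshots function are hypothetical names chosen for illustration; only the qb_end_pos field name comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue entry carrying an absolute end position in querybuf. */
typedef struct {
    size_t qb_end_pos;
} snapshot_entry;

/* When `trimmed` bytes are cut from the front of querybuf, every pending
 * absolute position has to move back by the same amount. */
static void shift_snapshots(snapshot_entry *queue, size_t count, size_t trimmed) {
    for (size_t i = 0; i < count; i++) {
        assert(queue[i].qb_end_pos >= trimmed);
        queue[i].qb_end_pos -= trimmed;
    }
}
```

The loop touches every pending entry, which is the per-trim cost that storing only relative sizes avoids.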

Both fields are populated unconditionally so that a client transitioning
to replicated mid-command (e.g. SYNCSLOTS ESTABLISH installs slot_migration_job
inside its own handler) still has a valid value when commandProcessed() runs.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@codecov

codecov Bot commented Apr 30, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.58%. Comparing base (cce8f49) to head (6b2da72).
⚠️ Report is 42 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3597      +/-   ##
============================================
- Coverage     76.71%   76.58%   -0.14%     
============================================
  Files           162      162              
  Lines         80704    80715      +11     
============================================
- Hits          61914    61813     -101     
- Misses        18790    18902     +112     
Files with missing lines Coverage Δ
src/cluster_migrateslots.c 91.99% <100.00%> (-0.15%) ⬇️
src/networking.c 92.11% <100.00%> (-0.17%) ⬇️
src/replication.c 86.12% <100.00%> (+0.04%) ⬆️
src/server.h 100.00% <ø> (ø)

... and 21 files with indirect coverage changes


Comment thread src/server.h Outdated
Comment thread src/networking.c Outdated
Comment thread src/networking.c Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>

@zuiderkwast zuiderkwast left a comment

Contributor


Thanks for solving this problem!

It looks good. I have a small idea about how we can avoid adding qb_end_pos, not sure if it works. See comment below.

Comment thread src/networking.c Outdated
Comment thread src/server.h Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>
@coderabbitai

coderabbitai Bot commented May 25, 2026


Review Change Stack

Caution

Review failed

Failed to post review comments

📝 Walkthrough

This PR introduces a new client->qb_applied field to track the right boundary of the current command in the query buffer, fixing replication offset calculations for pipelined multi-command streams. The field is initialized on client creation, maintained through buffer resets, set when parsing completes, and used to advance replication offsets instead of qb_pos, which may already have moved past commands that were parsed but not yet executed.

Changes

Replication offset tracking with qb_applied

| Layer | File(s) | Summary |
|---|---|---|
| Data contract for applied query buffer tracking | src/server.h | New client::qb_applied field tracks the right boundary of the current command in querybuf for replication offset advancement. |
| Client lifecycle maintenance of qb_applied | src/networking.c | qb_applied is initialized on client creation and reset whenever the shared query buffer is cleared, the client buffer is trimmed, or the multibulk parser rebuilds the buffer for large bulk arguments. |
| Replication-aware buffer trimming with invariants | src/networking.c | In beforeNextClient(), qb_applied is adjusted alongside repl_applied during replicated trimming, with an invariant assertion protecting against being behind repl_applied. |
| Parsing completes and records qb_applied | src/networking.c | Both inline and multibulk parsers record qb_applied = qb_pos when parsing completes; multibulk parsing no longer early-returns for replicated clients, enabling pipelined multi-command parsing. |
| Command consumption advances qb_applied | src/networking.c | In consumeCommandQueue(), qb_applied advances by the queued command's input_bytes. |
| Replication offset uses qb_applied instead of qb_pos | src/networking.c | In commandProcessed(), replication offset is now advanced using qb_applied rather than qb_pos, asserting forward progress. |
| Slot import backfill sets qb_applied | src/cluster_migrateslots.c | Backfilled SYNCSLOTS ESTABLISH command sets the slot-import job client's qb_applied to qb_pos so it's treated as applied. |
| Replication primary caching resets qb_applied | src/replication.c | When caching a disconnected primary for PSYNC, server.primary->qb_applied is reset to 0. |
| Multi-command pipelined replication tests | tests/integration/replication-multi-cmd.tcl | Comprehensive test covering pipelined large SET traffic, MULTI/EXEC handling, very large bulk arguments, PSYNC reconnects, and three-node chained replication convergence. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • ranshid
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and accurately summarizes the main objective: enabling multi-command parsing for replicated clients, which is the core change across the modified files.
Description check ✅ Passed The description provides comprehensive context about the change: the problem (parsing disabled for replicated clients), the solution (introducing qb_applied field), and implementation details of how it works.
Docstring Coverage ✅ Passed Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/replication.c`:
- Line 4737: This change modifies replication state (server.primary->qb_applied)
in src/replication.c and per repository policy requires an architectural review;
before merging, tag/request `@core-team` review referencing this change to
server.primary->qb_applied in src/replication.c and ensure any architectural
concerns or approval comments are added to the PR thread.

In `@tests/integration/replication-multi-cmd.tcl`:
- Around line 197-198: The comment above the test action is misleading: it says
"Force replica to drop the primary connection" but the command is "$subreplica
client kill type primary" which drops the primary connection of subreplica.
Update the comment to accurately describe that the subreplica (not the main
replica) is being disconnected from its primary, e.g., mention "Force subreplica
to drop its primary connection" so the comment matches the action performed by
the $subreplica client kill type primary command.
- Around line 10-11: Update the two assert_equal calls so they include explicit
failure messages: for the digest check (assert_equal [$primary debug digest]
[$replica debug digest]) add a clear message like "primary and replica digests
differ" and for the dbsize check (assert_equal [$primary dbsize] [$replica
dbsize]) add a message like "primary and replica dbsize differ"; keep the same
left/right expressions but pass the human-readable message as the third argument
to assert_equal so CI failures immediately indicate which convergence check
failed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 04eaf8f6-457c-40ee-aa9b-25423dbf2d25

📥 Commits

Reviewing files that changed from the base of the PR and between 5b7ac66 and db3dace.

📒 Files selected for processing (4)
  • src/networking.c
  • src/replication.c
  • src/server.h
  • tests/integration/replication-multi-cmd.tcl

Comment thread src/replication.c
Comment thread tests/integration/replication-multi-cmd.tcl
Comment thread tests/integration/replication-multi-cmd.tcl
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label May 25, 2026

@zuiderkwast zuiderkwast left a comment

Contributor


Very nice that it was possible to reuse input_bytes! The code is cleaner like this.

Future possible improvements:

@enjoy-binbin

Member Author

@zuiderkwast Thanks for the review. The test failure was handled in #3826.

@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label Jun 13, 2026
@enjoy-binbin enjoy-binbin merged commit ed6e9a9 into valkey-io:unstable Jun 13, 2026
196 of 197 checks passed
@enjoy-binbin enjoy-binbin deleted the replicaed_pipeline branch June 13, 2026 07:56
@zuiderkwast

Contributor

@enjoy-binbin Have you seen any performance improvement on the replica with this feature?

@enjoy-binbin

Member Author

@zuiderkwast I ran a quick local test in which 5 million commands (300MB of data) were held in the replication backlog. After unpausing and letting the replica catch up via PSYNC, I observed that the replica's catch-up speed increased by approximately 10%. The test may not be very rigorous. Do you have any cases you'd like to test?

Labels

release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

Status: Needs Review

4 participants