Enable multi-command parsing for replicated clients by enjoy-binbin · Pull Request #3597 · valkey-io/valkey

enjoy-binbin · 2026-04-30T10:29:20Z

Enable multi-command parsing for replicated clients

Multi-command parsing in parseMultibulkBuffer (introduced in #2092) was
disabled for replicated clients because the per-command replication
offset relied on c->qb_pos as the right boundary of the just-applied
command. The replication stream is effectively one big pipeline, so
supporting multi-command parsing for it can speed up command processing
on the replica.

Decouple parsing position from application position by introducing a
single per-client field that tracks the currently processed command's
right boundary in querybuf:

client.qb_applied: right boundary of the current command in
querybuf. Set by the parsers (for c->argv, = c->qb_pos) and
advanced incrementally by consumeCommandQueue() using each queued
command's input_bytes (qb_applied += p->input_bytes).
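Here is a minimal sketch of the bookkeeping described above. It rests on a simplified model: the `sketchClient` and `sketchParsedCommand` types, the fixed-size queue, and the `sketchParseCommand`/`sketchConsumeCommandQueue` helpers are hypothetical stand-ins, not Valkey's real structures. It shows qb_pos running ahead of qb_applied while queued commands are drained one at a time.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QUEUE_CAP 16

/* Hypothetical, simplified stand-in for parsedCommand: only the field
 * discussed above is modeled. */
typedef struct {
    size_t input_bytes; /* querybuf bytes consumed to parse this command */
} sketchParsedCommand;

/* Hypothetical, simplified stand-in for the client struct. */
typedef struct {
    size_t qb_pos;     /* parse position; may run ahead of application */
    size_t qb_applied; /* right boundary of the command being processed */
    sketchParsedCommand queue[SKETCH_QUEUE_CAP];
    int queue_len;  /* number of queued, not yet consumed commands */
    int queue_head; /* index of the next command to consume */
} sketchClient;

/* Parser side: account for one more parsed command of `len` bytes.
 * The first command (the one that lands in c->argv) pins qb_applied to
 * qb_pos at that moment; later commands are queued with only their
 * relative size. */
void sketchParseCommand(sketchClient *c, size_t len, int is_first) {
    c->qb_pos += len;
    if (is_first) {
        c->qb_applied = c->qb_pos;
    } else {
        assert(c->queue_head + c->queue_len < SKETCH_QUEUE_CAP);
        c->queue[c->queue_head + c->queue_len].input_bytes = len;
        c->queue_len++;
    }
}

/* Consumer side: pop the next queued command and move qb_applied to its
 * right boundary, mirroring qb_applied += p->input_bytes.
 * Returns 1 if a command was consumed, 0 if the queue was empty. */
int sketchConsumeCommandQueue(sketchClient *c) {
    if (c->queue_len == 0) return 0;
    c->qb_applied += c->queue[c->queue_head].input_bytes;
    c->queue_head++;
    c->queue_len--;
    return 1;
}
```

After parsing three commands of 33, 20 and 10 bytes, qb_pos is already 63 while qb_applied is 33; each consume step moves qb_applied to 53 and then 63, so it never depends on where the parser stopped.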

parsedCommand.input_bytes already records the number of querybuf bytes
consumed to parse one command (multibulk header + each bulk-length
line + arg bytes + per-arg CRLFs); see parseMultibulk for more details.
It is a relative quantity, independent of where querybuf has been trimmed
since parsing, so it is exactly what is needed to advance qb_applied across
multi-command parsing; no extra per-command snapshot field is required.
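As a worked example, `SET key value` arrives on the wire as `*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n`, which is 4 + (4 + 5) + (4 + 5) + (4 + 7) = 33 bytes. The hypothetical helper below (not Valkey code) computes that total from the arguments. For simplicity it treats them as NUL-free C strings. The result depends only on the command itself, not on where in querybuf the command sits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of decimal digits needed to print n. */
static size_t decimalDigits(size_t n) {
    size_t d = 1;
    while (n >= 10) {
        n /= 10;
        d++;
    }
    return d;
}

/* Hypothetical helper: bytes one RESP multibulk command occupies in
 * querybuf, i.e. "*<argc>\r\n" plus "$<len>\r\n<arg>\r\n" per argument. */
size_t sketchMultibulkBytes(int argc, const char **argv) {
    size_t total = 1 + decimalDigits((size_t)argc) + 2; /* "*<argc>\r\n" */
    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        total += 1 + decimalDigits(len) + 2; /* "$<len>\r\n" */
        total += len + 2;                    /* "<arg>\r\n" */
    }
    return total;
}
```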

commandProcessed() now uses c->qb_applied instead of c->qb_pos to
advance reploff by exactly the bytes of the just-applied command.
beforeNextClient shifts only c->qb_applied (a single variable) when
the replicated client's querybuf is trimmed; pending queue entries
carry only relative input_bytes and therefore need no fix-up.
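The following is a simplified model of how the two sides could interact. It assumes a `repl_applied` counter of querybuf bytes already reflected in reploff. The `sketch*` names are hypothetical, and the real commandProcessed() and beforeNextClient() do considerably more work. The point it illustrates is that a trim only shifts absolute positions, while queued input_bytes stay valid untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified replicated client; field names echo the PR text,
 * but types and logic are assumptions for illustration. */
typedef struct {
    long long reploff;   /* replication offset applied so far */
    size_t repl_applied; /* querybuf bytes already counted in reploff */
    size_t qb_applied;   /* right boundary of the just-applied command */
    size_t qb_pos;       /* parse position (may be further ahead) */
} sketchReplClient;

/* Like commandProcessed(): advance reploff by exactly the bytes of the
 * just-applied command, using qb_applied rather than qb_pos. */
void sketchCommandProcessed(sketchReplClient *c) {
    assert(c->qb_applied >= c->repl_applied); /* forward progress only */
    c->reploff += (long long)(c->qb_applied - c->repl_applied);
    c->repl_applied = c->qb_applied;
}

/* Like the trim in beforeNextClient(): drop the first repl_applied bytes
 * of querybuf. Only absolute positions shift; queued input_bytes are
 * relative and need no fix-up. */
void sketchTrimQuerybuf(sketchReplClient *c) {
    size_t n = c->repl_applied;
    assert(c->qb_applied >= n && c->qb_pos >= n);
    c->qb_pos -= n;
    c->qb_applied -= n;
    c->repl_applied = 0;
}
```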

qb_applied is populated unconditionally on the first-command path so
that a client transitioning to replicated mid-command (e.g.
SYNCSLOTS ESTABLISH installs slot_migration_job inside its own handler)
still has a valid value when commandProcessed() runs.

Multi-command parsing in parseMultibulkBuffer (introduced valkey-io#2092) was disabled for replicated clients because the per-command replication offset relied on c->qb_pos as the right boundary of the just-applied command. Replication stream is actually a big pipeline, so if it can be supported, the processing speed of the replica can be improved.

Decouple parsing position from application position by recording, for every parsed command, the qb_pos snapshot taken right after that command finished parsing:

- parsedCommand.qb_end_pos: snapshot stored in the queue entry, set by parseMultibulkBuffer when a command is pushed into cmd_queue.
- client.qb_applied: snapshot of the command currently being processed, set by the parsers (for c->argv) and by consumeCommandQueue (when popping a queued command).

commandProcessed() now uses c->qb_applied instead of c->qb_pos to advance reploff by exactly the bytes of the just-applied command.

beforeNextClient shifts qb_end_pos of pending queue entries when the replicated client's querybuf is trimmed, keeping subsequent reploff updates consistent.

Both fields are populated unconditionally so that a client transitioning to replicated mid-command (e.g. SYNCSLOTS ESTABLISH installs slot_migration_job inside its own handler) still has a valid value when commandProcessed() runs.

Signed-off-by: Binbin <binloveplay1314@qq.com>
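For contrast, here is a hypothetical sketch of what this earlier design implies on a trim. Because qb_end_pos is an absolute position in querybuf, every pending queue entry has to be shifted, which costs O(pending) work per trim. The relative input_bytes adopted in the final revision (see the description at the top) needs no such loop. The type and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a queue entry in the earlier design: an absolute
 * qb_pos snapshot taken when the command finished parsing. */
typedef struct {
    size_t qb_end_pos;
} sketchQueuedCmd;

/* When querybuf loses its first `trimmed` bytes, every pending absolute
 * snapshot must be shifted by the same amount. */
void sketchShiftPendingEndPos(sketchQueuedCmd *pending, int n, size_t trimmed) {
    for (int i = 0; i < n; i++) {
        assert(pending[i].qb_end_pos >= trimmed);
        pending[i].qb_end_pos -= trimmed;
    }
}
```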

codecov · 2026-04-30T10:54:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.58%. Comparing base (cce8f49) to head (6b2da72).
⚠️ Report is 42 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #3597      +/-   ##
============================================
- Coverage     76.71%   76.58%   -0.14%     
============================================
  Files           162      162              
  Lines         80704    80715      +11     
============================================
- Hits          61914    61813     -101     
- Misses        18790    18902     +112

Files with missing lines	Coverage Δ
src/cluster_migrateslots.c	`91.99% <100.00%> (-0.15%)`	⬇️
src/networking.c	`92.11% <100.00%> (-0.17%)`	⬇️
src/replication.c	`86.12% <100.00%> (+0.04%)`	⬆️
src/server.h	`100.00% <ø> (ø)`

... and 21 files with indirect coverage changes



zuiderkwast

Thanks for solving this problem!

It looks good. I have a small idea about how we can avoid adding qb_end_pos, not sure if it works. See comment below.


coderabbitai · 2026-05-25T03:52:08Z

Caution

Review failed

Failed to post review comments

📝 Walkthrough

This PR introduces a new client->qb_applied field to track the right boundary of commands in the query buffer, fixing replication offset calculations for pipelined multi-command streams. The field is initialized, maintained through buffer resets, populated during parsing completion, and used to advance replication offsets instead of the potentially-lookahead qb_pos.

Changes

Replication offset tracking with qb_applied

Layer / File(s)	Summary
Data contract for applied query buffer tracking `src/server.h`	New `client::qb_applied` field tracks the right boundary of the current command in `querybuf` for replication offset advancement.
Client lifecycle maintenance of qb_applied `src/networking.c`	`qb_applied` is initialized on client creation and reset whenever the shared query buffer is cleared, the client buffer is trimmed, or the multibulk parser rebuilds the buffer for large bulk arguments.
Replication-aware buffer trimming with invariants `src/networking.c`	In `beforeNextClient()`, `qb_applied` is adjusted alongside `repl_applied` during replicated trimming, with an invariant assertion protecting against being behind `repl_applied`.
Parsing completes and records qb_applied `src/networking.c`	Both inline and multibulk parsers record `qb_applied = qb_pos` when parsing completes; multibulk parsing no longer early-returns for replicated clients, enabling pipelined multi-command parsing.
Command consumption advances qb_applied `src/networking.c`	In `consumeCommandQueue()`, `qb_applied` advances by the queued command's `input_bytes`.
Replication offset uses qb_applied instead of qb_pos `src/networking.c`	In `commandProcessed()`, replication offset is now advanced using `qb_applied` rather than `qb_pos`, asserting forward progress.
Slot import backfill sets qb_applied `src/cluster_migrateslots.c`	Backfilled `SYNCSLOTS ESTABLISH` command sets the slot-import job client's `qb_applied` to `qb_pos` so it's treated as applied.
Replication primary caching resets qb_applied `src/replication.c`	When caching a disconnected primary for PSYNC, `server.primary->qb_applied` is reset to `0`.
Multi-command pipelined replication tests `tests/integration/replication-multi-cmd.tcl`	Comprehensive test covering pipelined large SET traffic, MULTI/EXEC handling, very large bulk arguments, PSYNC reconnects, and three-node chained replication convergence.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

ranshid

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and accurately summarizes the main objective: enabling multi-command parsing for replicated clients, which is the core change across the modified files.
Description check	✅ Passed	The description provides comprehensive context about the change: the problem (parsing disabled for replicated clients), the solution (introducing qb_applied field), and implementation details of how it works.
Docstring Coverage	✅ Passed	Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/replication.c`:
- Line 4737: This change modifies replication state (server.primary->qb_applied)
in src/replication.c and per repository policy requires an architectural review;
before merging, tag/request `@core-team` review referencing this change to
server.primary->qb_applied in src/replication.c and ensure any architectural
concerns or approval comments are added to the PR thread.

In `@tests/integration/replication-multi-cmd.tcl`:
- Around line 197-198: The comment above the test action is misleading: it says
"Force replica to drop the primary connection" but the command is "$subreplica
client kill type primary" which drops the primary connection of subreplica.
Update the comment to accurately describe that the subreplica (not the main
replica) is being disconnected from its primary, e.g., mention "Force subreplica
to drop its primary connection" so the comment matches the action performed by
the $subreplica client kill type primary command.
- Around line 10-11: Update the two assert_equal calls so they include explicit
failure messages: for the digest check (assert_equal [$primary debug digest]
[$replica debug digest]) add a clear message like "primary and replica digests
differ" and for the dbsize check (assert_equal [$primary dbsize] [$replica
dbsize]) add a message like "primary and replica dbsize differ"; keep the same
left/right expressions but pass the human-readable message as the third argument
to assert_equal so CI failures immediately indicate which convergence check
failed.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 04eaf8f6-457c-40ee-aa9b-25423dbf2d25

📥 Commits

Reviewing files that changed from the base of the PR and between 5b7ac66 and db3dace.

📒 Files selected for processing (4)

src/networking.c
src/replication.c
src/server.h
tests/integration/replication-multi-cmd.tcl


zuiderkwast

Very nice that it was possible to reuse input_bytes! The code is cleaner like this.

Future possible improvements:

Maybe we should calculate input_bytes from the qb_pos delta when parsing each command, instead of calculating the sum of bytes. Anyway, that would be a separate refactoring.
The test exception https://github.com/valkey-io/valkey/actions/runs/26385690036/job/77663599844?pr=3597#step:9:6246 seems unrelated, but we should probably fix it in the future, maybe using catch and retry, so the test case can handle an extra failover.
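Here is a rough sketch of the first suggestion. It assumes a minimal, hypothetical RESP skipper with no error handling and no inline commands, and it is not Valkey's parser. input_bytes is taken as the qb_pos delta around parsing one command, rather than summed field by field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal skipper: parse one complete RESP multibulk command
 * starting at buf[*pos]. On success advance *pos past it and return 1;
 * return 0 (leaving *pos unchanged) if the command is incomplete. */
static int sketchSkipMultibulk(const char *buf, size_t len, size_t *pos) {
    size_t p = *pos;
    const char *nl;
    if (p >= len || buf[p] != '*') return 0;
    nl = memchr(buf + p, '\n', len - p);
    if (!nl) return 0;
    long argc = strtol(buf + p + 1, NULL, 10);
    p = (size_t)(nl - buf) + 1;
    for (long i = 0; i < argc; i++) {
        if (p >= len || buf[p] != '$') return 0;
        nl = memchr(buf + p, '\n', len - p);
        if (!nl) return 0;
        long blen = strtol(buf + p + 1, NULL, 10);
        p = (size_t)(nl - buf) + 1;
        if (p + (size_t)blen + 2 > len) return 0;
        p += (size_t)blen + 2; /* argument bytes + trailing CRLF */
    }
    *pos = p;
    return 1;
}

/* input_bytes derived from the qb_pos delta around one parse;
 * returns 0 when no complete command is available. */
size_t sketchInputBytesByDelta(const char *buf, size_t len, size_t *qb_pos) {
    size_t start = *qb_pos;
    if (!sketchSkipMultibulk(buf, len, qb_pos)) return 0;
    return *qb_pos - start;
}
```

With the delta approach, the byte count cannot drift from what the parser actually consumed, because it is measured rather than recomputed.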

enjoy-binbin · 2026-05-25T10:07:01Z

@zuiderkwast Thanks for the review, the test failure was handled in #3826

zuiderkwast · 2026-06-15T18:32:22Z

@enjoy-binbin Have you seen any performance improvement on the replica with this feature?

enjoy-binbin · 2026-06-17T08:19:41Z

@zuiderkwast I ran a quick local test where 5 million commands (about 300MB of data) were held in the replication backlog; after unpausing and letting the replica catch up via PSYNC, I observed that its catch-up speed increased by approximately 10%. It might not be very rigorous; do you have any cases you'd like to test?

enjoy-binbin requested a review from zuiderkwast April 30, 2026 10:29

github-actions Bot assigned enjoy-binbin Apr 30, 2026

jjuleslasarte reviewed Apr 30, 2026


Comment thread src/server.h Outdated

Comment thread src/networking.c Outdated

Comment thread src/networking.c Outdated

code review

a6364de

Signed-off-by: Binbin <binloveplay1314@qq.com>

jjuleslasarte approved these changes May 1, 2026


sarthakaggarwal97 mentioned this pull request May 3, 2026

[Shadow] Enable multi-command parsing for replicated clients sarthakaggarwal97/valkey#139

Closed

enjoy-binbin added this to Valkey 10 May 9, 2026

madolson moved this to Needs Review in Valkey 10 May 10, 2026

avifenesh mentioned this pull request May 10, 2026

[dogfood] Enable multi-command parsing for replicated clients (mirrors upstream #3597) avifenesh/valkey#7

Closed

zuiderkwast approved these changes May 22, 2026


Comment thread src/networking.c Outdated

Comment thread src/server.h Outdated

code review: use input_bytes

db3dace

Signed-off-by: Binbin <binloveplay1314@qq.com>

coderabbitai Bot reviewed May 25, 2026


Comment thread src/replication.c

Comment thread tests/integration/replication-multi-cmd.tcl

Comment thread tests/integration/replication-multi-cmd.tcl

enjoy-binbin added 2 commits May 25, 2026 12:02

Merge remote-tracking branch 'upstream/unstable' into replicaed_pipeline

7909b4d

Signed-off-by: Binbin <binloveplay1314@qq.com>

Handle createSlotImportJob

6b2da72

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label May 25, 2026

zuiderkwast approved these changes May 25, 2026


enjoy-binbin added the release-notes This issue should get a line item in the release notes label Jun 13, 2026

enjoy-binbin merged commit ed6e9a9 into valkey-io:unstable Jun 13, 2026
196 of 197 checks passed

enjoy-binbin deleted the replicaed_pipeline branch June 13, 2026 07:56

4 participants

enjoy-binbin commented Apr 30, 2026 •

edited

Loading

codecov Bot commented Apr 30, 2026 •

edited

Loading

zuiderkwast left a comment •

edited

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading

zuiderkwast left a comment •

edited

Loading