Enable multi-command parsing for replicated clients#3597
Conversation
Multi-command parsing in parseMultibulkBuffer (introduced valkey-io#2092) was disabled for replicated clients because the per-command replication offset relied on c->qb_pos as the right boundary of the just-applied command. Replication stream is actually a big pipeline, so if it can be supported, the processing speed of the replica can be improved. Decouple parsing position from application position by recording, for every parsed command, the qb_pos snapshot taken right after that command finished parsing: - parsedCommand.qb_end_pos: snapshot stored in the queue entry, set by parseMultibulkBuffer when a command is pushed into cmd_queue. - client.qb_applied: snapshot of the command currently being processed, set by the parsers (for c->argv) and by consumeCommandQueue (when popping a queued command). commandProcessed() now uses c->qb_applied instead of c->qb_pos to advance reploff by exactly the bytes of the just-applied command. beforeNextClient shifts qb_end_pos of pending queue entries when the replicated client's querybuf is trimmed, keeping subsequent reploff updates consistent. Both fields are populated unconditionally so that a client transitioning to replicated mid-command (e.g. SYNCSLOTS ESTABLISH installs slot_migration_job inside its own handler) still has a valid value when commandProcessed() runs. Signed-off-by: Binbin <binloveplay1314@qq.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3597 +/- ##
============================================
- Coverage 76.71% 76.58% -0.14%
============================================
Files 162 162
Lines 80704 80715 +11
============================================
- Hits 61914 61813 -101
- Misses 18790 18902 +112
🚀 New features to boost your workflow:
|
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
|
Caution Review failedFailed to post review comments 📝 WalkthroughWalkthroughThis PR introduces a new ChangesReplication offset tracking with qb_applied
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/replication.c`:
- Line 4737: This change modifies replication state (server.primary->qb_applied)
in src/replication.c and per repository policy requires an architectural review;
before merging, tag/request `@core-team` review referencing this change to
server.primary->qb_applied in src/replication.c and ensure any architectural
concerns or approval comments are added to the PR thread.
In `@tests/integration/replication-multi-cmd.tcl`:
- Around line 197-198: The comment above the test action is misleading: it says
"Force replica to drop the primary connection" but the command is "$subreplica
client kill type primary" which drops the primary connection of subreplica.
Update the comment to accurately describe that the subreplica (not the main
replica) is being disconnected from its primary, e.g., mention "Force subreplica
to drop its primary connection" so the comment matches the action performed by
the $subreplica client kill type primary command.
- Around line 10-11: Update the two assert_equal calls so they include explicit
failure messages: for the digest check (assert_equal [$primary debug digest]
[$replica debug digest]) add a clear message like "primary and replica digests
differ" and for the dbsize check (assert_equal [$primary dbsize] [$replica
dbsize]) add a message like "primary and replica dbsize differ"; keep the same
left/right expressions but pass the human-readable message as the third argument
to assert_equal so CI failures immediately indicate which convergence check
failed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 04eaf8f6-457c-40ee-aa9b-25423dbf2d25
📒 Files selected for processing (4)
src/networking.csrc/replication.csrc/server.htests/integration/replication-multi-cmd.tcl
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
There was a problem hiding this comment.
Very nice that it was possible to reusing input_bytes! The code is cleaner like this.
Future possible improvements:
-
Maybe we should calculate
input_bytesfrom theqb_posdelta when parsing each command, instead of calculating the sum of bytes. Anyway, that would be a separatate refactoring. -
The test exception https://github.com/valkey-io/valkey/actions/runs/26385690036/job/77663599844?pr=3597#step:9:6246 seems unrelated but we should probably fix it in the future, maybe using
catchand retry, so the test case can handle an extra failover...)
|
@zuiderkwast Thanks for the review, the test failure was handled in #3826 |
|
@enjoy-binbin Have you seen any performance improvement on the replica with this feature? |
|
@zuiderkwast I ran a quick local test where 5 million commands were held in the replication backlog (300MB data); after unpausing and allowing the replica to catch up via PSYNC, i observed that the catch-up speed for the replica increased by approximately 10%. It might not be very rigorous, do you have any cases you'd like to test? |
Enable multi-command parsing for replicated clients
Multi-command parsing in parseMultibulkBuffer (introduced #2092) was
disabled for replicated clients because the per-command replication
offset relied on c->qb_pos as the right boundary of the just-applied
command. Replication stream is actually a big pipeline, so if it can
be supported, the processing speed of the replica can be improved.
Decouple parsing position from application position by introducing a
single per-client field that tracks the currently processed command's
right boundary in querybuf:
querybuf. Set by the parsers (for c->argv, = c->qb_pos) and
advanced incrementally by consumeCommandQueue() using each queued
command's input_bytes (qb_applied += p->input_bytes).
parsedCommand.input_bytes already records the number of querybuf bytes
consumed to parse one command (multibulk header + each bulk-length
line + arg bytes + per-arg CRLFs), see parseMultibulk for mode details.
It is a relative quantity, independent of where querybuf has been trimmed
since parsing, so it is exactly what's needed to advance qb_applied across
multi-command parsing, so no extra per-command snapshot field is required.
commandProcessed() now uses c->qb_applied instead of c->qb_pos to
advance reploff by exactly the bytes of the just-applied command.
beforeNextClient shifts only c->qb_applied (a single variable) when
the replicated client's querybuf is trimmed; pending queue entries
carry only relative input_bytes and therefore need no fix-up.
qb_applied is populated unconditionally on the first-command path so
that a client transitioning to replicated mid-command (e.g.
SYNCSLOTS ESTABLISH installs slot_migration_job inside its own handler)
still has a valid value when commandProcessed() runs.