From: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-22 02:19:56
Message-ID: [email protected]
Lists: pgsql-bugs
Assume the following configuration with three connected servers A->B->C:
A (primary), B (physical standby) and C (logical replica connected to B).
If server A is under load and B is applying incoming WAL records while
also streaming data via logical replication to C, then an attempt to stop
server B in 'fast' mode may be unsuccessful. In this case the server will
remain in the PM_SHUTDOWN state indefinitely, with all 'walsender'
processes running in an infinite busy loop (consuming a CPU core each).
To get the server out of this state, one must either stop B using
'immediate' mode or stop server C (which will cause the 'walsender'
processes on server B to exit). This issue is reproducible on the latest
'master', as well as on the current PG 16/17 branches.
Attached is a test scenario to reproduce the issue: 'test_scenario.zip'.
This archive contains shell scripts to create the required environment
(all three servers) and then to execute the steps required to get the
server into the incorrect state. First, edit the 'test_env.sh' file and
specify the path to the PG binaries in the PG_PATH variable and,
optionally, the set of ports used by the test instances in the 'pg_port'
array. Then execute the 'test_prepare.sh' script, which will create,
configure and start all three PG instances. The servers can then be
started and stopped using the corresponding start and stop scripts. To
reproduce the issue, ensure that all three servers are running and
execute the 'test_execute.sh' script. This script starts a 'pgbench'
instance in the background for 30 seconds to create load on server A,
waits for 20 seconds and then tries to stop the B instance using the
default 'fast' mode. The expected behavior is a normal shutdown of B,
while the observed behavior is different: the shutdown attempt fails and
each remaining 'walsender' process consumes an entire CPU core. To get
out of this state, one can use the 'stop-C.sh' script to stop server C,
as this will allow the shutdown of the B instance to complete as well.
My understanding is that this issue is caused by the logic in the
'GetStandbyFlushRecPtr' function, which returns the current flush point
for received WAL data. This position is used in 'XLogSendLogical' to
determine whether the current walsender is in the 'caught up' state (i.e.
whether we have sent all available data to the downstream instance).
During shutdown, walsenders are allowed to continue their work until
they are in the 'caught up' state, while the 'postmaster' waits for their
completion. Currently 'GetStandbyFlushRecPtr' returns the position of the
last stored record, rather than the last applied record. This is correct
for physical replication, as we can send data to a downstream instance
without applying it on the local system. However, for logical replication
this seems to be incorrect, as we cannot decode data until it has been
applied on the current instance. So, if the current stored WAL position
differs from the applied position while the server is being stopped, then
the 'WalSndLoop'/'XLogSendLogical'/'XLogReadRecord' functions will spin
in a busy loop, waiting for the applied position to advance. The recovery
process is already stopped at this moment, so this will be an infinite
loop. Probably either the 'GetStandbyFlushRecPtr' or the
'WalSndLoop'/'XLogSendLogical' logic needs to be adjusted to take this
logical replication case into consideration.
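For context, the relevant caught-up computation looks roughly like this
(a condensed paraphrase of 'XLogSendLogical' in walsender.c, not the
verbatim source):

	/* Condensed paraphrase of XLogSendLogical() in walsender.c */
	static XLogRecPtr flushPtr = InvalidXLogRecPtr;	/* cached target */

	record = XLogReadRecord(logical_decoding_ctx->reader, &errm);
	if (record != NULL)
	{
		LogicalDecodingProcessRecord(logical_decoding_ctx,
									 logical_decoding_ctx->reader);
		sentPtr = logical_decoding_ctx->reader->EndRecPtr;
	}

	/* Refresh the target position once we appear to have reached it. */
	if (flushPtr == InvalidXLogRecPtr ||
		logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
	{
		if (am_cascading_walsender)
			flushPtr = GetStandbyFlushRecPtr(NULL);	/* stored position */
		else
			flushPtr = GetFlushRecPtr(NULL);
	}

	/*
	 * The reader cannot advance past the replayed position on a standby,
	 * so if flushPtr points beyond it, this condition never becomes true
	 * once recovery has stopped, and the walsender spins without ever
	 * reaching the 'caught up' state.
	 */
	if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
		WalSndCaughtUp = true;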
Attached is also a patch which aims to fix this issue:
0001-Use-only-replayed-position-as-target-flush-point-for.patch. It
modifies the behavior of the 'GetStandbyFlushRecPtr' function to ensure
that it returns only the applied position for logical replication. This
function can also be invoked from the slot synchronization routines, in
which case it retains the current behavior by returning the last stored
position.
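In essence, the patched function would behave along these lines (a sketch
of the idea only; the attached patch is authoritative and its exact
condition may differ):

	XLogRecPtr
	GetStandbyFlushRecPtr(TimeLineID *tli)
	{
		XLogRecPtr	replayPtr;
		TimeLineID	replayTLI;
		XLogRecPtr	result;

		/* What has been replayed is always safe to report. */
		replayPtr = GetXLogReplayRecPtr(&replayTLI);
		result = replayPtr;

		if (tli)
			*tli = replayTLI;

		/*
		 * Advance to the walreceiver flush position only when not serving
		 * a logical walsender: a logical walsender cannot decode records
		 * that have not been replayed yet and would spin at shutdown
		 * otherwise. Slot synchronization (no walsender) keeps the old
		 * behavior.
		 */
		if (MyWalSnd == NULL || MyWalSnd->kind == REPLICATION_KIND_PHYSICAL)
		{
			TimeLineID	receiveTLI;
			XLogRecPtr	receivePtr = GetWalRcvFlushRecPtr(NULL, &receiveTLI);

			if (receiveTLI == replayTLI && receivePtr > result)
				result = receivePtr;
		}

		return result;
	}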
Thanks,
Alexey
Attachments:
test_scenario.zip (application/zip, 5.1 KB)
0001-Use-only-replayed-position-as-target-flush-point-for.patch (text/x-patch, 2.9 KB)
From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-22 09:48:03
Message-ID: CAFPTHDb7CY9X4FEwTY0yTRXE3k_xqGu8Q9dXiDkkAfgEMpUHGg@mail.gmail.com
Lists: pgsql-bugs
On Thu, May 22, 2025 at 7:25 PM Alexey Makhmutov
<a(dot)makhmutov(at)postgrespro(dot)ru> wrote:
> Assume the following configuration with three connected servers A->B->C:
> A (primary), B (physical standby) and C (logical replica connected to B).
> If server A is under load and B is applying incoming WAL records while
> also streaming data via logical replication to C, then an attempt to stop
> server B in 'fast' mode may be unsuccessful. [...]
>
> My understanding is that this issue is caused by the logic in the
> 'GetStandbyFlushRecPtr' function, which returns the current flush point
> for received WAL data. [...] However, for logical replication this seems
> to be incorrect, as we cannot decode data until it has been applied on
> the current instance. [...]
>
> Attached is also a patch which aims to fix this issue:
> 0001-Use-only-replayed-position-as-target-flush-point-for.patch. [...]
Good catch, I agree with the analysis. I've also verified that the fix
works as expected.
Just a small comment: can you explicitly mention in the comments that
"for logical replication we can only send records that have already
been replayed, else we might get stuck in shutdown", or something to
that effect? That distinction is important for future developers in
this area.
regards,
Ajin Cherian
Fujitsu Australia
From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-22 10:33:14
Message-ID: CAJpy0uA9kw9WLWETPg+upwzBu145vMB7sPZwBuR_vDrbazeuag@mail.gmail.com
Lists: pgsql-bugs
On Thu, May 22, 2025 at 7:51 AM Alexey Makhmutov
<a(dot)makhmutov(at)postgrespro(dot)ru> wrote:
> Assume the following configuration with three connected servers A->B->C:
> A (primary), B (physical standby) and C (logical replica connected to B).
> [...]
>
> Attached is also a patch which aims to fix this issue:
> 0001-Use-only-replayed-position-as-target-flush-point-for.patch. It
> modifies the behavior of the 'GetStandbyFlushRecPtr' function to ensure
> that it returns only the applied position for logical replication. This
> function can also be invoked from the slot synchronization routines, in
> which case it retains the current behavior by returning the last stored
> position.
>
The problem stated for a 'logical walsender' on a 'physical standby'
looks genuine. I agree with the analysis for slot-sync as well: slot-sync
does not need the fix, as it deals only with the flush position and does
not care about the replay position. Since the problem area falls under
'Allow logical decoding on standbys', I am adding Bertrand for further
comments on this fix.
thanks
Shveta
From: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-22 13:29:37
Message-ID: [email protected]
Lists: pgsql-bugs
Hi Ajin and Shveta, thank you for looking at this issue.
On 5/22/25 12:48, Ajin Cherian wrote:
> Just a small comment: can you explicitly mention in the comments that
> "for logical replication we can only send records that have already
> been replayed, else we might get stuck in shutdown", or something to
> that effect? That distinction is important for future developers in
> this area.
Thank you. I've expanded the comment to clarify the expected logic in
this case.
The updated patch is attached as
'0001-Use-only-replayed-position-as-target-flush-point-2.patch'.
Thanks,
Alexey
Attachments:
0001-Use-only-replayed-position-as-target-flush-point-2.patch (text/x-patch, 3.2 KB)
From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-22 15:31:40
Message-ID: aC9DXHt/[email protected]
Lists: pgsql-bugs
Hi,
On Thu, May 22, 2025 at 04:29:37PM +0300, Alexey Makhmutov wrote:
> Hi Ajin and Shveta, thank you for looking at this issue.
>
> On 5/22/25 12:48, Ajin Cherian wrote:
>
> > Just a small comment: can you explicitly mention in the comments that
> > "for logical replication we can only send records that have already
> > been replayed, else we might get stuck in shutdown", or something to
> > that effect? That distinction is important for future developers in
> > this area.
>
> Thank you. I've expanded the comment to clarify the expected logic in
> this case.
> The updated patch is attached as
> '0001-Use-only-replayed-position-as-target-flush-point-2.patch'.
Thanks for the report!
I agree with your analysis and I think that your proposed fix makes
sense. I wonder if the comment on top of GetStandbyFlushRecPtr() could be
updated a bit, though.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-22 18:22:02
Message-ID: [email protected]
Lists: pgsql-bugs
Hi Bertrand, thank you for looking at this issue.
On 5/22/25 18:31, Bertrand Drouvot wrote:
> I agree with your analysis and I think that your proposed fix makes
> sense. I wonder if the comment on top of GetStandbyFlushRecPtr() could
> be updated a bit, though.
I've updated the function comment as well in the updated patch:
0001-Use-only-replayed-position-as-target-flush-point-3-pg-master.patch.
The same patch can be applied on top of PG 17 as well.
PG 16 does not have the slot synchronization logic, so its implementation
of GetStandbyFlushRecPtr is slightly different and the comments need to
be updated to reflect this discrepancy as well. So, here is a variant of
the same patch updated to be compatible with PG 16:
0002-Use-only-replayed-position-as-target-flush-point-3-pg-16.patch
Thanks,
Alexey
Attachments:
0001-Use-only-replayed-position-as-target-flush-point-3-pg-master.patch (text/x-patch, 4.0 KB)
0002-Use-only-replayed-position-as-target-flush-point-3-pg-16.patch (text/x-patch, 3.6 KB)
From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-22 23:47:40
Message-ID: [email protected]
Lists: pgsql-bugs
On Thu, May 22, 2025 at 09:22:02PM +0300, Alexey Makhmutov wrote:
> I've updated the function comment as well in the updated patch:
> 0001-Use-only-replayed-position-as-target-flush-point-3-pg-master.patch.
> The same patch can be applied on top of PG 17 as well.
>
> PG 16 does not have the slot synchronization logic, so its
> implementation of GetStandbyFlushRecPtr is slightly different and the
> comments need to be updated to reflect this discrepancy as well. So,
> here is a variant of the same patch updated to be compatible with PG 16:
> 0002-Use-only-replayed-position-as-target-flush-point-3-pg-16.patch
Hmm. The scenario you are pointing at, with a mix of cascading logical
and physical standbys, is tricky enough that I think it warrants having
a test in the long run. Do you think that it would be possible to sort
something out with ordered actions taken during the shutdown sequences,
at least for HEAD?
Are v15 and older versions something that we should worry about for
setups made of physical standbys?
--
Michael
From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-28 07:35:58
Message-ID: [email protected]
Lists: pgsql-bugs
On Thu, May 22, 2025 at 09:22:02PM +0300, Alexey Makhmutov wrote:
> I've updated the function comment as well in the updated patch:
> 0001-Use-only-replayed-position-as-target-flush-point-3-pg-master.patch.
> The same patch can be applied on top of PG 17 as well.
+	if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL)
+	{
+		TimeLineID	receiveTLI;
+		XLogRecPtr	receivePtr = GetWalRcvFlushRecPtr(NULL, &receiveTLI);
+
+		if (receiveTLI == replayTLI && receivePtr > result)
+			result = receivePtr;
+	}
Looking at the patch, I am not much of a fan of looking at MyWalSnd in
GetStandbyFlushRecPtr(), which does not have such a dependency yet, to
then decide whether the LSN retrieved by GetWalRcvFlushRecPtr() should
be used or discarded.
Wouldn't it be better to extend GetStandbyFlushRecPtr() so that the
caller can directly decide whether to include in the maths the last LSN
flushed by the WAL receiver, depending on the code path where the
routine is called? That makes for a larger patch, of course, but it
removes the need to look at MyWalSnd inside GetStandbyFlushRecPtr().
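For illustration only, the extended routine could look roughly like this
(the 'include_received' parameter name is invented here):

	XLogRecPtr
	GetStandbyFlushRecPtr(TimeLineID *tli, bool include_received)
	{
		TimeLineID	replayTLI;
		XLogRecPtr	result = GetXLogReplayRecPtr(&replayTLI);

		if (tli)
			*tli = replayTLI;

		/* Let the caller decide if the WAL receiver LSN participates. */
		if (include_received)
		{
			TimeLineID	receiveTLI;
			XLogRecPtr	receivePtr = GetWalRcvFlushRecPtr(NULL, &receiveTLI);

			if (receiveTLI == replayTLI && receivePtr > result)
				result = receivePtr;
		}

		return result;
	}

Physical code paths would then pass true, while XLogSendLogical() would
pass false.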
--
Michael
From: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-28 16:12:25
Message-ID: [email protected]
Lists: pgsql-bugs
On 5/28/25 10:35, Michael Paquier wrote:
> Wouldn't it be better to extend GetStandbyFlushRecPtr() so that the
> caller can directly decide whether to include in the maths the last LSN
> flushed by the WAL receiver, depending on the code path where the
> routine is called? That makes for a larger patch, of course, but it
> removes the need to look at MyWalSnd inside GetStandbyFlushRecPtr().
Well, this is a great question, because it made me think about whether
we need to change the behavior of THIS function at all. Extending
'GetStandbyFlushRecPtr' to properly support logical replication would
just make it equal to 'GetXLogReplayRecPtr'. So, basically, we just need
to replace the invocation of 'GetStandbyFlushRecPtr' with
'GetXLogReplayRecPtr' and this should solve the problem.
The only invoker of 'GetStandbyFlushRecPtr' outside of walsender.c is the
slot synchronization routine, which expects to get the 'physical'
position anyway, so it does not need to be changed. All other invocations
are performed from the 'walsender.c' file itself. Obviously, both
'StartReplication' (physical) and 'XLogSendPhysical' need the current
logic, so we can skip them. 'XLogSendLogical' needs to be changed to use
'GetXLogReplayRecPtr' (to get behavior consistent with 'WalSndWaitForWal'
and actually fix the original problem discussed in this thread). This
leaves us with only a single function to consider: 'IdentifySystem'. This
function can be called for both physical and logical replication, but I
assume that we need to return the flushed WAL position in it as well, as
described in the documentation for this command. This also seems to be
the right behavior, as the primary goal of the 'identify' command is to
inform the client about the potential replication start point, and we can
definitely start replication from the flushed position (as we already
have this record), just with some delay (once our recovery gets us to
this point). But maybe somebody with more knowledge of the logical
replication expectations could confirm this.
The fix seems to be very simple in this case (assuming the logic above is
correct): we just replace the 'GetStandbyFlushRecPtr' call in
'XLogSendLogical' with 'GetXLogReplayRecPtr', and that's it. Attached is
the modified patch (a minimal sketch of the change follows below). It can
be applied on top of PG 16, 17 and master.
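In essence (a sketch only; the attached patch is authoritative):

	/* In XLogSendLogical(), when refreshing the cached target position: */
	if (flushPtr == InvalidXLogRecPtr ||
		logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
	{
		if (am_cascading_walsender)
			flushPtr = GetXLogReplayRecPtr(NULL);	/* was GetStandbyFlushRecPtr() */
		else
			flushPtr = GetFlushRecPtr(NULL);
	}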
As for the test: do you want this case with shutdown to be added to the
existing test plan for standby logical decoding? The condition to trigger
the problem is just to have the flushed and replayed positions pointing
to different values during standby shutdown. I could try to add it to the
test suite, but it could take some time.
Thanks,
Alexey
Attachments:
0001-Use-only-replayed-position-as-target-flush-point-4-pg-master.patch (text/x-patch, 2.3 KB)
From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-29 07:01:48
Message-ID: [email protected]
Lists: pgsql-bugs
On Wed, May 28, 2025 at 07:12:25PM +0300, Alexey Makhmutov wrote:
> Well, this is a great question, because it made me think about whether
> we need to change the behavior of THIS function at all. Extending
> 'GetStandbyFlushRecPtr' to properly support logical replication would
> just make it equal to 'GetXLogReplayRecPtr'. So, basically, we just need
> to replace the invocation of 'GetStandbyFlushRecPtr' with
> 'GetXLogReplayRecPtr' and this should solve the problem.
Exactly. I don't want to assume that all the callers are OK with this
forced change in the context of a logical WAL sender inside the
function GetStandbyFlushRecPtr(), especially since the change can be
isolated where it only makes sense, aka inside the logical sender
loop.
> This leaves us with only a single function to consider:
> 'IdentifySystem'. This function can be called for both physical and
> logical replication, but I assume that we need to return the flushed
> WAL position in it as well, as described in the documentation for this
> command. This also seems to be the right behavior, as the primary goal
> of the 'identify' command is to inform the client about the potential
> replication start point, and we can definitely start replication from
> the flushed position (as we already have this record), just with some
> delay (once our recovery gets us to this point). But maybe somebody with
> more knowledge of the logical replication expectations could confirm this.
I don't think that there is any need to care about IDENTIFY_SYSTEM.
It would be a short-lived command. Note that we document that the LSN
returned should be the current flush LSN, which can be used as a
starting point for START_REPLICATION. START_REPLICATION has different
properties and assumptions; it's expected to last, with its inner logic
required for the shutdown sequence ordering with WAL senders.
> The fix seems to be very simple in this case (assuming the logic above
> is correct): we just replace the 'GetStandbyFlushRecPtr' call in
> 'XLogSendLogical' with 'GetXLogReplayRecPtr', and that's it. Attached is
> the modified patch. It can be applied on top of PG 16, 17 and master.
Yeah, I think that this is a sensible move. Documenting the reason why
GetXLogReplayRecPtr() is called is important, but the comment you are
suggesting could be better, IMO, say:
"For cascading logical WAL senders, we use the replay LSN rather than
the flush LSN, as decoding would only happen with records already
replayed. This distinction is important especially during the
shutdown sequence, as cascading logical WAL senders can only catch up
with records that have been already replayed, not flushed."
That feels like I'm repeating myself a bit if formulated this way. If
you have a better suggestion, feel free.
> As for the test: do you want this case with shutdown to be added to the
> existing test plan for standby logical recovery? The condition to trigger
> the problem is just to have a flushed and replayed positions pointing to
> different values during standby shutdown. I could try to add it to the test
> suit, but it could take some time.
Anything I can think of at the moment would require the release of an
injection point, which is not something we can do during a shutdown
sequence with the existing facility, as we cannot connect to the server
while it's this late in its shutdown sequence: new connections are
prohibited once we expect the WAL senders to catch up. We could go
without that for now to get a fix in; your scripts are enough to push
all the logical WAL senders on the physical standby into an infinite
loop once its shutdown sequence initiates.
Bertrand and others, what do you think?
--
Michael
From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-29 10:28:01
Message-ID: CAFPTHDZu40jysBrqmSYHpumoEDFDs8JoZL3hsN80rFt394Fttw@mail.gmail.com
Lists: pgsql-bugs
On Thu, May 29, 2025 at 5:02 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> [...]
>
> Yeah, I think that this is a sensible move. Documenting the reason
> why GetXLogReplayRecPtr() is called is important, but the comment you
> are suggesting could be better, IMO, say:
> "For cascading logical WAL senders, we use the replay LSN rather than
> the flush LSN, as decoding would only happen with records already
> replayed. This distinction is important especially during the
> shutdown sequence, as cascading logical WAL senders can only catch up
> with records that have been already replayed, not flushed."
>
> That feels like I'm repeating myself a bit if formulated this
> way. If you have a better suggestion, feel free.
I think this new fix is much better and cleaner. A suggestion for the
comment:
"For cascading logical WAL senders, we use the replay LSN instead of the
flush LSN, since logical decoding on a standby only processes WAL that has
been replayed. This distinction becomes particularly important during
shutdown, as new WAL is no longer replayed and the last replayed LSN marks
the furthest point up to which decoding can proceed."
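In place, that could read roughly as follows (a sketch only, against the
latest patch):

	/*
	 * For cascading logical WAL senders, we use the replay LSN instead
	 * of the flush LSN, since logical decoding on a standby only
	 * processes WAL that has been replayed. This distinction becomes
	 * particularly important during shutdown, as new WAL is no longer
	 * replayed and the last replayed LSN marks the furthest point up to
	 * which decoding can proceed.
	 */
	if (am_cascading_walsender)
		flushPtr = GetXLogReplayRecPtr(NULL);
	else
		flushPtr = GetFlushRecPtr(NULL);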
regards,
Ajin Cherian
Fujitsu Australia
From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-05-29 23:32:05
Message-ID: [email protected]
Lists: pgsql-bugs
On Thu, May 29, 2025 at 08:28:01PM +1000, Ajin Cherian wrote:
> I think this new fix is much better and cleaner. A suggestion for the
> comment:
> "For cascading logical WAL senders, we use the replay LSN instead of the
> flush LSN, since logical decoding on a standby only processes WAL that has
> been replayed. This distinction becomes particularly important during
> shutdown, as new WAL is no longer replayed and the last replayed LSN marks
> the furthest point up to which decoding can proceed."
Yes, that sounds better than my previous suggestion. Thanks.
--
Michael
From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Alexey Makhmutov <a(dot)makhmutov(at)postgrespro(dot)ru>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Standby server with cascade logical replication could not be properly stopped under load
Date: 2025-06-02 03:17:18
Message-ID: [email protected]
Lists: pgsql-bugs
On Fri, May 30, 2025 at 08:32:05AM +0900, Michael Paquier wrote:
> On Thu, May 29, 2025 at 08:28:01PM +1000, Ajin Cherian wrote:
>> I think this new fix is much better and cleaner. A suggestion for the
>> comment:
>> "For cascading logical WAL senders, we use the replay LSN instead of the
>> flush LSN, since logical decoding on a standby only processes WAL that has
>> been replayed. This distinction becomes particularly important during
>> shutdown, as new WAL is no longer replayed and the last replayed LSN marks
>> the furthest point up to which decoding can proceed."
>
> Yes, that sounds better than my previous suggestion. Thanks.
I have reused that wording, did more tests on v16 and v17, reproducing
both of the reported problems and confirming that the change fixes the
shutdown of the physical standby, then applied the change down to v16.
Thanks all.
--
Michael