Skip to content

LBs should avoid calling LBs after lb.shutdown() (1.74.x backport) #12231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 18, 2025

Conversation

ejona86
Copy link
Member

@ejona86 ejona86 commented Jul 17, 2025

LoadBalancers shouldn't be called after shutdown(), but RingHashLb could have enqueued work to the SynchronizationContext that executed after shutdown(). This commit fixes problems discovered when auditing all LBs usage of the syncContext for that type of problem.

Similarly, PickFirstLb could have requested a new connection after shutdown(). We want to avoid that sort of thing too.

RingHashLb's test changed from CONNECTING to TRANSIENT_FAILURE to get the latest picker. Because two subchannels have failed it will be in TRANSIENT_FAILURE. Previously the test was using an older picker with out-of-date subchannelView, and the verifyConnection() was too imprecise to notice it was creating the wrong subchannel.

As discovered in b/430347751, where ClusterImplLb was seeing a new subchannel being called after the child LB was shutdown (the shutdown itself had been caused by RingHashConfig not implementing equals() and was fixed by a8de9f0, which caused ClusterResolverLb to replace its state):

java.lang.NullPointerException
	at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createClusterLocalityFromAttributes(ClusterImplLoadBalancer.java:322)
	at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createSubchannel(ClusterImplLoadBalancer.java:236)
	at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
	at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
	at io.grpc.internal.PickFirstLeafLoadBalancer.createNewSubchannel(PickFirstLeafLoadBalancer.java:527)
	at io.grpc.internal.PickFirstLeafLoadBalancer.requestConnection(PickFirstLeafLoadBalancer.java:459)
	at io.grpc.internal.PickFirstLeafLoadBalancer.acceptResolvedAddresses(PickFirstLeafLoadBalancer.java:174)
	at io.grpc.xds.LazyLoadBalancer$LazyDelegate.activate(LazyLoadBalancer.java:64)
	at io.grpc.xds.LazyLoadBalancer$LazyDelegate.requestConnection(LazyLoadBalancer.java:97)
	at io.grpc.util.ForwardingLoadBalancer.requestConnection(ForwardingLoadBalancer.java:61)
	at io.grpc.xds.RingHashLoadBalancer$RingHashPicker.lambda$pickSubchannel$0(RingHashLoadBalancer.java:440)
	at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:96)
	at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:128)
	at io.grpc.xds.client.XdsClientImpl$ResourceSubscriber.onData(XdsClientImpl.java:817)

Backport of #12226

LoadBalancers shouldn't be called after shutdown(), but RingHashLb could
have enqueued work to the SynchronizationContext that executed after
shutdown(). This commit fixes problems discovered when auditing all LBs
usage of the syncContext for that type of problem.

Similarly, PickFirstLb could have requested a new connection after
shutdown(). We want to avoid that sort of thing too.

RingHashLb's test changed from CONNECTING to TRANSIENT_FAILURE to get
the latest picker. Because two subchannels have failed it will be in
TRANSIENT_FAILURE. Previously the test was using an older picker with
out-of-date subchannelView, and the verifyConnection() was too imprecise
to notice it was creating the wrong subchannel.

As discovered in b/430347751, where ClusterImplLb was seeing a new
subchannel being called after the child LB was shutdown (the shutdown
itself had been caused by RingHashConfig not implementing equals() and
was fixed by a8de9f0, which caused ClusterResolverLb to replace its
state):

```
java.lang.NullPointerException
	at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createClusterLocalityFromAttributes(ClusterImplLoadBalancer.java:322)
	at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createSubchannel(ClusterImplLoadBalancer.java:236)
	at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
	at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
	at io.grpc.internal.PickFirstLeafLoadBalancer.createNewSubchannel(PickFirstLeafLoadBalancer.java:527)
	at io.grpc.internal.PickFirstLeafLoadBalancer.requestConnection(PickFirstLeafLoadBalancer.java:459)
	at io.grpc.internal.PickFirstLeafLoadBalancer.acceptResolvedAddresses(PickFirstLeafLoadBalancer.java:174)
	at io.grpc.xds.LazyLoadBalancer$LazyDelegate.activate(LazyLoadBalancer.java:64)
	at io.grpc.xds.LazyLoadBalancer$LazyDelegate.requestConnection(LazyLoadBalancer.java:97)
	at io.grpc.util.ForwardingLoadBalancer.requestConnection(ForwardingLoadBalancer.java:61)
	at io.grpc.xds.RingHashLoadBalancer$RingHashPicker.lambda$pickSubchannel$0(RingHashLoadBalancer.java:440)
	at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:96)
	at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:128)
	at io.grpc.xds.client.XdsClientImpl$ResourceSubscriber.onData(XdsClientImpl.java:817)
```
@ejona86 ejona86 requested a review from kannanjgithub July 17, 2025 22:00
@ejona86 ejona86 merged commit 1df2a33 into grpc:v1.74.x Jul 18, 2025
15 of 16 checks passed
@ejona86 ejona86 deleted the backport-lb-api-after-shutdown-1.74 branch July 18, 2025 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants