Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- None
Description
A flaky test revealed that, when closing, the new consumer may wait for an unsent FindCoordinator request to go out, even after the commit/leaveGroup stages of close are done.
This can happen because the CoordinatorRequestManager poll keeps attempting FindCoordinator whenever the coordinator is unknown, even during consumer close: after the consumer has completed the commit/leave attempts (the only steps in close that require a coordinator) and before the network shutdown that stops polling the managers.
If the unneeded FindCoordinator is generated and the brokers are down (as could happen in the flaky test), the consumer waits for that request unnecessarily here: https://github.com/apache/kafka/blob/5c20aa187aa8f51af4270d7d1b0db4963b0cd10b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkThread.java#L327
I expect we shouldn't block the close on a FindCoordinator request if the consumer has already completed the commit/leave attempts.
One option would be to "signal close" to the CoordinatorRequestManager after consumer.close completes commit/leave, so that it does not generate any more requests on poll (similar to what is already done for the CommitRequestManager with the CommitOnCloseEvent).
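A minimal sketch of the "signal close" idea, assuming a simplified stand-in for the manager. The names CloseSketch, CoordinatorManagerSketch, signalClose and pollDuringClose are hypothetical and exist only for illustration; this is not the actual Kafka code or the final fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class CloseSketch {

    // Hypothetical, simplified stand-in for a coordinator request manager.
    static final class CoordinatorManagerSketch {
        private boolean coordinatorKnown = false;
        private boolean closing = false;

        // Assumed hook: called once close has finished its commit/leave attempts.
        void signalClose() {
            closing = true;
        }

        // Generates a FindCoordinator request only while one is still useful.
        Optional<String> poll() {
            if (closing || coordinatorKnown) {
                return Optional.empty();
            }
            return Optional.of("FindCoordinator");
        }
    }

    // Simulates the window between commit/leave completion and network shutdown,
    // and returns the requests that would still be unsent at shutdown.
    static List<String> pollDuringClose(CoordinatorManagerSketch manager, boolean signalClose) {
        List<String> unsent = new ArrayList<>();
        // The commit and leaveGroup attempts would have completed before this point.
        if (signalClose) {
            manager.signalClose();
        }
        // The manager is still polled until the network layer shuts down.
        manager.poll().ifPresent(unsent::add);
        return unsent;
    }

    public static void main(String[] args) {
        // Without the signal, an unneeded request is queued and close would wait on it.
        System.out.println(pollDuringClose(new CoordinatorManagerSketch(), false)); // [FindCoordinator]
        // With the signal, nothing is generated, so close has nothing to wait for.
        System.out.println(pollDuringClose(new CoordinatorManagerSketch(), true));  // []
    }
}
```

In this sketch the flag is checked inside poll, so the network thread can keep polling every manager during close without special-casing this one.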
This fix should allow this test to be enabled reliably for the new consumer.
Without the fix, the test is flaky (fails locally after a few repeated runs, fails in CI).
Issue Links
- is related to: KAFKA-18517 ConsumerBounceTest should run for async consumer (Resolved)