[KAFKA-16106] group size counters do not reflect the actual sizes when operations fail - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0
Component/s: None
Labels: None

Description

An expire-group-metadata operation generates tombstone records, updates the `groups` state, decrements the group size counters, and then writes to the log. If the __consumer_offsets partition is being reassigned, the write fails. The `groups` state is reverted to an earlier snapshot, but the classic group size counters are not. This leaves the metrics inconsistent with the actual number of groups. The same applies to every unsuccessful write operation that alters the `groups` state.

The issue is exacerbated because the expire-group-metadata operation can be retried multiple times until the partition is fully unloaded.

The solution is to make the counters timeline data structures as well (TimelineLong), so that a failed write operation reverts the counters along with the `groups` state.
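The idea can be illustrated with a minimal, self-contained sketch. It does not use Kafka's actual `TimelineLong` or snapshot registry. `RevertibleCounter`, its methods, and the epoch values are hypothetical names chosen for illustration. The sketch only shows why a counter that is snapshotted together with the `groups` state stays consistent once a failed write is rolled back.

```java
import java.util.TreeMap;

// Hypothetical stand-in for a timeline-backed counter (NOT Kafka's TimelineLong).
// It records its value at each snapshot epoch so it can be rolled back.
final class RevertibleCounter {
    private long value = 0L;

    // Counter value captured at each snapshot, keyed by a caller-chosen epoch.
    private final TreeMap<Long, Long> snapshots = new TreeMap<>();

    void increment() { value++; }

    void decrement() { value--; }

    long get() { return value; }

    // Capture the current value under the given epoch.
    void snapshot(long epoch) { snapshots.put(epoch, value); }

    // Restore the value captured at the given epoch and discard newer snapshots,
    // as a coordinator might do after a failed write (assumption for this sketch).
    void revertTo(long epoch) {
        Long saved = snapshots.get(epoch);
        if (saved == null) {
            throw new IllegalArgumentException("No snapshot for epoch " + epoch);
        }
        value = saved;
        snapshots.tailMap(epoch, false).clear();
    }
}
```

With a plain `long` counter, the decrement made by a failed expiration would persist and every retry would decrement again. Here, reverting to the snapshot taken before the operation restores the earlier count, regardless of how many times the operation is retried.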

Issue Links

links to

GitHub Pull Request #16511

GitHub Pull Request #16874

People

Assignee: Dongnuo Lyu

Reporter: Jeff Kim

Votes: 0

Watchers: 1

Dates

Created: 09/Jan/24 19:53

Updated: 04/Oct/24 07:31

Resolved: 04/Oct/24 07:31