Kafka / KAFKA-14048 The Next Generation of the Consumer Rebalance Protocol / KAFKA-16106

group size counters do not reflect the actual sizes when operations fail


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: None
    • Labels: None

    Description

      An expire-group-metadata operation generates tombstone records, updates the `groups` state, decrements the group size counters, and then writes to the log. If a __consumer_offsets partition reassignment is in progress, the write fails. The `groups` state is then reverted to an earlier snapshot, but the classic group size counters are not, so the metrics diverge from the actual group sizes. The same applies to every unsuccessful write operation that alters the `groups` state.

       

      The issue is exacerbated because the expire-group-metadata operation can be retried multiple times until the partition is fully unloaded, and every failed attempt can skew the counters further.
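
      The drift described above can be reproduced with a small model. This is only a sketch under stated assumptions: `DriftingGroupState`, `expireGroup`, and the `writeSucceeds` flag are hypothetical names invented for illustration and do not come from Kafka's code base. A copied map stands in for the snapshot-backed `groups` state, and a plain `long` stands in for a counter that lives outside the snapshot mechanism.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the bug; none of these names come from Kafka.
class DriftingGroupState {
    // Stand-in for the snapshot-backed `groups` state.
    private Map<String, String> groups = new HashMap<>();
    // Plain counter that is not covered by the snapshot.
    private long groupCount = 0;

    void addGroup(String groupId) {
        groups.put(groupId, "classic");
        groupCount++;
    }

    // Update the state and the counter first, then attempt the log write.
    // On a failed write only `groups` is restored.
    boolean expireGroup(String groupId, boolean writeSucceeds) {
        Map<String, String> snapshot = new HashMap<>(groups);
        if (groups.remove(groupId) != null) {
            groupCount--;
        }
        if (!writeSucceeds) {
            groups = snapshot; // the counter is NOT reverted
            return false;
        }
        return true;
    }

    int actualSize() { return groups.size(); }
    long reportedSize() { return groupCount; }
}

class CounterDriftDemo {
    public static void main(String[] args) {
        DriftingGroupState state = new DriftingGroupState();
        state.addGroup("g1");
        // Three failed attempts, as when the operation is retried
        // while the partition is being unloaded.
        for (int i = 0; i < 3; i++) {
            state.expireGroup("g1", false);
        }
        System.out.println("actual=" + state.actualSize()
                + " reported=" + state.reportedSize());
    }
}
```

      In this model the group is still present after the three failed attempts, while the reported count has been decremented three times.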

       

      The fix is to make the counters timeline data structures (TimelineLong) as well, so that a failed write operation reverts the counters together with the `groups` state.
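
      The idea can be sketched as follows. `RevertibleLong` is a hypothetical, heavily simplified stand-in written for this example; it is not Kafka's TimelineLong and makes no claim about that class's API. `ConsistentGroupState`, `expireGroup`, and the `writeSucceeds` flag are likewise illustrative assumptions. The point is only that the counter is snapshotted at the same epoch as the state and reverted with it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a timeline counter (not Kafka's TimelineLong).
class RevertibleLong {
    private long value = 0;
    private final TreeMap<Long, Long> snapshots = new TreeMap<>();

    long get() { return value; }
    void increment() { value++; }
    void decrement() { value--; }

    // Remember the current value under the given epoch.
    void snapshot(long epoch) { snapshots.put(epoch, value); }

    // Restore the value recorded at `epoch` and drop later snapshots.
    void revertTo(long epoch) {
        Long saved = snapshots.get(epoch);
        if (saved == null) {
            throw new IllegalArgumentException("No snapshot at epoch " + epoch);
        }
        value = saved;
        snapshots.tailMap(epoch, false).clear();
    }
}

// Hypothetical model in which the state and the counter revert together.
class ConsistentGroupState {
    private Map<String, String> groups = new HashMap<>();
    private final RevertibleLong groupCount = new RevertibleLong();
    private long nextEpoch = 0;

    void addGroup(String groupId) {
        groups.put(groupId, "classic");
        groupCount.increment();
    }

    boolean expireGroup(String groupId, boolean writeSucceeds) {
        long epoch = nextEpoch++;
        Map<String, String> groupsSnapshot = new HashMap<>(groups);
        groupCount.snapshot(epoch);

        if (groups.remove(groupId) != null) {
            groupCount.decrement();
        }
        if (!writeSucceeds) {
            // Failed write: revert the state AND the counter.
            groups = groupsSnapshot;
            groupCount.revertTo(epoch);
            return false;
        }
        return true;
    }

    int actualSize() { return groups.size(); }
    long reportedSize() { return groupCount.get(); }
}

class CounterRevertDemo {
    public static void main(String[] args) {
        ConsistentGroupState state = new ConsistentGroupState();
        state.addGroup("g1");
        for (int i = 0; i < 3; i++) {
            state.expireGroup("g1", false); // retried, failing each time
        }
        System.out.println("after failures: actual=" + state.actualSize()
                + " reported=" + state.reportedSize());
        state.expireGroup("g1", true);
        System.out.println("after success: actual=" + state.actualSize()
                + " reported=" + state.reportedSize());
    }
}
```

      In this sketch, repeated failures leave the reported count equal to the actual size, and only the successful write changes both. For brevity the model never discards old snapshots after a successful write.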


          People

            dongnuolyu Dongnuo Lyu
            jeffkbkim Jeff Kim