Skip to content

Restore and send gaps #1721

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 13, 2025
Merged

Restore and send gaps #1721

merged 8 commits into from
Jun 13, 2025

Conversation

duhminick
Copy link
Contributor

@duhminick duhminick commented Jun 11, 2025

Description of the issue

With the addition of multi-threaded log sending support, there is a chance that certain ranges of logs were not sent. The state file will have been updated with ranges that were successfully sent. As such, the agent on recovery/startup should attempt to send the missing ranges.

For example given:

a
b
c
d
e

with ranges 0-1,3-4, the missing gap is 2-3. The agent will send the line b first then continue to tail the file at the offset.

Description of changes

  1. If gaps are found, then the given ranges will be "inverted". Meaning that the gaps in the ranges will be constructed. The gaps are then used in tailFileSync().
  2. tailFileSync will first check to see if the inverted ranges are present (i.e. len is greater than 0). It will iterate through the array of ranges
    a. In the range, it will seek to the beginning offset. It will then read and send lines until the curOffset is greater than or equal to the ending offset in the range. Once that is complete, it will move on to the next range.

The logic is largely similar to the existing loop for the tailer minus some error handling and EOF. That is because it's better to just let the main loop handle the errors.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

  1. Unit test
  2. Existing integration tests
  3. Tested locally

Example

Given the state file with log.txt configured for log group log.txt

36
/home/ec2-user/log.txt
0-15,25-36

with log.txt having content

a
b
c
d
e
f
... (cont)
x
y
z

Resulted in:
Notice that [a-h], [o-s] are missing
image

Requirements

Before commiting your code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

Integration Tests

To run integration tests against this PR, add the ready for testing label.

@duhminick duhminick force-pushed the dominic-range-state-pr branch from 10c55dc to 904a1d9 Compare June 11, 2025 17:56
@duhminick duhminick added the ready for testing Indicates this PR is ready for integration tests to run label Jun 12, 2025
@duhminick duhminick marked this pull request as ready for review June 12, 2025 18:18
@duhminick duhminick requested a review from a team as a code owner June 12, 2025 18:18
@@ -201,6 +201,10 @@ func (t *LogFile) FindLogSrc() []logs.LogSrc {
seekFile = &tail.SeekInfo{Whence: io.SeekEnd, Offset: 0}
}

var rangeList state.RangeList
if !restored.OnlyUseMaxOffset() {
Copy link
Contributor

@Paramadon Paramadon Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the restored statei s not emtpy we invert the ranges. You mentioned that if we find the gaps we'll invert it and then construct, but I'm wondering how this finds the gaps exaclty...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func InvertRanges(sorted RangeList) RangeList {
inverted := make([]Range, 0, len(sorted)+1)
var prevEnd uint64
for _, r := range sorted {
if r.start > prevEnd {
inverted = append(inverted, Range{start: prevEnd, end: r.start})
}
if r.end > prevEnd {
prevEnd = r.end
}
}
if prevEnd != unboundedEnd {
inverted = append(inverted, Range{start: prevEnd, end: unboundedEnd})
}
return inverted
}

So inverting the ranges is the same as finding the gaps. I think this unit test might show case a good example:

func TestInvertRanges(t *testing.T) {
tracker := newMultiRangeTracker("test")
assert.True(t, tracker.Insert(Range{start: 5, end: 10}))
assert.True(t, tracker.Insert(Range{start: 20, end: 25}))
ranges := tracker.Ranges()
got := InvertRanges(ranges)
want := RangeList{
{start: 0, end: 5},
{start: 10, end: 20},
{start: 25, end: unboundedEnd},
}
assert.Equal(t, want, got)
tracker.Clear()
want = RangeList{
{start: 0, end: unboundedEnd},
}
assert.Equal(t, want, InvertRanges(tracker.Ranges()))
}

So in the test, the ranges 5-10,20-25 were sent. That means that 0-5,10-20 still need to be sent before we begin the main loop of tailing the log file

Paramadon
Paramadon previously approved these changes Jun 13, 2025
Copy link
Contributor

@Paramadon Paramadon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed pr on call with duhminick@, approved!

jefchien
jefchien previously approved these changes Jun 13, 2025
@duhminick duhminick dismissed stale reviews from jefchien and Paramadon via 9d038c8 June 13, 2025 21:35
@sky333999 sky333999 merged commit 4123ffc into main Jun 13, 2025
11 of 12 checks passed
@sky333999 sky333999 deleted the dominic-range-state-pr branch June 13, 2025 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for testing Indicates this PR is ready for integration tests to run
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants