-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improve][offloaders] Automatically evict Offloaded Ledgers from memory #19783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][offloaders] Automatically evict Offloaded Ledgers from memory #19783
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good work @eolivelli
@eolivelli We should start with a proposal for a new behavior that will introduced to Pulsar, and it also introduced a new configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have invalidateReadHandle
method to invalidate the read handles. Why should we introduce a time-based invalidation for the offloaded read handles? Or, if the change wants to introduce time-based read handle invalidation, why it can't apply to the bookkeeper read handle?
Since we will start the RC version of
So drag this PR to |
The pr had no activity for 30 days, mark with Stale label. |
@eolivelli Is this something for the near term? |
The pr had no activity for 30 days, mark with Stale label. |
If you are not working on this PR anymore, I will take over this fix tomorrow. |
/pulsarbot rerun-failure-checks |
@poorbarcode Your review comments have been addressed. Please review again. |
...in/java/org/apache/bookkeeper/mledger/offload/jcloud/impl/BlobStoreBackedReadHandleImpl.java
Show resolved
Hide resolved
Dismissing request for changes to be able to run CI. I've replied to the request.
/pulsarbot rerun-failure-checks |
@poorbarcode I'm merging this. You can create follow-up PRs to improve the situation after this has been merged if you find that important. |
…ry (#19783) Co-authored-by: Lari Hotari <[email protected]> (cherry picked from commit a1a2b36)
…ry (#19783) Co-authored-by: Lari Hotari <[email protected]> (cherry picked from commit a1a2b36)
…ry (#19783) Co-authored-by: Lari Hotari <[email protected]> (cherry picked from commit a1a2b36)
…ry (apache#19783) Co-authored-by: Lari Hotari <[email protected]> (cherry picked from commit a1a2b36)
Another related fix to prevent OOM errors is #24655 |
Motivation
ManagedLedgerImpl retains eagerly a cache of all the BookKeeper ReadHandles.
In case of Offloaded ReadHandler there is kind of a memory leak, as each BlobStoreBackedInputStreamImpl retains a DirectBuffer of 1MB, in the case of a topic with terabytes of data and thousands of ledger this leads to Direct OOM errors if something tries to open all the ledgers
Modifications
Add a new background activity that evicts from memory all the Offloaded ReadHandles and release memory.
The feature is controlled by the new configuration option managedLedgerInactiveOffloadedLedgerEvictionTimeSeconds=600
Unfortunately this fix cannot fully prevent a OODM error because there is no global count and limit of the memory retained by such Handles, it allows to mitigate the problem by releasing automatically unused Ledger Handlers.
The default value, 10 minutes, is very conservative, but it should work with real-world ledgers.
The worst case scenario is a topic with tens of thousands of small ledgers with a consumer that reads from the topic from the beginning, in this case the broker will open the ReadHandlers probably more quickly than the eviction process pace.
Verifying this change
This change added tests.
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: eolivelli#22