Problem/Motivation

@berdir noted that when rendering lots of media entities via paragraphs there can be a lot of calls to usleep(), this is because #3496369: Multiple load path aliases without the preload cache and #1237636: Lazy load multiple entities at a time using fibers actually suspend Fibers now.

The usleep() calls are intended to prevent a situation where we execute a long async query with nothing else to do in the meantime - in that case the loop could run endlessly.

Currently, until we support rendering multiple entities in a view or a reference field, the suspends in aliases and entity load add up as each of them will suspend and resume again, which can quickly result in dozens of suspends on a larger page.

Steps to reproduce

Profile a page that renders a lot of entities with xhprof/blackfire, such as various umami frontpage or a views overview, notice that a large chunk of time is spent on calls to usleep().

Proposed resolution

Fibers come with a communication channel when suspending/resuming. We introduce a new FiberResumeType enum, which optionally allows to signal to the code managing the fibers that they should wait on an async event (database query, http request) or resume immediately (if there is no other fiber that should be processed first).

This concept of using fibers to group multiple loads together is to our knowledge currently fairly unique to Drupal and while it's currently basically the only type of suspensions, we assume that future cases of suspends, possibly also in third party components will be async cases. That's why we default to async and an "immediate" response needs to be communicated explicitly.

With this change, the usleep() calls vanish completely.

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

Issue fork drupal-3550724

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

catch created an issue. See original summary.

catch’s picture

Status: Active » Needs review

Something like this.

berdir’s picture

This only reduces it from 76 to 64 calls to usleep() and even with a 10 instead of 2 it's still 45 calls. I think a large chunk of rendering happens in a single render context, so we essentially suspend the same fiber many times right now if it's for the main content.

Since the scenario with a long-running query doesn't really exist yet, should we remove this for now?

Alternatively, it would be nice if we'd knew in the suspend-cases whether there is just a single fiber or multiple. I have no idea how close the event loop issue is and if it will allow to check that. Should we add our own basic global flag for that, something like FiberHelper::allowBulkSuspend(TRUE/FALSE)?

Or, fibers have a communication channel. Could we pass around some flags, like FiberContext::WAIT or FiberContext::RESUME to guide something like Renderer and whether or not it should immediately resume or wait a bit?

catch’s picture

I have no idea how close the event loop issue is and if it will allow to check that.

It probably would, but the existence of this code in ::executeInRenderContext() is currently blocking it because Revolt has no equivalent (according to @kingdutch).

. Could we pass around some flags, like FiberContext::WAIT or FiberContext::RESUME

This is definitely worth a try.

catch’s picture

Pushed an untested commit to explore communicating to the Fiber to the caller.

catch’s picture

If this works we'd want to implement it in the other fiber loops in core.

We can't mandate a return value for Fiber::suspend(), so it needs to be 100% optional and have a default behaviour when NULL or something other than the enum is returned.

I think the idea of fibers that suspend without kicking off anything async (e.g. the path alias and entity loading issues) is fairly unique to Drupal since it's not actually using anything asynchronous at all, but it's also going to be the most common case by volume even once we've got async http request and database query support in core.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new1.54 KB

The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

berdir’s picture

I think you forgot to add the class to the MR, but yes, something like that is what I was thinking. To account for external fibers that we don't control and that aren't likely to implement this wait-for-more pattern I would probably invert it. While by number of suspends, this pattern will be more common, I would expect there won't be that many places in core where we would do that and I think custom suspends are more likely to be the wait-for-async than this. But no strong opinion on this.

catch’s picture

Whoops - pushed the enum (still completely untested).

Inverting the logic is fine and sounds good, didn't give it loads of thought, we need to update the path alias and entity loading code to return the suspend to see an effect, but those are the only two at the moment anyway.

berdir’s picture

Component: base system » render system
Priority: Normal » Major
Status: Needs work » Needs review

Updated this to default to delayed, fixed some stuff and tested a bit on umami. I also get 37 calls to usleep() here:

I get huge variations for the same request (one request was 400ms, the other 500ms), but I see a clear reduction in IO wait:

I/O Wait-19 ms (-41%)
45 ms → 26.4 ms

I think this is at least major, because it currently more than undoes the actual improvements we got from the suspend issues.

berdir’s picture

StatusFileSize
new37.75 KB

On umami, usleep() looks like this:

catch’s picture

@berdir on Umami is it 37 calls to usleep() in HEAD and 0 with the MR?

berdir’s picture

@catch: correct. Both with disabled render caches.

catch’s picture

The only remaining concern I have here is (lack of) compatibility with the Revolt event loop, but this entire render context juggling stuff is as far as we know not compatible with the Revolt event loop, so hopefully we can find a way to factor it away. We won't need this for the other fiber loops because they actually have multiple things to loop over and don't use usleep() etc.

berdir’s picture

Issue summary: View changes
berdir’s picture

Updated the issue summary and created a CR: https://www.drupal.org/node/3556785

I tried to explain that it should be very rare that this needs to be considered.

godotislate’s picture

One minor typo and one question on the MR, otherwise looks good.

Also reviewed the CR and it looks good.

godotislate’s picture

One question initially brought up by @nicxvan on Slack: the enum Delayed case is not used anymore, since the logic was changed to delay by default unless the return value is immediate. Should the Delayed case be kept? If not, does a single case enum make sense?

catch’s picture

I don't have a strong view on #19 but it seems more self-documenting to keep Delayed on there and allow it to be used explicitly if someone really wanted to. If we removed it, and also didn't want to use an enum with one value, we'd probably need to add a regular class constant somewhere, but not sure where to. And it's not impossible we'll find a third behaviour case later on to add to the enum and support.

godotislate’s picture

Should the enum be annotated @internal, then, at least for now? Not that opinionated about it, and otherwise good for RTBC, I think.

godotislate’s picture

There is one failing performance test. Not sure if that's an intermittent failure or if related to the changes here.

godotislate’s picture

Status: Needs review » Reviewed & tested by the community

Re-ran the test and everything is green now. I think this is good for RTBC (whether the enum should be internal isn't a blocker.)

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

Added a comment to the MR that needs addressing.

berdir’s picture

Status: Needs work » Reviewed & tested by the community

Tried to improve the docs, did not add @internal per my comment.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed ee1598d4239 to 11.x and a14ca0ca8a5 to 11.3.x. Thanks!

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

  • alexpott committed a14ca0ca on 11.3.x
    fix: #3550724 Reduce usleep() calls in Renderer::executeInRenderContext...

  • alexpott committed ee1598d4 on 11.x
    fix: #3550724 Reduce usleep() calls in Renderer::executeInRenderContext...
quietone’s picture

I don't see a review of the change record on this issue. Can someone confirm it is accurate?

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.