Problem/Motivation
1) Drupal\Tests\media\FunctionalJavascript\MediaUiJavascriptTest::testMediaTypes
Behat\Mink\Exception\ResponseTextException: The text "The media type pAkYI8C0 has been added." was not found anywhere in the text of the current page.This is from https://www.drupal.org/pift-ci-job/853314
Originally, the patch on #2934424-13: Media has no collection route passed. I do not know why a re-test was triggered, but the re-test failed with the message above. I triggered another re-test, and that one passed.
Proposed resolution
Remaining tasks
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #27 | 2934997-20.patch | 1.55 KB | benjifisher |
| #24 | 2934997-24.patch | 867 bytes | benjifisher |
| #22 | 2934997-22.patch | 866 bytes | benjifisher |
| #20 | 2934997-20.patch | 1.55 KB | benjifisher |
| #16 | 2934997-16.patch | 1.68 KB | benjifisher |
Comments
Comment #2
benjifisherThe attached patch should run the intermittently failing test 100 times, with 1/4 second pause in between runs.
Comment #3
benjifisherComment #4
benjifisherNow let's try the same test, also applying the patch from #2934424-13: Media has no collection route.
Comment #5
xjmNice work @benjifisher -- looks like #2 caught one! https://www.drupal.org/pift-ci-job/853413
This means that the random fail is already in HEAD, and not introduced by #2934424: Media has no collection route. So we can go ahead with that issue, and instead need to sort out which previous commit introduced this random fail.
Comment #6
xjmSetting back to NW now that we've confirmed the random fail exists in HEAD.
It looks like this might be straightforward to fix. From the test:
That's totally missing any wait condition above the last line, so we need to add that to ensure that the request completes before the assertion runs.
Comment #7
benjifisherWhat is the appropriate way to wait in this sort of test?
Note that the "Save" button submits the form at
/admin/structure/media/add. This is a full page load, not an AJAX event. I considered using$assert_session->assertWaitOnAjaxRequest();until I realized this. Looking for other tests that use$page->pressButton('Save');, I see that followed by$assert->addressMatches('...');inMediaRevisionTest; does that assertion wait until the page loads?Comment #8
benjifisherThis is just a quick test to see if I am going in the right direction.
Comment #9
benjifisherSame fix as in #8, but this will run the test 100x, like the patches in #2 and #4. Time for the testbot to do some work!
Comment #10
benjifisherThis patch is for real. It includes the same changes to MediaUiJavascriptTest as the previous patch, without hacking
run-tests.sh.When the test adds a new media type, and again when it modifies that media type, Drupal loads
/admin/structure/media. Both times, the patch waits for the "Add media type" link and asserts the URL:The only thing that bothers me is that
waitForLink()is normally used for AJAX interactions, but this is a full page load. If there is a better way to implement the wait, I am happy to update the patch.Comment #11
marcoscanoExactly for the reason you pointed out in #10, another developer is likely to come here in the future and just remove them, without even noticing a failure :) How about a comment before these lines, explaining the need and referring to this issue URL?
Note: it surprises me too that we need to wait for the element, on a non-AJAX call, but I have no other suggestion to make, unfortunately. :)
Also, in other tests, after waiting for an AJAX interaction, we assert the new element is really present, with something like:
I'm no test expert, so may have nothing to do, just wanted to mention it for consistency with other tests.
Comment #12
benjifisherBut that makes the patch twice as big (+8 instead of +4). ;)
I added code comments as suggested.
I am also not an expert in testing, either. I notice several places where
addressMatches()is used after pressing a button. Looking through the code and documentation, I do not see any indication that this assertion includes an implicit wait.Should I add
assertNotEmpty()as well?Looking at the line after pressing the 'Delete' button, I notice that there is a method
addressEquals(), so this patch uses that instead ofaddressMatches().This issue has been marked as a critical bug. As such, I think we should limit work here to the lines that are known to cause test failures ... and by that standard, I may already be doing too much. If I get some guidance, I can work on adding more waits in the Media tests as a follow-up issue.
Comment #13
lendudeIt surprises me too, but it's not the first time this is needed, see #2864177: Random failure in FormErrorHandlerCKEditorTest
But it would be nice to find out why and when this is needed.....
This is correct. You need to assert that the element was actually found, otherwise you might just be waiting for 10 seconds, which might help the test pass, but is fairly inefficient. We want to be sure we wait for as short a time as possible.
Comment #14
benjifisherI think this (from the patch in #12) is good enough, but I can change the assertion if you think it is better to be consistent:
Comment #15
lendudeI feel we do need the assertion that
$assert_session->waitForLink('Add media type');found something, because if we change the link text to something else later, we have no indication that this test is running 10 seconds longer then necessary.For reference if you change the waitFor to
$assert_session->waitForLink('Dingoes ate my baby');the test passes just fine now, it just runs a little longer.
So I think we need to future proof this and assert that we wait for the correct link.
Comment #16
benjifisher@Lendude, that sounds like a good reason. Assertion added.
Comment #17
lendude@benjifisher thanks for adding it :)
We have proof of failure in #2, we have proof of passing in #9, nice work!
Comment #18
xjmI queued several more runs of #9; the fail in #2 only showed up 1/6 times, which means we need several times that to be reasonably confident it is indeed fixed. :)
Another good way to test these issues is to come up with a patch that fails 100% of the time (e.g. with a very high sleep at exactly the step where we're waiting for the page load). Not a requirement, but helpful. We can also just wait to see what happens in like 20 runs. (I'm only queuing a few at a time to avoid overloading CI; will revisit later.)
Comment #19
xjmOh, we should never put "See" to the issue in a patch. If people want to know what issue added the line, they can use git blame.
Comment #20
benjifisherAgreed. A high sleep in the test would make the test less likely to fail, so I guess you mean putting a sleep into the submit callback that is being triggered? That should be easy enough, but not until later (maybe tomorrow).
A single failure is useless for statistics. Before knowing how many times we should exercise the patch, we should also run the failing test enough times to have an estimate for how often it fails.
Comment #21
tedbowI tried to make this test fail (without the patch ) by just adding
sleep(1)toindex.php.It didn't fail so I went to 5 and then 10. It then failed but in a different spot:
Assuming this is some kind of ajax for reload where option show up after you select a value for field. Does this have to be done with
assertJsCondition?Could it be done with
\Drupal\FunctionalJavascriptTests\JSWebAssert::waitForElement()? Of course checking for result.but this is what is making this test fail. but the fact that it only fails after waiting 10 seconds and not on the same line makes me wonder if there is not something else going on besides just slow tests.
Comment #22
benjifisherI am not sure what to do with that. :-(
For one thing, we have documented test failures on one line of code, but not the one you found.
That part of the test is exercising
/admin/structure/media/add. Choosing "test" from the "Media source" select list indeed triggers an AJAX event, loading the configuration for the "Test source" media source (from themedia_test_sourcemodule). Then the assertion is looking for one of the elements of that configuration form.Maybe, while I ponder the last line of Comment #21, I will upload a variant of the patch from #2, running this test 10 times, with a 10-sec pause in index.php.
Comment #23
benjifisherComment #24
benjifisherOops, that was only a 1-second pause.
Comment #26
benjifisherOK, my patch in #24 failed at the same place as @tedbow's experiment.
I mulled it over, and I think this is not a problem.
If you put in a 10-second wait, as we did, then the test will fail at the first possible place. That place seems to be the
assertJsCondition()on Line 63.There is only one test method in this class, and this is how it starts:
Correct me if I am wrong on any of this. (I am pretty sure about the middle three points.)
drupalGet()calls will wait patiently for the page to load.fillField()might trigger an AJAX event, but on this page it does not.selectFieldOption()(Line 62) does trigger an AJAX event.When the assertion fails, the test stops running, so we do not see any further failures.
Conclusion: all that is going on is slow tests.
If we want to make this test bullet-proof, then we can insert a wait statement here, and look for other places to do the same. I think that is a good idea, but out of scope for this issue, as I said in #12. Am I right?
I am re-uploading the patch from #20. Maybe tomorrow I will take up @xjm's challenge from #18.
Comment #27
benjifisherComment #28
lendudeHaving a 100% failing test might be nice to have but sounds unlikely to happen. Since it's still unclear why this test fails in the first place. Having to wait for a non-dynamic element on page load is not something that is usually needed. That might be something we want to look into, since it is now needed twice in core (so I assume we will run into this again), but seems out of scope for fixing this critical. I opened up a follow up for this, #2936122: Find out why JavscriptTestBase occasionally needs a waitForElement on a normal page load
Given the 1/600 failure rate in #2, it means the 2200 runs in #9 mean we now only have 0.0455% chance of not having caught the cause but still get 2200 passes. Sounds pretty solid to me.
Bumping back to RTBC to see if people agree :)
Comment #31
catchCommitted 14c499f and pushed to 8.6.x. Thanks!
Will cherry-pick to 8.5.x after the alpha freeze.
Comment #32
catchDone.
Comment #35
xjm