-
Notifications
You must be signed in to change notification settings - Fork 578
Attempt to fix flaky util-app test in GitHub CI#310
Attempt to fix flaky util-app test in GitHub CI #310felixbr wants to merge 1 commit intotwitter:develop from
Conversation
Does just increasing the defaultCloseGracePeriod to 30 seconds like you've done do the trick? It seems like a real bug if we cannot ever get the right exit behavior regardless of the grace period.
felixbr
commented
Aug 5, 2022
Well, locally it always works, even with a very very low grace period. My suspicion is that a different timeout (e.g. from Await.ready) somewhere is firing the TimeoutException and so increasing the grace period is not really helping to "fix" CI.
That is purely speculation, of course, as I couldn't reproduce it even once locally.
I've seen similar problems at work where tests would basically be starved in CI because sbt tries to parallelize heavily but the machine has very little CPU and so futures in tests would be created but then paused for very long periods, leading to timeouts.
If you want me to, I can remove the retry part but eventually will only retry for a while and then fail the test anyway. So if it was really broken it wouldn't change anything for the outcome.
bryce-anderson
commented
Aug 11, 2022
If you want me to, I can remove the retry part but
eventuallywill only retry for a while and then fail the test anyway. So if it was really broken it wouldn't change anything for the outcome.
I would prefer to try it. If it's not a problem simply due to "not enough CPU time", which I think you're right that it is, and it's actually flaky for other reasons that is broken and should be fixed and retrying will mask the flakiness/brokenness.
fe57d42 to
344ac48
Compare
felixbr
commented
May 18, 2023
I've rebased this onto develop and reverted the addition of eventually retries. So it's only an increased timeout and a comment now.
1 similar comment
felixbr
commented
May 18, 2023
I've rebased this onto develop and reverted the addition of eventually retries. So it's only an increased timeout and a comment now.
As you can see in many PRs (e.g. #308) there is often one of the different build matrix cases red. As far as I can see it's always the same flaky test case in
AppTest(inutil-app).I tried many things but couldn't reproduce the timeout locally. As an attempt to fight the failures in CI, I'm increasing the timeout for the grace period and also add retries to the test case.
I understand that this is mostly a band aid and not a great solution, so if you don't like it please let me know how we should tackle this instead.
Cheers
~ Felix