Adjust nginx agent backoff settings and revert request timeout #3820
Codecov Report
✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.99%. Comparing base (13d2288) to head (7c4d029).
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3820   +/-   ##
=======================================
  Coverage   86.99%   86.99%
=======================================
  Files         128      128
  Lines       16412    16412
  Branches       62       62
=======================================
  Hits        14278    14278
- Misses       1958     1959    +1
+ Partials      176      175    -1
Regarding these settings:
initial_interval: .5s
max_interval: 1.5s
max_elapsed_time: 3s
randomization_factor: 0.5
multiplier: 1.5
You can read about them here: https://pkg.go.dev/github.com/cenkalti/backoff/v4#section-readme.
But from my understanding:
- initial_interval is the sleep duration before the first retry.
- max_interval is the maximum sleep duration between retries.
- max_elapsed_time is the maximum total time the backoff keeps waiting before it gives up.
- randomization_factor adds jitter so each sleep isn't the same.
- multiplier is how the "base" interval grows, i.e. the factor by which the sleep interval increases after each retry.
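To make the interaction between these settings concrete, here is a rough sketch of the schedule they produce. It is a simplified model, not the library's code: the function name `backoff_schedule` is hypothetical, elapsed time is assumed to count only the sleeps (the real library measures wall-clock time, which also includes the time spent in each attempt), and the backoff is assumed to stop once the next sleep would push the total past max_elapsed_time.

```python
import random


def backoff_schedule(initial_interval, max_interval, max_elapsed_time,
                     multiplier, randomization_factor=0.0, rng=None):
    """Return the sleep durations (in seconds) taken before the backoff gives up.

    Simplified model (assumption): only sleep time counts toward
    max_elapsed_time, and jitter is applied only when an rng is passed in.
    """
    if max_elapsed_time <= 0:
        raise ValueError("max_elapsed_time must be positive in this sketch")

    sleeps = []
    elapsed = 0.0
    interval = initial_interval
    while True:
        if rng is not None and randomization_factor > 0:
            # Jitter: pick uniformly from interval * [1 - factor, 1 + factor].
            low = interval * (1 - randomization_factor)
            high = interval * (1 + randomization_factor)
            sleep = rng.uniform(low, high)
        else:
            sleep = interval
        # Give up if this sleep would exceed the total time budget.
        if elapsed + sleep > max_elapsed_time:
            break
        sleeps.append(sleep)
        elapsed += sleep
        # Grow the base interval, capped at max_interval.
        interval = min(interval * multiplier, max_interval)
    return sleeps


# Settings from this PR, without jitter.
plain = backoff_schedule(0.5, 1.5, 3.0, 1.5)
print(plain, sum(plain))

# Same settings with jitter; the total still stays within max_elapsed_time.
jittered = backoff_schedule(0.5, 1.5, 3.0, 1.5,
                            randomization_factor=0.5, rng=random.Random(0))
print(jittered, sum(jittered))
```

Without jitter the base interval goes 0.5s, 0.75s, 1.125s, and the next one (capped at 1.5s) would exceed the 3s budget, so under these assumptions the wait ends after roughly 2.4 seconds of sleeping.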
Also, all the settings are left at their defaults except for max_interval and max_elapsed_time. I wanted the maximum wait time to be around 2-3 seconds: most nginx workers should restart in under 1 second, so only long-lived connections would hit the 2-3 second limit.
This wait is used here: https://github.com/nginx/agent/blob/main/internal/resource/nginx_instance_operator.go#L144
#3828) Cherry-pick #3820
Proposed changes
Problem: Because NGINX Agent now waits for nginx workers to reload before updating the nginx configuration, any long-lived connection resulted in an excessively long wait between nginx configuration updates. This causes delays in NGF and, on NGINX Plus, could cause dropped traffic if there was a wait between updating the configuration and updating the NGINX Plus upstreams.
Solution: Adjust the NGINX Agent reload configuration to lower the maximum wait time. Revert the test request timeout duration back to 10 seconds.
Testing: Ran the SnippetsFilter functional test a couple of times and verified that when a long-lived connection was blocking one of the test cases, the wait went from a 27 second timeout to 3 seconds. Also deployed nginx and verified that the agent configuration was present.
Closes #3801
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.