Disabled the setting reboot.host.and.alert.management.on.heartbeat.timeout by default #10111
slavkap wants to merge 1 commit into apache:main from
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19   #10111      +/-   ##
============================================
- Coverage     15.13%   15.12%    -0.01%
+ Complexity    11268    11262       -6
============================================
  Files          5408     5408
  Lines        473867   473867
  Branches      57778    57778
============================================
- Hits          71700    71684      -16
- Misses       394165   394185      +20
+ Partials       8002     7998       -4
DaanHoogland
commented
Dec 16, 2024
@slavkap, have you tested this with HA enabled?
weizhouapache
commented
Dec 16, 2024
@slavkap
Can you start a discussion on the dev/user mailing list?
This changes the current behaviour.
IMHO, if there are no objections, we could merge it in 4.21 (the next major release), but not in 4.20/4.19.
`reboot.host.and.alert.management.on.heartbeat.timeout` has to be disabled. Even when high availability isn't enabled, CloudStack will reboot the host when there is an issue with storage.
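To make the effect of a default change concrete, here is a minimal sketch, assuming the setting is read from a simple key=value properties file. The helper names `parse_properties` and `reboot_on_heartbeat_timeout` are hypothetical and are not CloudStack code; only the setting name comes from this PR.

```python
# Hypothetical sketch, NOT CloudStack's actual implementation.
# Only the setting name below is taken from the PR.
SETTING = "reboot.host.and.alert.management.on.heartbeat.timeout"


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def reboot_on_heartbeat_timeout(props, default):
    """Return the effective value of the setting, falling back to `default`."""
    raw = props.get(SETTING)
    if raw is None:
        return default
    return raw.lower() == "true"


explicit = parse_properties(SETTING + "=true\n")
unset = parse_properties("# setting not present\n")

# Old default (true): a host that never set the property reboots on timeout.
print(reboot_on_heartbeat_timeout(unset, default=True))      # True
# Proposed default (false): the same host no longer reboots.
print(reboot_on_heartbeat_timeout(unset, default=False))     # False
# An explicit value wins regardless of the default.
print(reboot_on_heartbeat_timeout(explicit, default=False))  # True
```

The point of the sketch is that only hosts relying on the default change behaviour; a host that sets the property explicitly keeps whatever it configured.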
79a5f78 to 78180ff
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
slavkap
commented
Dec 17, 2024
@DaanHoogland, I've tested this with and without HA
@weizhouapache, sure, I'll start a discussion for this
Title changed. Removed: "do-not-reboot-host-on-heartbeat-timeout to not reboot a host on heartbeat timeout". Added: "reboot.host.and.alert.management.on.heartbeat.timeout by default".
DaanHoogland
commented
Jan 8, 2025
@slavkap, I changed the title. Hope you don't mind. It was a bit confusing to me.
Are you still looking into this?
slavkap
commented
Jan 10, 2025
@DaanHoogland, I don't mind the change, thanks!
Yes, I opened a discussion in the mailing list for this
DaanHoogland
commented
Feb 3, 2025
moved forward
slavkap
commented
Feb 3, 2025
@DaanHoogland, I rebased it on main, as @weizhouapache suggested possibly merging it in a major release.
boubouX
commented
Mar 28, 2025
We were unfortunate enough to hit this issue: it caused cascading reboots of all our hosts, even though the NFS server had no running VM. It was an operational nightmare that resulted in approximately 45 minutes of downtime. Changing the default value to false offers us more gain than loss. We have adjusted it in our settings; thank you, Wei. This was simply catastrophic!
hanisirfan
commented
Mar 29, 2025
As someone who works with VMware products, I have never seen a host reboot because datastores were inaccessible. I believe changing the default for CloudStack to "false" is a great move.
sureshanaparti
commented
Jun 5, 2025
@blueorangutan package
blueorangutan
commented
Jun 5, 2025
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
blueorangutan
commented
Jun 5, 2025
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 13621
blueorangutan
commented
Jun 9, 2025
Packaging result [SF]: ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 13671
blueorangutan
commented
Jun 9, 2025
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13677
DaanHoogland
commented
Jun 11, 2025
@blueorangutan test
blueorangutan
commented
Jun 11, 2025
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
weizhouapache
left a comment
code lgtm
DaanHoogland
commented
Jun 12, 2025
@sureshanaparti, I think we can merge this one, pending smoke tests. But it merits a note on the release notes page for the next version.
blueorangutan
commented
Jun 12, 2025
[SF] Trillian test result (tid-13502)
rajujith
commented
Jan 13, 2026
@slavkap Since this is for the 4.22.1 release, could you retarget the PR to the 4.22 branch?
Description
This PR disables the setting reboot.host.and.alert.management.on.heartbeat.timeout by default. With the setting enabled, CloudStack will reboot the host when there is a storage issue, even if high availability isn't enabled.
Types of changes
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?