-
Notifications
You must be signed in to change notification settings - Fork 893
AIO-interface: add Unhealthy container state #5307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@docjyJ
docjyJ
commented
Sep 21, 2024
- Add Unhealty state
- Replace class by backed enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Jean-Yves, Simon asked me to have a look at your changes. I left some feedback regarding the code itself.
I really like you refactoring the container state logic from an interface to an enum. However, I think that both concerns could have been split into separate PRs. One PR to refactor the interface and another one to implement the new healthy states. But the code is already there so why not.
Looks good otherwise but did not test.
Hi,
Thanks for the answer, as soon as I have time I'll look into it.
I will try to separate and make several PR.
Take care,
I keep this PR open for the unhealthy state.
See #5372 for enum
1f3f231 to
b9ca83a
Compare
Conflicts :/
Don't worry, i'll handle it.
b9ca83a to
6a3c340
Compare
Solved and up to date and ready (to be tested anyway...)
I have podman on my machine and I can't launch the container... I can't test it and I don't have time to debug...
I have podman on my machine and I can't launch the container...
Why may I ask? Do you run into an issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Looks good to me now!
Health checks may need to be adjusted, probably reduce the interval during startup to speed up startup of all containers.
start period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.
start interval is the time between health checks during the start period. This option requires Docker Engine version 25.0 or later.
See: https://docs.docker.com/reference/dockerfile/#healthcheck
This should allow for better dependency management.
Can be used for helm chart, I guess.
Thanks for the idea! However our health checks are currently built in a way that they never fail after a specific time. See for example https://github.com/nextcloud/all-in-one/blob/main/Containers/apache/healthcheck.sh. for the rest the defaults are good enough imho
Thanks for the idea! However our health checks are currently built in a way that they never fail after a specific time. See for example https://github.com/nextcloud/all-in-one/blob/main/Containers/apache/healthcheck.sh. for the rest the defaults are good enough imho
The problem with doing this is that docker considers the container ready... This should immediately take the container out of the starting state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking, see above
This is the only reliable way to check if the container is really up and running.
A fix would be to fail the container if nextcloud:9000 is not reachable and add a startup period (e.g. 10 minutes)
#!/bin/bash - nc -z "$NEXTCLOUD_HOST" 9000 || exit 0 + nc -z "$NEXTCLOUD_HOST" 9000 || exit 0 nc -z 127.0.0.1 8000 || exit 1 nc -z 127.0.0.1 "$APACHE_PORT" || exit 1 if ! nc -z "$NC_DOMAIN" 443; then echo "Could not reach $NC_DOMAIN on port 443." exit 1 fi
- HEALTHCHECK CMD /healthcheck.sh + HEALTHCHECK --start-period=10m CMD /healthcheck.sh
A fix would be to fail the container if nextcloud:9000 is not reachable and add a startup period (e.g. 10 minutes)
This is the problem: we cannot spwcify a time here as is depending on the overall setup like installed apps and features, amount of users, given hardware and else especially during upgrades
This is the problem: we cannot spwcify a time here as is depending on the overall setup like installed apps and features, amount of users, given hardware and else especially during upgrades
Yes, I see...
Would it be better to manage dependencies like docker compose?
The solution for detecting the ready state of a service is to use the condition attribute with one of the following options:
- service_started
- service_healthy. This specifies that a dependency is expected to be "healthy", which is defined with healthcheck, before starting a dependent service.
I fear this is going to have the same problem afaics, no?
@docjyJ LGTM now 😊
Can you please rebase the PR and squash the commits?
Afterwards we can merge this :)
730f4f4 to
9482e43
Compare
Signed-off-by: Jean-Yves <7360784+docjyJ@users.noreply.github.com>
9482e43 to
4798489
Compare
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Running state corresponds to the old GetRunningContainerState state.
Maybe there is another clearer state. Healthy means that the container is running without any problem detected by the AIO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
container->GetRunningState does not include Starting and Restarting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid it being hard to understand I went through the isSomething function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true regardless of the Healthy or Starting or Running state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@docjyJ I discussed the latest changes with @st3iny and we agree that the latest changes do not make much sense or at least are not easy enough to merge them. I am very sorry for that.
I would like to ask you if you could restore the changes from b20e2a8 by running git checkout <hash> and then push the old changes to a new branch via so: git checkout -b ench/noid/heathcheck-restored. From that commit hash, I think only such a check like https://github.com/nextcloud/all-in-one/pull/5307/files#diff-502ad3fbc5a2714763795f78cf314d4a76ada0cb3746530f2fb86fa471dd897bR91-R105 is missing. (best in a new commit)
Can you please do the above? I would do it myself but I unfortunately don't have your changes locally available anymore