Health indicators based on Service Level Objectives #21311
Open questions
We build health indicators with AbstractHealthIndicator(slo.getFailedMessage()). It's unclear to me whether the failed message ever appears in the /actuator/health response body.
Some of the SLOs are a combination of two or more indicators. For example, in jvmTotalMemory, we set a relatively low threshold on GC overhead (20% of CPU time over the last 5 minutes) if there is 90% pool utilization as well. These composite SLOs are registered with the relatively new CompositeHealthContributor.fromMap(..) API. Unfortunately, I can see no way to provide details and a failed message on the composite. I'd like to add details and a failed message for each contributing health indicator, and potentially a different one for what it means for a set of such indicators to fail together. @philwebb, you may have suggestions? Below is an example of what I think might be nice (specifically the details directly underneath jvmTotalMemory):
"jvmTotalMemory": {
  "status": "UP",
  "details": {
    "someTag": "someValue"
  },
  "components": {
    "jvmGcOverhead": {
      "status": "UP",
      "details": {
        "value": "0.01%",
        "mustBe": "<20%",
        "unit": "percent CPU time spent"
      }
    },
    "jvmMemoryConsumption": {
      "status": "UP",
      "details": {
        "value": "9.09%",
        "mustBe": "<90%",
        "unit": "maximum percent used in last 5 minutes"
      }
    }
  }
}
Thanks @jkschneider! I'll target this for 2.4.x so we remember to take a look as soon as the 2.3.0 release crunch is over.
We haven't had a chance to take a look at this change, nor upgrade to Micrometer 1.6.
We're already quite late in the Milestone cycle and we don't think we'll have time to address this change properly.
We need to take a look at this change and its implications (including the new concepts introduced and the Health endpoint format).
@snicoll and I discussed this today. There are a few things that came up:
- Since we decided that the diskspace health indicator should ideally be something that can be configured in the monitoring system, this feels very much along those lines. If we decide to surface the SLOs as health indicators, we should align our strategy for diskspace accordingly. Even with the deprecation of the diskspace indicator, we could surface that information in health via the SLOs.
- We are not sure if having a top-level component for every SLO is the best way to do this. Maybe having some sort of nested structure for the SLOs might be a better alternative.
- From an API perspective, we could have an API to expose SLOs which we could use to create the composite rather than the current method which registers beans within a bean method.
Flagging for team-meeting so that we can discuss this on the next team call.
We discussed this some more as a team today, and our feeling is that we don't have a strong enough opinion to auto-configure SLOs as health indicators. We can see that it may make sense for some users but not for others. For example, in some cases, a proxy will already be aware of the error rate for requests that it routes to an application instance. In this case, exposing the information via a health endpoint that it will also be monitoring will be of minimal value, and may even be harmful depending on how things behave when the application's health changes. For users that do want to expose SLOs as health indicators, we could provide some classes that make it easier to do so.
Since this proposal was made, we've also introduced the concept of application state. It may be that some users want to configure things such that an unmet objective results in a change to the application state to indicate that it's no longer ready, for example. We could provide some helper classes that a user can configure to connect SLOs to application state.
We discussed possibly auto-configuring the HealthMeterRegistry, automatically adding any ServiceLevelObjective beans to it. We could auto-configure some ServiceLevelObjective beans such as JvmServiceLevelObjectives.MEMORY and OperatingSystemServiceLevelObjectives.DISK rather than hard-coding them as proposed here. This would align with our auto-configuring of Micrometer's various Jvm...Metrics classes.
Overall, our feeling was that we would stop short of anything that exposes the SLOs externally, instead auto-configuring the HealthMeterRegistry and supporting beans and making it easier for a user to then plug the SLOs into health or application state in a way that meets their specific needs.
@shakuzen @jonatan-ivanov Could we have your input here please? Are we right to be cautious and just give users the parts they need and leave them to join things together, or is there some clearly established usage of HealthMeterRegistry and SLOs that means that we can proceed with confidence in a particular direction?
@ronodhirSoumik
Jun 23, 2025
Just to know: what is this file used for?
This feature adds support for a commonly requested capability: letting an application aggregate a set of metrics-based key performance indicators down to a health indicator.
I fully expect some changes, probably significant changes, based on feedback iterations on this, but want to offer this up early in the 2.4.0 release iteration so we have time to iterate and also dogfood any autoconfigured service level objectives.
Some indicators are known to be broadly applicable to a wide range of Java applications, and those could be autoconfigured. An example of a set of such indicators is defined here and autoconfigured by this pull request (JvmServiceLevelObjectives.MEMORY).
In many cases, users would like to configure a load balancer to avoid instances that are failing a key performance indicator by configuring an HTTP health check on the load balancer. In fact, some applications may already be doing this for the health indicators Spring Boot or users already provide. Example platform load balancer configurations can be pointed to /actuator/health.
See micrometer-metrics/micrometer#2055 for more detail.
The HealthMeterRegistry
As of 1.6.0, Micrometer has a new implementation: micrometer-registry-health. An autoconfiguration was added to spring-boot-actuator-autoconfigure for this new implementation.
Any @Bean ServiceLevelObjective is configured onto the HealthMeterRegistry and bound as a Spring Boot HealthIndicator.
What it looks like in /actuator/health
[image]
About ServiceLevelObjective
Service level objectives broadly have the following capabilities:
- … HealthMeterRegistry.
- … MeterBinder that contain the measurements that they need to determine availability.
- … Health#details map, respectively.
API error ratio property-driven configuration
The above properties result in two service level objective health indicators called apiErrorRatioApiCustomer and apiErrorRatioAdmin, which check for a SERVER_ERROR outcome to total throughput ratio of less than 1% for requests to paths starting with /api/customer and 2% for requests to paths starting with /admin, respectively.
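The ratio check described above can be illustrated with a small plain-Java sketch. It is not the Micrometer or Spring Boot implementation: the Request class and the errorRatio and isMet methods are hypothetical names, the requests are held in an in-memory list rather than read from meters, and treating a path prefix with no traffic as meeting its objective is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch; none of these types come from Spring Boot or Micrometer.
class ApiErrorRatioSketch {

    /** Hypothetical record of one handled request. */
    static final class Request {
        final String path;
        final String outcome; // e.g. "SUCCESS" or "SERVER_ERROR"

        Request(String path, String outcome) {
            this.path = path;
            this.outcome = outcome;
        }
    }

    /** Ratio of SERVER_ERROR outcomes to all requests under the given path prefix. */
    static double errorRatio(List<Request> requests, String pathPrefix) {
        long total = 0;
        long errors = 0;
        for (Request request : requests) {
            if (!request.path.startsWith(pathPrefix)) {
                continue;
            }
            total++;
            if ("SERVER_ERROR".equals(request.outcome)) {
                errors++;
            }
        }
        // Assumption: no traffic means no errors, so the ratio is reported as 0.
        return total == 0 ? 0.0 : (double) errors / total;
    }

    /** The objective is met when the error ratio is strictly below maxRatio. */
    static boolean isMet(List<Request> requests, String pathPrefix, double maxRatio) {
        return errorRatio(requests, pathPrefix) < maxRatio;
    }

    public static void main(String[] args) {
        List<Request> requests = new ArrayList<>();
        for (int i = 0; i < 199; i++) {
            requests.add(new Request("/api/customer/42", "SUCCESS"));
        }
        requests.add(new Request("/api/customer/42", "SERVER_ERROR"));
        System.out.println("apiErrorRatioApiCustomer met: "
                + isMet(requests, "/api/customer", 0.01));
        System.out.println("apiErrorRatioAdmin met: "
                + isMet(requests, "/admin", 0.02));
    }
}
```

Because the comparison is strict, an error ratio exactly equal to the threshold counts as unmet in this sketch.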