[RFC] Remove/hook unused metrics #2901 #5182
Conversation
block throughput metrics on m8g.metal instances for test scenarios using the async fio engine and more than 1 vcpu are volatile, so exclude them from A/B-testing.
Suggested-by: Riccardo Mancini <mancio@amazon.com>
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Signed-off-by: LDagnachew <leulmdagnachew@gmail.com>
Force-pushed from 9990ff7 to 2cc23e7
This is great, thanks for tackling this! I think we can probably even include your script in this PR (somewhere under `tools/`) and then run it as part of our regular test suite, so we don't reintroduce unneeded metrics later on :)
Btw, I think you can catch a few more unused ones with a tiny tweak to your script: add `metrics.rs` files to the check that excludes specific use-sites. For example, the `rx_partial_writes` metric in `net/metrics.rs` is unused, but there's an `.add` call due to metrics aggregation inside the metrics file itself that makes your script believe it is in use.
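The tweak described above could look something like this hedged sketch: when scanning the tree for use-sites of a metric, skip `metrics.rs` files themselves, since any `.add()` call there comes from aggregation inside the metrics module rather than real usage. The helper name is illustrative, not the script's actual code.

```python
# Sketch only: skip metrics.rs files when looking for real use-sites,
# so aggregation calls inside the metrics modules don't count as usage.
def is_real_use_site(path: str) -> bool:
    """Return False for files whose metric accesses are definition-side."""
    return not path.endswith("metrics.rs")
```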
script finds metric files, extracts fields, and reports validity of each metric
Signed-off-by: Milan Dhaduk <mdhaduk7@gmail.com>
Signed-off-by: LDagnachew <leulmdagnachew@gmail.com>
- metric is not used
Signed-off-by: Milan Dhaduk <mdhaduk7@gmail.com>
Signed-off-by: LDagnachew <leulmdagnachew@gmail.com>
Force-pushed from 78eac72 to d8bfb5a
Great! Thank you for all the feedback, and I'm super stoked to potentially have my script integrated within the test suite!!
I went ahead and made the change to the script as you recommended, which you can see above or here. Additionally, I'm unsure whether the following metrics should be removed:
- metrics_count (SharedIncMetric)
- metrics_fails (SharedIncMetric)
- missed_metrics_count (SharedIncMetric)
I opted to ask you first, since removing them would require reworking `src/metrics.rs` and `requests/metrics.rs`; if you think that's okay, I will go ahead and make those changes as well!
Thanks!
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #5182      +/-   ##
==========================================
+ Coverage   82.88%   82.92%   +0.04%
==========================================
  Files         250      250
  Lines       26936    26927       -9
==========================================
+ Hits        22325    22330       +5
+ Misses       4611     4597      -14
```
Ahh, I didn't realize we had files called `metrics.rs` that don't actually contain definitions of metrics structs. How inconvenient haha. Those 3 metrics will need to stick around.
In this case, we can change

`or "metrics.rs" in path`

to

`or ("metrics.rs" in path and "vmm" in path)`

since all metrics definitions are in the `vmm` crate and `requests/metrics.rs` is in the `firecracker` crate, and then it won't get these false positives anymore :)
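The refined check amounts to something like the sketch below: only `metrics.rs` files inside the `vmm` crate hold metric definitions, so `requests/metrics.rs` in the `firecracker` crate no longer triggers a false positive. The helper name and example paths are assumptions, not the script's actual code.

```python
# Sketch only: a path counts as a metrics-definition file only when it is
# a metrics.rs file located inside the vmm crate.
def is_definition_file(path: str) -> bool:
    return "metrics.rs" in path and "vmm" in path
```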
btw, I think your last push maybe combined changes from multiple branches into this PR? :D
- metric is not used
Signed-off-by: Milan Dhaduk <mdhaduk7@gmail.com>
Signed-off-by: LDagnachew <leulmdagnachew@gmail.com>
Force-pushed from cdfbef8 to 02817e8
- metric is not used
Signed-off-by: Milan Dhaduk <mdhaduk7@gmail.com>
Signed-off-by: LDagnachew <leulmdagnachew@gmail.com>
Ahhh, I see. Thanks for pointing that out; I made the change, and the script seems to be working as intended.
And yes, sorry about that; I went ahead and cleaned up the history of this feature branch to only include changes related to the PR.
Let me know if any additional changes are needed, thank you!
This is great! I think we can squash all the commits about removing individual metrics into a single one. Then, could you include your script under `tests/host_tools/` as part of this PR, and add a very simple test in `tests/integration_tests/style` that just executes the script, failing if any unused metrics are found? :o
Sounds good! I will get to that as soon as I finish up some touches on the script!
So to clarify, if a metric is defined and/or aggregated within the vmm crate, but not used anywhere else (excluding tests), then that metric should be marked unused, right?
I ask because I've added a modification to account for metrics like the following that are improperly being marked as unused:
```rust
Some((&METRICS.latencies_us.load_snapshot, "load snapshot"))
```
The modification added `rf".+{field}"` as part of the usage patterns to account for instances similar to the above example. This solves the issue, and the only remaining metrics are the following:
- process_startup_time_us
- process_startup_time_cpu_us
- min_us
- max_us
- sum_us
All of which are only defined within the "vmm" crate and not used elsewhere in the codebase, so they should be removed, correct?
Let me know what you think! :D
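The catch-all pattern mentioned above can be sketched roughly as follows: besides direct accesses like `METRICS.net.rx_bytes.add(1)`, `rf".+{field}"` also matches a field referenced mid-expression, e.g. inside a tuple. The surrounding helper is illustrative, not the script's actual code.

```python
# Sketch only: check a source line against per-field usage patterns,
# including the catch-all rf".+{field}" discussed above.
import re

def field_is_used(field: str, line: str) -> bool:
    patterns = [
        rf"METRICS\.\w+\.{field}\b",  # direct METRICS.<group>.<field> access
        rf".+{field}",                # catch-all: field preceded by anything
    ]
    return any(re.search(p, line) for p in patterns)
```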
> So to clarify, if a metric is defined and/or aggregated within the vmm crate, but not used anywhere else (excluding tests), then that metric should be marked unused, right?
Yeah, if the only thing a metric is used for is aggregation, or some other logic contained within a metrics.rs file in the vmm crate, then it should be marked as unused, except...
> This solves the issue, and the only remaining metrics are the following:
>
> - process_startup_time_us
> - process_startup_time_cpu_us
> - min_us
> - max_us
> - sum_us
...these are indeed special .-. The first two only get `store()`'d in `logger/metrics.rs`, but it's not actually an aggregation; it's because `metrics.rs` defines a helper function for setting the metric, and this utility function is then used outside of the metrics module. Similarly, the min/max/sum thingies are set in the `Drop` impl for `LatencyMetricsRecorder`. I don't think there's a general rule that would catch these, so let's just have an "allowlist" that causes the script to ignore precisely these 5 metrics.
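The allowlist could be a simple constant like the sketch below: these five metrics are only touched via a helper in `logger/metrics.rs` or the `Drop` impl of `LatencyMetricsRecorder`, so the script should skip them instead of reporting them. The constant and function names are assumptions, not the script's actual code.

```python
# Sketch only: metrics known to be used indirectly (helper functions or
# Drop impls), which the checker should never report as unused.
KNOWN_USED_METRICS = {
    "process_startup_time_us",
    "process_startup_time_cpu_us",
    "min_us",
    "max_us",
    "sum_us",
}

def should_report_unused(field: str) -> bool:
    return field not in KNOWN_USED_METRICS
```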
Hey @mdhaduk are you planning to continue working on this and taking it to the finish line?
Hey @JackThomson2, yes, I have been meaning to add the integration test to complete this PR; I've just been busy with my current internship. I was thinking of completing this over the weekend, is that alright?
Hey @mdhaduk of course, that's great thank you! If you need any help feel free to reach out :)
Opening this as a Draft PR
Changes
Removed unused metrics; I used this script to generally identify which metrics might not be used, then manually removed them.
To identify unused metrics, I used the following convention:
For SharedIncMetric:
For LatencyAggregateMetrics:
Metrics Removed:
Reason
Resolving issue #2901 - Remove/hook unused metrics
License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.

PR Checklist

- `tools/devtool checkstyle` to verify that the PR passes the automated style checks.
- how they are solving the problem in a clear and encompassing way.
- in the PR.
- CHANGELOG.md
- Runbook for Firecracker API changes.
- integration tests.
- TODO
- rust-vmm