-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Simplify PCI code #5426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify PCI code #5426
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@ ## main #5426 +/- ## ========================================== + Coverage 82.73% 82.85% +0.12% ========================================== Files 263 263 Lines 27455 27344 -111 ========================================== - Hits 22715 22656 -59 + Misses 4740 4688 -52
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PciDevice trait was defining (optional) methods to allocate and deallocate BAR regions for devices. This logic is quite device specific and when invoked we typically know what device it refers to exactly. Currently, we only implement the allocate API for VirtIO devices (no deallocation whatsoever). Simplify things by dropping these APIs from the trait definition. Also, remove the 32bit allocator argument from allocate_bars(); we always only allocate a single 64-bit BAR for VirtIO devices. Signed-off-by: Babis Chalios <bchalios@amazon.es>
In our case, only VirtIO PCI devices use a BAR region. Moreover, this region has a fixed size and it's always a 64-bit region. This means we don't handle 32-bit or IO BARs and we don't support a ROM bar. Encode these assumptions to the logic that adds BAR regions and detects BAR reprogramming attempts from the guest side so that we simplify the code. Signed-off-by: Babis Chalios <bchalios@amazon.es>
cbe3edc to
f34198a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These error logs are quite useless because this error will propagate all the way to the vm builder code and then stop the firecracker. Actually maybe we can use asserts here? This is an internal configuration code, so we should not add 'invalid' bars anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are in the path of /actions InstanceStart, so I think we shouldn't assert. We should return an Error to the user, right? In that case, the error message in the logs is meant to help us debug more easily what went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but user does not control the outcome (except I guess by the usage of pci flag), so in this case I think it is better to panic as this is our and not user responsibility to make sure our internal logic does not break.
Also regarding the error!: I don't mind it, but we don't do it in other places + the concatenated error printed to the use will contain same or more info than this. So I would removed mainly for consistency with rest of the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's bad user experience to crash the process even when we are crashing it because of an internal bug. I think the correct thing to do is let the user know that something went wrong, by returning an error code. On that I feel quite strongly about.
Regarding the error message, I don't have strong feelings, you are right that the error message will contain similar info. I just think it's easier for debugging, but I'm fine dropping it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mh, I thought the discussions we had internally were that for internal logic errors, we do want to panic. That's definitely how I've been writing my code for the last few months 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that any code along the code path of an API request should not crash. :/ Maybe I got it wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with returning errors along whole path which start with API request is that all interactions with FC fall under this category in some way or another. The discussions we had basically narrowed down to "if it is internal only" we should assert. Also, we already have asserts in PCI code like: https://github.com/firecracker-microvm/firecracker/blob/main/src/pci/src/msix.rs#L210
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, but these asserts are triggered by the guest MMIO exits not along an API path. In any case, if indeed my understanding was wrong, I'm happy to turn these in assertions/unwraps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm.. you are right, we have asserts in "guest mmio" path, but here we have error in "internal config" path, so we do error handling totally upside down so to speak... Here, since it is only internal configuration we should assert, while in "guest mmio" path we should have error propagation or better a validity checks up front and then assert inside PCI code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about asserts.
Uh oh!
There was an error while loading. Please reload this page.
This is the first in a series of PRs that are meant to simplify our PCI codebase
Changes
Mainly two changes:
PciDevicetraitReason
A lot of the PCI code we pulled from Cloud Hypervisor is built in the form of a library trying to be generic and support various functionalities of the PCI specification.a
In our case, we only support certain functionality. For example, we only have two types of PCI devices, VirtIO devices and the Root Port. Only VirtIO devices use BAR regions (64 bits). We don't use 32-bit, I/O or ROM BARs.
We'd like to move towards a PCI implementation which is tailored to our use-case instead of a generic library implementation.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.