Simplify PCI code #5426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

bchalios wants to merge 2 commits into firecracker-microvm:main

from bchalios:pci_refactoring

Open

Simplify PCI code #5426

bchalios wants to merge 2 commits into firecracker-microvm:main from bchalios:pci_refactoring

Conversation

bchalios

@codecov Codecov

Copy link

codecov bot commented Sep 2, 2025 •

edited

Loading

Codecov Report

❌ Patch coverage is 94.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.85%. Comparing base (5dde18c) to head (f34198a).

Files with missing lines	Patch %	Lines
src/pci/src/configuration.rs	96.87%	2 Missing ⚠️
src/vmm/src/device_manager/pci_mngr.rs	77.77%	2 Missing ⚠️
src/vmm/src/devices/virtio/transport/pci/device.rs	92.59%	2 Missing ⚠️

Additional details and impacted files

@@ Coverage Diff @@
## main #5426 +/- ##
==========================================
+ Coverage 82.73% 82.85% +0.12% 
==========================================
 Files 263 263 
 Lines 27455 27344 -111 
==========================================
- Hits 22715 22656 -59 
+ Misses 4740 4688 -52

Flag	Coverage Δ
5.10-m5n.metal	`82.97% <94.00%> (+0.14%)`	⬆️
5.10-m6a.metal	`82.22% <94.00%> (+0.13%)`	⬆️
5.10-m6g.metal	`79.65% <94.00%> (+0.12%)`	⬆️
5.10-m6i.metal	`82.97% <94.00%> (+0.14%)`	⬆️
5.10-m7a.metal-48xl	`82.21% <94.00%> (+0.13%)`	⬆️
5.10-m7g.metal	`79.65% <94.00%> (+0.12%)`	⬆️
5.10-m7i.metal-24xl	`82.94% <94.00%> (+0.13%)`	⬆️
5.10-m7i.metal-48xl	`82.93% <94.00%> (+0.12%)`	⬆️
5.10-m8g.metal-24xl	`79.64% <94.00%> (+0.12%)`	⬆️
5.10-m8g.metal-48xl	`79.64% <94.00%> (+0.11%)`	⬆️
6.1-m5n.metal	`83.01% <94.00%> (+0.13%)`	⬆️
6.1-m6a.metal	`82.26% <94.00%> (?)`
6.1-m6g.metal	`79.65% <94.00%> (+0.12%)`	⬆️
6.1-m6i.metal	`83.01% <94.00%> (+0.14%)`	⬆️
6.1-m7a.metal-48xl	`82.26% <94.00%> (+0.13%)`	⬆️
6.1-m7g.metal	`79.65% <94.00%> (+0.12%)`	⬆️
6.1-m7i.metal-24xl	`83.01% <94.00%> (+0.13%)`	⬆️
6.1-m7i.metal-48xl	`83.02% <94.00%> (+0.14%)`	⬆️
6.1-m8g.metal-24xl	`79.64% <94.00%> (+0.12%)`	⬆️
6.1-m8g.metal-48xl	`79.64% <94.00%> (+0.12%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

bchalios added 2 commits

September 2, 2025 13:41

@bchalios


 refactor(pci): move (de)allocation BAR methods out of PciDevice

a9e64e3

PciDevice trait was defining (optional) methods to allocate and
deallocate BAR regions for devices. This logic is quite device specific
and when invoked we typically know what device it refers to exactly.
Currently, we only implement the allocate API for VirtIO devices (no
deallocation whatsoever).
Simplify things by dropping these APIs from the trait definition. Also,
remove the 32bit allocator argument from allocate_bars(); we always only
allocate a single 64-bit BAR for VirtIO devices.
Signed-off-by: Babis Chalios <bchalios@amazon.es>

@bchalios


 refactor(pci): assume we only have a single 64-bit BAR

f34198a

In our case, only VirtIO PCI devices use a BAR region. Moreover, this
region has a fixed size and it's always a 64-bit region. This means we
don't handle 32-bit or IO BARs and we don't support a ROM bar. Encode
these assumptions to the logic that adds BAR regions and detects BAR
reprogramming attempts from the guest side so that we simplify the code.
Signed-off-by: Babis Chalios <bchalios@amazon.es>

@bchalios bchalios force-pushed the pci_refactoring branch from cbe3edc to f34198a Compare

September 2, 2025 11:41

@bchalios bchalios marked this pull request as ready for review

September 2, 2025 12:19

@bchalios bchalios enabled auto-merge (rebase)

September 2, 2025 12:19

@bchalios bchalios self-assigned this

Sep 2, 2025

@bchalios bchalios added the Status: Awaiting review label

Sep 2, 2025

ShadowCurse

ShadowCurse reviewed

Sep 2, 2025

View reviewed changes

src/pci/src/configuration.rs

let reg_idx = BAR0_REG + bar_idx;

if bar_idx >= NUM_BAR_REGS {

return Err(Error::BarInvalid(bar_idx));

error!("pci: invalid BAR index: {bar_idx}");

Copy link

Contributor

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These error logs are quite useless because this error will propagate all the way to the vm builder code and then stop the firecracker. Actually maybe we can use asserts here? This is an internal configuration code, so we should not add 'invalid' bars anyway.

Copy link

Contributor Author

@bchalios bchalios Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are in the path of /actions InstanceStart, so I think we shouldn't assert. We should return an Error to the user, right? In that case, the error message in the logs is meant to help us debug more easily what went wrong.

Copy link

Contributor

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but user does not control the outcome (except I guess by the usage of pci flag), so in this case I think it is better to panic as this is our and not user responsibility to make sure our internal logic does not break.
Also regarding the error!: I don't mind it, but we don't do it in other places + the concatenated error printed to the use will contain same or more info than this. So I would removed mainly for consistency with rest of the code base.

Copy link

Contributor Author

@bchalios bchalios Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's bad user experience to crash the process even when we are crashing it because of an internal bug. I think the correct thing to do is let the user know that something went wrong, by returning an error code. On that I feel quite strongly about.

Regarding the error message, I don't have strong feelings, you are right that the error message will contain similar info. I just think it's easier for debugging, but I'm fine dropping it.

Copy link

Contributor

@roypat roypat Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh, I thought the discussions we had internally were that for internal logic errors, we do want to panic. That's definitely how I've been writing my code for the last few months 🤔

Copy link

Contributor Author

@bchalios bchalios Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that any code along the code path of an API request should not crash. :/ Maybe I got it wrong?

Copy link

Contributor

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with returning errors along whole path which start with API request is that all interactions with FC fall under this category in some way or another. The discussions we had basically narrowed down to "if it is internal only" we should assert. Also, we already have asserts in PCI code like: https://github.com/firecracker-microvm/firecracker/blob/main/src/pci/src/msix.rs#L210

Copy link

Contributor Author

@bchalios bchalios Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but these asserts are triggered by the guest MMIO exits not along an API path. In any case, if indeed my understanding was wrong, I'm happy to turn these in assertions/unwraps

Copy link

Contributor

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. you are right, we have asserts in "guest mmio" path, but here we have error in "internal config" path, so we do error handling totally upside down so to speak... Here, since it is only internal configuration we should assert, while in "guest mmio" path we should have error propagation or better a validity checks up front and then assert inside PCI code.

src/pci/src/configuration.rs

let total_len = cap_data.bytes().len() + 2;

let (cap_offset, tail_offset) = match self.last_capability {

Some((offset, len)) => (Self::next_dword(offset, len), offset + 1),

None => (FIRST_CAPABILITY_OFFSET, CAPABILITY_LIST_HEAD_OFFSET),

};

let end_offset = cap_offset

.checked_add(total_len)

.ok_or(Error::CapabilitySpaceFull(total_len))?;

.ok_or(PciError::CapabilitySpaceOverflow(total_len, cap_offset))?;

Copy link

Contributor

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about asserts.

Labels

Status: Awaiting review

3 participants

@bchalios @ShadowCurse @roypat

Simplify PCI code #5426

Are you sure you want to change the base?

Simplify PCI code #5426

Uh oh!

Conversation

@bchalios bchalios commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Reason

License Acceptance

PR Checklist

Uh oh!

codecov bot commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

@bchalios bchalios Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

@bchalios bchalios Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

@roypat roypat Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

@bchalios bchalios Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

@bchalios bchalios Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

@ShadowCurse ShadowCurse Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

@bchalios bchalios commented Sep 2, 2025 •

edited

Loading

codecov bot commented Sep 2, 2025 •

edited

Loading