c2c72eef975eda9afbb1fe2ee06740a5d577c187
579 Commits

Author SHA1 Message Date
Zuul
c3ef9a563d Merge "Fix software RAID creation on different physical devices" 2025-07-15 18:29:00 +00:00
Dmitry Tantsur
9db3cd1e4d Graceful way for hardware managers to ignore certain devices
My use case for this feature is to exclude network devices that use
the cdc_ether driver. These USB network interfaces often cause all sorts
of issues. For example, some models have the same hardcoded MAC address,
which breaks inspection.
Currently, to exclude a certain device, a hardware manager must override
the entire listing function (in my case, list_interfaces). Not only is
it tedious, but it also requires constantly updating the hardware
managers to match the implementation in GenericHardware. Realistically,
it will cause hardware manager authors to inherit GenericHardware, which
is the opposite of how hardware managers should be written.
Note that the node-level skip list only affects root device selection
and cleaning for block devices. This feature affects everything that
uses list_block_devices and is applied before the node-level skip list.
This change adds a new hardware manager call, filter_device. For each
network, block or USB device, it allows a hardware manager to do one
of four things:
1. Delegate the decision to a lower level hardware manager by raising
 IncompatibleHardwareMethodError
2. Remove the device by returning None
3. Change the device by returning a modified instance
4. Return the device unchanged to keep it in the listing.
Note that I'm removing debug logging when IncompatibleHardwareMethodError
is raised. Not only is the log message incorrect (the error does not
necessarily mean that the method is not implemented at all), it already
takes up noticeable space in the logs and, with this change, would become
very noisy.
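The four outcomes of filter_device can be sketched as follows. This is a
minimal, standalone illustration and not the agent's code: the
NetworkInterface record, ExampleHardwareManager and apply_filters are
hypothetical, the exception class is a local stand-in for the agent's
error of the same name, and the rule that the first manager to answer
decides is an assumption.

```python
from dataclasses import dataclass, replace


class IncompatibleHardwareMethodError(Exception):
    """Local stand-in for the agent's error of the same name."""


@dataclass(frozen=True)
class NetworkInterface:
    """Simplified, hypothetical device record."""
    name: str
    driver: str
    mac_address: str


class ExampleHardwareManager:
    """Hypothetical manager that only has an opinion on network devices."""

    def filter_device(self, device):
        if not isinstance(device, NetworkInterface):
            # 1. Delegate the decision to a lower-level manager.
            raise IncompatibleHardwareMethodError()
        if device.driver == "cdc_ether":
            # 2. Remove the device from the listing.
            return None
        if device.mac_address != device.mac_address.lower():
            # 3. Change the device by returning a modified instance.
            return replace(device, mac_address=device.mac_address.lower())
        # 4. Return the device unchanged to keep it in the listing.
        return device


def apply_filters(devices, managers):
    """Filter devices, asking managers from highest to lowest priority."""
    kept = []
    for device in devices:
        for manager in managers:
            try:
                device = manager.filter_device(device)
            except IncompatibleHardwareMethodError:
                continue  # this manager has no opinion, ask the next one
            break  # assumption: the first manager that answers decides
        if device is not None:
            kept.append(device)
    return kept
```

If every manager delegates, the sketch keeps the device as it is.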
Change-Id: I5437343af6c6157882bcf0600dd89bd20478c948
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
2025-07-04 16:31:02 +02:00
Dmitry Tantsur
9426df9ab3 Split hardware manager initialize out of evaluate_hardware_support
The current code in GenericHardware.evaluate_hardware_support ends up
using hardware manager calls, which then use a partly initialized hardware
manager list and can even cause recursion.
This change introduces a new optional call initialize() which is
guaranteed to run:
1) After all hardware managers have been evaluated
2) After the hardware manager cache is populated
3) In the order of the support level of hardware managers
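The three guarantees can be illustrated with a small standalone sketch.
The Manager class and load_managers function are hypothetical, and
"highest support level first" is an assumption about the ordering.

```python
class Manager:
    """Hypothetical hardware manager that records initialize() calls."""

    def __init__(self, name, support, log):
        self.name = name
        self.support = support
        self.log = log

    def evaluate_hardware_support(self):
        # Must stay cheap and must not call other hardware managers.
        return self.support

    def initialize(self):
        self.log.append(self.name)


def load_managers(candidates):
    # 1) Evaluate every manager before anything else happens.
    cache = {m: m.evaluate_hardware_support() for m in candidates}
    # 2) The cache is now fully populated; drop unsupported managers.
    supported = [m for m in candidates if cache[m] > 0]
    # 3) Initialize in order of support level (assumed: highest first).
    ordered = sorted(supported, key=cache.get, reverse=True)
    for manager in ordered:
        init = getattr(manager, "initialize", None)  # the call is optional
        if init is not None:
            init()
    return ordered
```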
Change-Id: I068d3d73483c161062aa3b48f3154a2d99941382
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
2025-07-04 16:30:40 +02:00
Dmitry Tantsur
521811cbcc Fix software RAID creation on different physical devices
When creating multiple software RAID logical disks that use different
sets of physical devices, the partition indices were incorrectly shared
across all devices. This caused the second RAID array creation to fail
because it tried to use partition indices that didn't exist on those
specific devices.
This change fixes the issue by tracking partition indices separately for
each physical device, ensuring that each device's partitions are numbered
correctly starting from their first available index.
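A minimal sketch of per-device index tracking, assuming each logical
disk is a dict with a "physical_disks" list (a simplified, hypothetical
shape) and that numbering starts at 1 unless a first free index is
given:

```python
from collections import defaultdict


def assign_partition_indices(logical_disks, first_free=None):
    """Return, per logical disk, a {device: partition index} mapping."""
    # One counter per physical device instead of one shared counter.
    next_index = defaultdict(lambda: 1)
    if first_free:
        next_index.update(first_free)
    layout = []
    for disk in logical_disks:
        parts = {}
        for device in disk["physical_disks"]:
            parts[device] = next_index[device]
            next_index[device] += 1
        layout.append(parts)
    return layout
```

With a single shared counter, the second array in the test case below
would have asked for partition 2 on devices that only have partition 1.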
Closes-Bug: #2115211
Change-Id: I440db4654f3d1d54274d1eee8c4b21c2b0a18d22
Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>
2025-06-25 16:15:14 +00:00
Zuul
b51cc75ff3 Merge "netutils: Use ethtool ioctl to get permanent mac address" 2025-05-07 21:53:20 +00:00
Nicolas Belouin
48422a532f netutils: Use ethtool ioctl to get permanent mac address
Fetching the permanent MAC address of the interface instead of the
default one makes it possible to get the right one in case it was changed
during setup (likely with a bonding setup).
In order to fetch the permanent MAC address of a given interface, one
can either use Netlink (either rtnetlink or ethtool), or use ethtool
ioctl.
The use of ioctl feels simpler and requires no additional dependency.
The implementation falls back to older behavior should an error occur.
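The ioctl approach can be sketched roughly as below. This is a
Linux-only illustration and not the code from this change: the function
names are hypothetical, the constants are taken from the kernel headers
linux/sockios.h and linux/ethtool.h, and returning None on any error is
just one way to let the caller fall back to the older behavior.

```python
import array
import fcntl
import socket
import struct

SIOCETHTOOL = 0x8946        # linux/sockios.h
ETHTOOL_GPERMADDR = 0x20    # linux/ethtool.h
MAX_ADDR_LEN = 32


def format_mac(raw):
    """Render raw address bytes as a colon-separated MAC string."""
    return ":".join(f"{byte:02x}" for byte in raw)


def get_permanent_mac(ifname):
    """Return the permanent MAC of ifname, or None if it is unavailable."""
    # struct ethtool_perm_addr { __u32 cmd; __u32 size; __u8 data[]; }
    request = array.array(
        "B",
        struct.pack("II", ETHTOOL_GPERMADDR, MAX_ADDR_LEN)
        + bytes(MAX_ADDR_LEN),
    )
    address, _length = request.buffer_info()
    # struct ifreq: 16-byte interface name followed by a data pointer.
    ifreq = struct.pack("16sP", ifname.encode(), address)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            fcntl.ioctl(sock.fileno(), SIOCETHTOOL, ifreq)
    except OSError:
        return None  # caller falls back to the non-permanent address
    _cmd, size = struct.unpack("II", request[:8].tobytes())
    raw = request[8:8 + size].tobytes()
    if not raw or not any(raw):
        return None  # driver reported no permanent address
    return format_mac(raw)
```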
Closes-Bug: #2103450
Change-Id: I54151990e396ddcf775128ca24d3db08e45c256d
Signed-off-by: Nicolas Belouin <nicolas.belouin@suse.com>
2025-04-25 12:06:29 +02:00
cid
c03021fee2 Remove eventlet from Ironic Python Agent
This change removes several usages of eventlet from IPA:
- Upgrades all requirements on oslo library versions to new ones that
 support non-eventlet use.
- Removes use of the eventlet wsgi server (via oslo_service.wsgi) and
 replaces it with the cheroot wsgi server.
- Removes explicit patching of python modules with eventlet
Note that some oslo libraries still use ``eventlet`` to detect and
work around its use. This means that it is still installed in
environments alongside IPA, even if it's not used or patched into any
modules.
Depends-On: https://review.opendev.org/c/openstack/requirements/+/947727
Change-Id: I9accab2d5e9529a88ef5d3db85e76901f14114eb
2025-04-23 11:01:10 -07:00
Zuul
53349cc7cf Merge "Remove agent_token_required upgrade knob" 2025-04-08 20:38:18 +00:00
ac85195b7a Update master for stable/2025.1
Add file to the reno documentation build to show release notes for
stable/2025.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.1.
Sem-Ver: feature
Change-Id: I259249774c39e95b214e77b2ae632c7278e78754
2025-03-18 17:14:28 +00:00
Julia Kreger
94fde4b3b4 Remove agent_token_required upgrade knob
To help ease upgrades to Victoria, IPA had a knob added
to enable operators to express if agent tokens were required
in their deployment. Since then, the feature has become required; however,
we left in the logic enabling the fun upgrade case handling.
At this point, this knob serves no further use, and can be removed.
Change-Id: I202f06e1b6598a802c9853fb99201c55e7a40cb1
2025-02-18 14:36:18 +00:00
Julia Kreger
a6ca65201a Lockout agent command results if a token is received
This is a second attempt at securing the get command output endpoint
which could have data such as logs which could potentially have
sensitive details and information after the agent has completed
one or more actions.
Now, if a token is received, the agent locks out the command results
endpoint, and requires all future calls to include it.
This allows for the agent to be backwards compatible.
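The lockout behaviour can be sketched as a small state holder. The
class and method names are hypothetical; only the rule itself (open
until a token is received, token required afterwards) comes from the
description above.

```python
import hmac


class CommandResultsGuard:
    """Hypothetical guard for the command results endpoint."""

    def __init__(self):
        self._token = None

    def record_token(self, token):
        # The first token received locks the endpoint from then on.
        if self._token is None:
            self._token = token

    def may_read_results(self, presented=None):
        if self._token is None:
            # No token seen yet: stay open for backwards compatibility.
            return True
        if presented is None:
            return False
        # Constant-time comparison to avoid leaking the token via timing.
        return hmac.compare_digest(self._token.encode(), presented.encode())
```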
Special thanks go to cid for his first attempt at this, which I took
for the basis of some of the testing required.
Closes-Bug: #2086866
Co-Authored-By: cid@gr-oss.io
Change-Id: Ia39a3894ef5efaffd7e1d22cc6244059a32175ff
2025-02-18 06:32:48 -08:00
Zuul
8ab0bfbd9b Merge "Revert "Add token validation to GET command endpoints"" 2025-02-17 18:35:53 +00:00
Dmitry Tantsur
3968715908 Revert "Add token validation to GET command endpoints"
This reverts commit 6f860995c6.
Reason for revert: the change has broken virtually everyone who
has not updated Ironic before IPA. To make matters worse, the
attached release note is not descriptive and does not explain
the upgrade impact.
The reverted change should be reworked to allow a grace period.
Change-Id: I2a2a03dd8409af900b938494ceafd45a89e0c197
2025-02-17 13:40:19 +00:00
Zuul
3261052f5d Merge "follow-up: update release note for bootable container work" 2025-02-14 22:46:58 +00:00
Zuul
2e9964e126 Merge "Add token validation to GET command endpoints" 2025-02-14 22:46:56 +00:00
cid
a42980a016 Ensure IPA is locked down in rescue mode
Securely handle state transition by locking down IPA at the final
stage of rescue operation to prevent restarts on tenant networks.
Closes-Bug: #2086865
Change-Id: I8e1be8da93a8c3fdf3cff7ad386c702d970d15f1
2025-02-14 18:18:50 +01:00
cid
6f860995c6 Add token validation to GET command endpoints
Currently, we only validate authentication tokens for POST but not
for GET requests, which could mean anyone can retrieve command results
without authentication. This change adds validation uniformly across
all command-related endpoints.
Closes-Bug: #2086866
Depends-On: https://review.opendev.org/c/openstack/ironic/+/941607
Change-Id: Ib7f58b1694273beeb25314984c6e049376244d86
2025-02-13 23:28:56 +00:00
Julia Kreger
c8763bba06 follow-up: update release note for bootable container work
Updates the release note for the bootable container work to
clarify the existence of the configuration option which can
be utilized to disable bootable container deployments in the
ramdisk.
Change-Id: I5b269947884c015db38cf98ac782472a62858455
2025-02-12 06:39:47 -08:00
Zuul
a6d1921056 Merge "Bootable container support" 2025-02-10 19:26:34 +00:00
Julia Kreger
1508cc4cd0 Bootable container support
Adds support for bootable containers to be deployed by the agent.
Related: https://review.opendev.org/c/openstack/ironic/+/937897
Change-Id: I66cb37d117d2afc335f015fb1fc31bdbd5c3cee5
2025-02-07 15:59:48 -08:00
Kaifeng Wang
96bf1ef012 Collect bus and driver for interfaces
It's useful to have the PCI bus address and driver collected; the operator
can use the information to configure portgroup in a consistent way.
Change-Id: I432bca881ad881bae6d5e67c9b6fb52fe55b4e1e
2025-02-01 15:22:26 +08:00
Zuul
0c35e7e2da Merge "Add support for burnin-gpu" 2025-01-29 19:20:10 +00:00
kubajj
018a5f6253 Fix errors in the function erase_devices_express
Prevents the UnboundLocalError in erase_devices_express clean step.
Closes-Bug: #2095499
Change-Id: I01ce5005a62638ff960d2a75f225f882b2d56973
2025-01-22 14:17:30 +00:00
Zuul
ca07e941cf Merge "Add a release note for 939340" 2025-01-17 19:40:39 +00:00
cid
c222626b01 Treat 'No space left on device' error as fatal
Fail without retries when an Errno 28 ("No space left on device")
error is encountered.
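A minimal sketch of the idea, assuming a generic retry helper (the
run_with_retries name and the attempts parameter are hypothetical):

```python
import errno


def run_with_retries(operation, attempts=3):
    """Call operation(), retrying on OSError unless the disk is full."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                # Errno 28: retrying cannot free up space, so fail now.
                raise
            last_error = exc
    raise last_error
```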
Closes-Bug: #2094854
Change-Id: Ie84b422916ddc02f2474164fe3da083324ef4824
2025-01-17 11:13:01 +01:00
kubajj
2ece938671 Add a release note for 939340
Follow-up to 939340 to add a release note about the bug-fix.
Change-Id: I202f22d40776ab5d3245b8e14021d1404a9f478d
2025-01-16 09:34:08 +00:00
cid
dfcb86d738 Add support for burnin-gpu
Adds support for running burnin tests on GPUs
using gpu-burn[1]. Also refactors stress-ng code
to be a bit cleaner.
Requires gpu-burn to be pre-installed within the IPA.
[1] https://github.com/wilicc/gpu-burn
Co-Authored-By: Scott Solkhon <scottsolkhon@gmail.com>
Closes-Bug: #2069085
Change-Id: I8f8cace6ebc2b7f1c245c82a64609cdfc1c492f9
2025-01-03 17:59:31 +00:00
Zuul
06077cb88e Merge "Inventoried MAC address for only ipv6 addresses" 2024-12-04 19:09:09 +00:00
b010580caf reno: Update master for unmaintained/2023.1
Update the 2023.1 release notes configuration to build from
unmaintained/2023.1.
Change-Id: I0d8b1773367a61b326b5a6ff86ac1f126b15099b
2024-11-29 07:54:13 +00:00
Maximilian Brandt
6ccd3965ff Inventoried MAC address for only ipv6 addresses
Extended the function that exposes the BMC MAC address in inventory data
to cover an IPv6-only interface.
Previously, if no IPv4 address was configured, no MAC address was exposed.
Change-Id: I93e49d308cfd63be1c09749ced4428a87a3daff9
2024-11-21 17:51:15 +01:00
Zuul
01639aab20 Merge "Add a command to lock down the agent" 2024-11-21 16:20:33 +00:00
Zuul
4f9f461ce9 Merge "A hardware manager call for a full sync before shutdown" 2024-11-07 15:07:12 +00:00
Dmitry Tantsur
aa98250066 Add a command to lock down the agent
To support a safer take-over from the provisioning to the tenant network
for hardware that cannot be powered off, this change introduces a new
command system.lockdown. When invoked, it stops the API, the heartbeater
and disables all network interfaces (if possible).
Partial-Bug: #2077432
Change-Id: I211fc64a46226127b0d82ab458029b3c702b3f74
2024-11-07 15:50:06 +01:00
Zuul
5746ac1222 Merge "Vendor metrics library from Ironic-Lib & deprecate" 2024-11-05 16:11:20 +00:00
Dmitry Tantsur
5aa0c1a2bb A hardware manager call for a full sync before shutdown
This is largely required for the future lockdown command but can also be
used before the normal shutdown, especially in the sync command which is
currently used before an out-of-band shutdown command is issued.
In addition to a plain sync, the new command also tells the kernel to
drop its caches and issues a low-level sync command to each block
device.
Partial-Bug: #2077432
Change-Id: I3fc87b20bc5387a466b24ebc19b9982e4e368d20
2024-11-05 15:27:10 +01:00
Jay Faulkner
75abdb4148 Vendor metrics library from Ironic-Lib & deprecate
We are phasing out use of ironic-lib, and as such are removing the
metrics module from it. However, due to its requirement of having
a statsd instance on the same subnet as the agent and there being no
support for Prometheus exporting of metrics from IPA, these metrics are
no longer valuable (in the agent).
We are vendoring the module for the deprecation in order to facilitate
its removal from ironic-lib.
Change-Id: Ie50e078bc3f78d65cfa53680dc4116d1119ce155
2024-11-04 20:02:11 +00:00
Zuul
b851ae1bc8 Merge "Remove Python 3.8 support" 2024-10-31 17:44:24 +00:00
Takashi Kajinami
b0ef2c0483 Remove Python 3.8 support
Python 3.8 was removed from the tested runtimes for 2024.2[1] and has
not been tested since then.
Also add Python 3.12 which is part of the tested runtimes for 2025.1.
The unit test job with Python 3.12 is now voting.
[1] https://governance.openstack.org/tc/reference/runtimes/2024.2.html
Change-Id: Id314b4453d81dcab806768e3c7ab5dc050a35136
2024-10-24 18:15:08 +09:00
Steve Baker
1a939105ba Capture and log sector sizes
``logical_sectors`` and ``physical_sectors`` sizes are now captured for
each hardware info ``disks`` entry, and also logged for ``lsblk`` calls.
This will be increasingly useful as storage devices with 4096 byte
sector sizes become more common.
Change-Id: I80b6b137f6e3071d9b8a4c1abe14416249aed9ac
2024-10-24 15:07:56 +13:00
e4d07fd1ba Update master for stable/2024.2
Add file to the reno documentation build to show release notes for
stable/2024.2.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.2.
Sem-Ver: feature
Change-Id: Iffa68c4207e97d92382fbff637a661a879c1909d
2024-09-20 13:52:29 +00:00
Zuul
ab99f36baa Merge "Check for the existence of an IPMI device" 2024-09-09 16:44:27 +00:00
cid
2d79eae382 Check for the existence of an IPMI device
Check for IPMI device files before the use of the `'ipmitool lan.*'`
command, avoiding unnecessary calls on non-IPMI systems.
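Such a check could look roughly like this. The device paths are an
assumption (the conventional OpenIPMI device nodes) and are not quoted
from this change; the function name and the injectable exists argument
are hypothetical.

```python
import os

# Assumed device nodes; adjust to whatever the ramdisk actually exposes.
IPMI_DEVICE_PATHS = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def ipmi_device_present(paths=IPMI_DEVICE_PATHS, exists=os.path.exists):
    """Return True if at least one IPMI device file exists."""
    return any(exists(path) for path in paths)
```

A caller would skip the ipmitool invocation entirely when this returns
False.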
Closes-Bug: #2076367
Change-Id: Ib800717701e6f2828df55a0da0e999fc014c12e1
2024-09-05 20:48:07 +01:00
Jay Faulkner
e303a369dc Inspect non-raw images for safety
When IPA gets a non-raw image, it performs an on-the-fly conversion
using qemu-img convert, as well as running qemu-img frequently to get
basic information about the image before validating it.
Now, we ensure that before any qemu-img calls are made, we have
inspected the image for safety and pass through the detected format.
If given a disk_format=raw image and image streaming is enabled
(default), we retain the existing behavior of not inspecting it in
any way and streaming it bit-perfect to the device. In this case, we
never use qemu-based tools on the image at all.
If given a disk_format=raw image and image streaming is disabled, this
change fixes a bug where the image may have been converted if it was not
actually raw in the first place. We now stream these bit-perfect to the
device.
Adds two config options:
- [DEFAULT]/disable_deep_image_inspection, which can be set to "True" in
 order to disable all security features. Do not do this.
- [DEFAULT]/permitted_image_formats, default raw,qcow2, for image types
 IPA should accept.
Both of these configuration options are wired up to be set by the lookup
data returned by Ironic at lookup time.
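The resulting decision flow can be summarized in a simplified sketch.
The function and its return values are hypothetical, detected_format
stands in for whatever the format inspector reports, and rejecting an
image whose detected format is not permitted is an assumption about how
permitted_image_formats is enforced.

```python
def plan_image_write(disk_format, detected_format,
                     permitted_formats=("raw", "qcow2"),
                     deep_inspection_disabled=False):
    """Return how an image would be written in this simplified model."""
    if disk_format == "raw":
        # Raw images are written bit-perfect and never touch qemu-img,
        # whether or not image streaming is enabled.
        return "write-as-is"
    if deep_inspection_disabled:
        # All safety checks are off; the claimed format is trusted.
        return "convert-from-" + disk_format
    if detected_format not in permitted_formats:
        raise ValueError("image format %r is not permitted" % detected_format)
    # Inspection passed: hand the detected format to qemu-img.
    return "convert-from-" + detected_format
```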
This uses an image format inspection module imported from Nova; this
inspector will eventually live in oslo.utils, at which point we'll
migrate our usage of the inspector to it.
Closes-Bug: #2071740
Change-Id: I5254b80717cb5a7f9084e3eff32a00b968f987b7
2024-09-04 09:11:28 -07:00
Riccardo Pittau
bd3b596ced Fix series in release notes
Change-Id: I6844ce33274afdb64e78b79930c8aa32776e7665
2024-08-23 10:16:27 +02:00
Riccardo Pittau
599a825554 Fix versions in release notes
Change-Id: Ief6299e4b1bbef5fdb33a28b90b078f420cf8508
2024-06-10 16:01:36 +02:00
Jay Faulkner
c39517b044 Call evaluate_hardware_support exactly once per hwm
Fixes an issue where we could call evaluate_hardware_support multiple
times each run. Now, instead, we cache the values and use the cache
where needed.
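A minimal sketch of the caching idea, with hypothetical names; the
reset helper reflects the need to avoid cached state leaking between
unit tests:

```python
_support_cache = {}


def hardware_support(manager):
    """Return the manager's support level, evaluating it at most once."""
    if manager not in _support_cache:
        _support_cache[manager] = manager.evaluate_hardware_support()
    return _support_cache[manager]


def reset_support_cache():
    """Forget cached values, e.g. between unit tests."""
    _support_cache.clear()


class CountingManager:
    """Hypothetical manager that counts how often it is evaluated."""

    def __init__(self, support):
        self.support = support
        self.calls = 0

    def evaluate_hardware_support(self):
        self.calls += 1
        return self.support
```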
Adds unit test coverage for get_managers and the new method.
Fixes issue where we were caching hardware managers between unit tests.
Also includes fixes for codespell CI:
- skip build files in repo
- fix spelling issues introduced to repo
Closes-bug: 2066308
Change-Id: Iebc5b6d2440bfc9f23daa322493379bbe69e84d0
2024-05-22 08:46:21 -07:00
c303bd971b reno: Update master for unmaintained/zed
Update the zed release notes configuration to build from
unmaintained/zed.
Change-Id: I673a729e1598d2100631262d61c91690f500306b
2024-05-06 06:22:59 +00:00
Julia Kreger
6ac3f350c0 Unmount config drives
If this seems like deja vu, that is because it is. We had this
very same issue with the original CoreOS ramdisk. Since we don't
control the whole OS of the ramdisk, it only made sense to teach
the agent to umount the folder.
The folder is referenced already, and the agent does have safeguards
in place, but unfortunately this issue led to a rebuild breaking where
cloud-init, glean, and the agent were all trying to do the right thing
as they thought, and there were just multiple /mnt/config folders
present in the OS. These are separate issues we also need to try and
remedy.
What happens is when the device is locked via a mount, the partition
table is never updated to the running OS as the mount creates a lock.
So the agent ends up thinking, in the case of a rebuild, that everything
including creating a configuration drive on that device has been
successful, but when you reboot, there is no partition table entry
for the new partition as the change was not successfully written.
This state prevented the workload from rebooting properly.
This change eliminates that possibility moving forward by attempting
to ensure that the cloud configuration folder is no longer mounted.
Change-Id: I4399dd0934361003cca9ff95a7e3e3ae9bba3dab
2024-04-29 15:41:59 -07:00
Zuul
28053644cd Merge "add mixed matching of root device hints" 2024-04-27 17:26:25 +00:00
Zuul
2b67f277b7 Merge "Step to clean UEFI NVRAM entries" 2024-04-27 02:10:54 +00:00