-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COS-3126: overlay.d: replace 25rhcos-azure-udev #1776
base: master
Are you sure you want to change the base?
Conversation
Since el9, the WALinuxAgent-udev package installs udev rules directly into the initramfs. As a result, this override is now obsolete and can be removed.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: marmijo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
@@ -8,6 +8,9 @@ remove-from-packages: | |||
- - zram-generator | |||
- "/usr/lib/systemd/zram-generator.conf" | |||
|
|||
ostree-layers: | |||
- overlay/25rhcos-azure-udev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the same comment from below here would be nice:
- overlay/25rhcos-azure-udev | |
# The azure-vm-utils package provides these udev rules[1], but it wont be added | |
# until EL10 [1]. This can be dropped when moving to EL10, provided the | |
# package is included at that time. | |
- overlay/25rhcos-azure-udev |
ATTRS{nsid}=="?*", ENV{AZURE_DISK_TYPE}="data", PROGRAM="/bin/sh -ec 'echo $$((%s{nsid}-2))'", ENV{AZURE_DISK_LUN}="$result" | ||
|
||
LABEL="azure_disk_nvme_id" | ||
ATTRS{nsid}=="?*", IMPORT{program}="/usr/sbin/azure-nvme-id --udev" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a diff from upstream (fedora RPM) here:
--- /usr/lib/udev/rules.d/80-azure-disk.rules 1970-01-01 00:00:00.000000000 +0000
+++ /var/mnt/workdir/80-azure-disk.rules 2025-03-20 16:12:18.020106392 +0000
@@ -32,7 +32,7 @@
ATTRS{nsid}=="?*", ENV{AZURE_DISK_TYPE}="data", PROGRAM="/bin/sh -ec 'echo $$((%s{nsid}-2))'", ENV{AZURE_DISK_LUN}="$result"
LABEL="azure_disk_nvme_id"
-ATTRS{nsid}=="?*", IMPORT{program}="/usr/bin/azure-nvme-id --udev"
+ATTRS{nsid}=="?*", IMPORT{program}="/usr/sbin/azure-nvme-id --udev"
LABEL="azure_disk_symlink"
# systemd v254 ships an updated 60-persistent-storage.rules that would allow
Though.. this really highlights a problem here that I didn't understand before. Previously the udev rules we provided were just rules and didn't require external software.
Here these rules are running /usr/sbin/azure-nvme-id
, but we aren't shipping that here (because we aren't shipping an RPM).
I think this may be a blocker for us for el9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting because these rules are directly copied from the azure-vm-utils
github repo, so there's a diff between that file and the udev rules shipped in the RPM.
as for the azure-nvme-id binary: The tool's description says:
azure-nvme-id provides identification metadata in the response to Identify Namespace command
for some models of NVMe devices. This is found in vendor-specific (vs) field which contains various
identification details with a comma-separated, key=value format.
I'm not sure what, within Azure, would rely on this vendor specific data, but there could be unexpected behavior if the binary is missing. However, the original bug was confirmed to be resolved by adding these udev rules without installing the whole RPM, and using them in an RHCOS Azure instance seems to work as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting because these rules are directly copied from the
azure-vm-utils
github repo, so there's a diff between that file and the udev rules shipped in the RPM.
yeah. In rawhide there was a usr/bin usr/sbin migration thing and that probably made its way into f41/f42 RPMs too so that probably is what this is about. It probably shouldn't matter too much I don't guess.
I'm not sure what, within Azure, would rely on this vendor specific data, but there could be unexpected behavior if the binary is missing. However, the original bug was confirmed to be resolved by adding these udev rules without installing the whole RPM, and using them in an RHCOS Azure instance seems to work as expected.
So I think you're telling me the rules we need aren't this one that runs this binary?
I guess it's OK then, but I'd almost rather remove the lines that rely on the binary (or comment them out and add a comment why).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the other rules still work and set everything up correctly even if the binary is missing.
ATTRS{nsid}=="?*", IMPORT{program}="/usr/sbin/azure-nvme-id --udev" | |
#ATTRS{nsid}=="?*", IMPORT{program}="/usr/sbin/azure-nvme-id --udev" |
I commented out the line, but left LABEL="azure_disk_nvme_id"
intact so the other rules with GOTO="azure_disk_nvme_id"
continue to function properly. The behavior was the same with or without that line commented out:
AZURE_DISK_LUN
is set to a value on NVMe data disks- data and os NVMe disks show
AZURE_DISK_TYPE=data
andAZURE_DISK_TYPE=os
respectively - NVMe disks and their partitions show up in
/dev/disk/azure
and/dev/disk/azure/data/by-lun
as expected
This means the udev rule that calls the binary must be failing silently and isn't necessary for this specific issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to have it commented out here in the code with a comment that says why we commented it out and then we should be able to move forward I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
Add an overlay to install udev rules that support certain VM types that present managed disks as NVMe devices instead of traditional SCSI. These rules are provded by the azure-vm-utils package which will be included in EL10[1], this overlay can be dropped at that time. [1] https://issues.redhat.com/browse/RHEL-73904
4eb3d71
to
5226f5f
Compare
@marmijo: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
That looks correct, but that seems risky, as we don't know what would need those attributes and how important they are. |
Yeah. I agree it seems risky. But really this is our only option if we want this now. We should probably name it as a limitation and then move forward based on the feedback. |
@marmijo: This pull request references COS-3126 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
The current
25rhcos-azure-udev
overlay has been obsolete since we moved to EL9. Let's drop it and replace it with another overlay with the same name.The new
25rhcos-azure-udev
overlay includes udev rules that support Azure Managed NVMe disks. These udev rules are shipped as part of theazure-vm-utils
package[1], but that package wont be included until EL10[2]. We can drop this overlay when we move to EL10, provided the package is included at that time.[1] https://github.com/Azure/azure-vm-utils/blob/9c596916b6774f24420dac0ee7a72a6c9ddb5060/udev/80-azure-disk.rules
[2] https://issues.redhat.com/browse/RHEL-73904