Description
LVM volumes are not always mounted after a reboot once systemd-239-78.0.3 or above is applied.
I constructed several test cases to demonstrate the issue using an Oracle-provided AMI (ami-076b18946a12c27d6) on AWS.
Here is a sample CloudFormation template that is used to demonstrate the issue:
non-working-standard.yml.txt
User data:
yum install -y lvm2
yum update -y systemd
systemctl disable multipathd
nvme=$(lsblk -o NAME,SIZE | awk '/ 1G/ {print 1ドル}')
pvcreate /dev/$nvme
vgcreate testvg /dev/$nvme
lvcreate -l 100%FREE -n u01 testvg
mkfs.xfs -f /dev/testvg/u01
echo '/dev/testvg/u01 /u01 xfs defaults 0 0' >> /etc/fstab
mkdir -p /u01
mount /u01
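The /etc/fstab entry above is what systemd-fstab-generator turns into a u01.mount unit, which in turn waits on dev-testvg-u01.device. A quick way to inspect that linkage on the instance (a diagnostic sketch, assuming the default generator output location under /run/systemd/generator):
systemctl cat u01.mount
systemctl list-dependencies u01.mount
ls /run/systemd/generator/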
Once the template is deployed, confirm that cloud-init completed without errors and /u01 is mounted. Then reboot the EC2 instance, e.g. via reboot.
When it comes back, /u01 is not mounted anymore:
[ec2-user@ip-10-100-101-225 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 799M 0 799M 0% /dev
tmpfs 818M 0 818M 0% /dev/shm
tmpfs 818M 17M 802M 2% /run
tmpfs 818M 0 818M 0% /sys/fs/cgroup
/dev/nvme0n1p1 32G 2.4G 30G 8% /
tmpfs 164M 0 164M 0% /run/user/1000
/var/log/messages contains:
systemd[1]: dev-testvg-u01.device: Job dev-testvg-u01.device/start timed out.
systemd[1]: Timed out waiting for device dev-testvg-u01.device.
systemd[1]: Dependency failed for /u01.
systemd[1]: Dependency failed for Remote File Systems.
systemd[1]: remote-fs.target: Job remote-fs.target/start failed with result 'dependency'.
systemd[1]: u01.mount: Job u01.mount/start failed with result 'dependency'.
systemd[1]: dev-testvg-u01.device: Job dev-testvg-u01.device/start failed with result 'timeout'.
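After such a failed boot, the affected units can be inspected directly (a diagnostic sketch; the 259:1 instance name assumes the 1G NVMe device from this template, substitute the MAJ:MIN of your PV as shown by lsblk):
systemctl status dev-testvg-u01.device u01.mount
systemctl list-units 'lvm2-pvscan@*'
journalctl -b -u 'lvm2-pvscan@259:1.service'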
I created several CloudFormation templates: test-cases.zip
- non-working-standard: the deployment where systemd is updated to the currently available latest version 239-78.0.4 and multipathd is disabled. /u01 is not mounted on reboot.
- non-working-systemd: the deployment demonstrating that /u01 is not mounted on reboot if systemd is updated to 239-78.0.3 - the version that introduced this problem.
- working-fstab-generator-reload-targets-disabled: the deployment where systemd-fstab-generator-reload-targets.service is disabled. It is the service that Oracle introduced in systemd-239-78.0.3; there is no such service upstream. /u01 is mounted after reboot.
- working-multipathd-enabled: the deployment where multipathd.service is enabled. /u01 is mounted after reboot.
- working-systemd: the deployment that uses systemd-239-78.0.1 - the version shipped with the AMI, which does not have the issue. /u01 is mounted on reboot.
For each of the deployments above, I ran the following commands:
after deployment
date
sudo cloud-init status
df -h
rpm -q systemd
systemctl status multipathd
systemctl status systemd-fstab-generator-reload-targets
sudo reboot
after reboot
date
uptime
df -h
journalctl -b -o short-precise > /tmp/journalctl.txt
sudo cp /var/log/messages /tmp/messages.txt
sudo chmod o+r /tmp/messages.txt
The logs of the command executions are in the commands.txt files inside the archive, along with journalctl.txt and messages.txt.
Thus, the issue happens when all of the following conditions are true (a quick check is sketched after the list):
- systemd >= 239-78.0.3
- multipathd is disabled
- there is a mount on top of LVM
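A rough way to check these conditions on a running host (a sketch, assuming rpm, systemctl, and lsblk are available; the version comparison is done by eye):
rpm -q systemd                            # affected if 239-78.0.3 or later
systemctl is-enabled multipathd           # affected if this reports disabled
lsblk -o NAME,TYPE,MOUNTPOINT | grep lvm  # lists mounts that sit on top of LVM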
The following workarounds are known to prevent the issue, so that an LVM volume /u01 is mounted after reboot:
- use systemd < 239-78.0.3
- enable multipathd
- disable systemd-fstab-generator-reload-targets (a sketch of this one follows the list)
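A minimal sketch of the third workaround, assuming the unit name matches what the Oracle systemd package installs (verify with systemctl list-unit-files | grep fstab-generator first):
sudo systemctl disable systemd-fstab-generator-reload-targets.service
sudo reboot
After the reboot, df -h should show /u01 mounted, matching the working-fstab-generator-reload-targets-disabled test case.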
I have been able to reproduce this issue only on AWS, with different instance types (AMD- and Intel-based). I was not able to reproduce the issue on Azure with either NVMe or non-NVMe based VM sizes.
What is really happening here is that lvm2-pvscan@.service is sometimes not invoked after applying systemd-239-78.0.3, so LVM auto-activation is not performed. If I reboot the EC2 instance and find that an LVM volume is not mounted, I can manually activate the problem volume groups via vgchange -a y, or run sudo /usr/sbin/lvm pvscan --cache --activate ay 259:1 against the problem device (the command used by lvm2-pvscan@.service), as demonstrated below:
[ec2-user@ip-10-100-101-125 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 799M 0 799M 0% /dev
tmpfs 818M 0 818M 0% /dev/shm
tmpfs 818M 17M 802M 2% /run
tmpfs 818M 0 818M 0% /sys/fs/cgroup
/dev/nvme0n1p1 32G 2.4G 30G 8% /
tmpfs 164M 0 164M 0% /run/user/1000
[ec2-user@ip-10-100-101-125 ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 32G 0 disk
└─nvme0n1p1 259:2 0 32G 0 part /
nvme1n1 259:1 0 1G 0 disk
[ec2-user@ip-10-100-101-125 ~]$ sudo /usr/sbin/lvm pvscan --cache --activate ay 259:1
pvscan[905] PV /dev/nvme1n1 online, VG testvg is complete.
pvscan[905] VG testvg run autoactivation.
1 logical volume(s) in volume group "testvg" now active
[ec2-user@ip-10-100-101-125 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 799M 0 799M 0% /dev
tmpfs 818M 0 818M 0% /dev/shm
tmpfs 818M 17M 802M 3% /run
tmpfs 818M 0 818M 0% /sys/fs/cgroup
/dev/nvme0n1p1 32G 2.4G 30G 8% /
tmpfs 164M 0 164M 0% /run/user/1000
/dev/mapper/testvg-u01 1016M 40M 977M 4% /u01
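To confirm on a given boot whether auto-activation was triggered at all, the pvscan unit's journal and the udev properties of the PV can be inspected (a diagnostic sketch; the udev property names assume the stock lvm2 udev rules, and 259:1 / /dev/nvme1n1 are taken from this example):
journalctl -b -u 'lvm2-pvscan@259:1.service'
sudo udevadm info /dev/nvme1n1 | grep -i systemd
On a boot that exhibits the problem, one would expect no journal entries for the pvscan unit, consistent with lvm2-pvscan@.service not being invoked.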