
Fix a dependency loop #10388

Closed

rlaager wants to merge 2 commits into master from fix-dependency-loop

Conversation

rlaager
Member

@rlaager rlaager commented May 30, 2020

Motivation and Context

When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed. This avoids a dependency
loop on root-on-ZFS systems:

  • systemd-random-seed.service After (via RequiresMountsFor)
  • var-lib.mount After
  • zfs-import.target After
  • zfs-import-{cache,scan}.service After
  • cryptsetup.service After
  • systemd-random-seed.service

Description

This changes zfs-mount-generator to omit the dependency and ordering on zfs-import.target if the pool is already imported (which is the case for the root pool, as it is imported by the initrd).
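
In outline, the change amounts to something like this (a simplified sketch, not the exact code; the real generator also handles quoting and the other unit properties):

```
# Sketch: skip the zfs-import.target dependency for already-imported pools.
# $dataset comes from the zfs-list.cache line being processed.
# (Pool names can contain spaces, so the real code splits the zpool list
# output on newlines only; that detail is omitted here.)
pool="${dataset%%/*}"            # pool is everything before the first '/'
after="zfs-import.target"
wants="zfs-import.target"

pools=$(zpool list -H -o name)   # one imported pool per line

for p in $pools ; do
  if [ "$p" = "$pool" ] ; then
    # Pool already imported (e.g. the root pool, imported by the initrd):
    # drop the dependency so no ordering loop is created.
    after=""
    wants=""
    break
  fi
done
```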

This fixes the issue reported to Ubuntu here:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1875577

This change reverts 8ae8b2a from #9360. The author of that (didrocks) is aware of this. See my comment: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1875577/comments/5 and his reply: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1875577/comments/9

How Has This Been Tested?

I have tested this (as adjusted for ZFS 0.8.3 in 20.04) on test VMs as part of the Root on ZFS HOWTO updates. Prior to that, it was tested by the original reporter here:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1875577/comments/13

This could use a little testing in master.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@rlaager
Member Author

rlaager commented May 30, 2020

@didrocks @aerusso @InsanePrawn Can you take a look at this?

@rlaager
Member Author

rlaager commented May 31, 2020

I've pushed another commit that I think will fix #10356.

zfs-load-key-DATASET.service was gaining an After=systemd-journald.socket due to its stdout/stderr going to the journal (which is the default). systemd-journald.socket has an After (via RequiresMountsFor=/run/systemd/journal) on -.mount. If the root filesystem is encrypted, -.mount gets an After on zfs-load-key-DATASET.service, which closes the loop. By setting stdout and stderr to null on the key load services, we avoid this loop.

Always setting StandardOutput=null and StandardError=null was the simplest fix I could think of. There are other options that are more complicated.
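
Concretely, the generated key-load unit would pick up something like the following. This is only a sketch with placeholder values; the real generator's unit naming, escaping, and ExecStart are more involved:

```
# Illustrative sketch only; example placeholder values:
unit_dir="/run/systemd/generator"
dataset="rpool/ROOT"
unit_name="zfs-load-key-rpool-ROOT.service"

# The two Standard*=null lines keep systemd from adding the implicit
# After=systemd-journald.socket to the key-load unit.
cat > "${unit_dir}/${unit_name}" << EOF
[Unit]
Description=Load ZFS key for ${dataset}

[Service]
Type=oneshot
RemainAfterExit=yes
StandardOutput=null
StandardError=null
ExecStart=/sbin/zfs load-key '${dataset}'
EOF
```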

We could do this only for specifically named datasets to account for the Root-on-ZFS HOWTO / Ubuntu installer layout. But that wouldn't fix #10356 where the user is using a slightly different encryption root (rpool/ROOT instead of rpool). It would also be needlessly restrictive on the pool name. So that's probably out.

We could check if the key is currently loaded and set these options in that case. This would be a bit of a hack, but would work okay on boot. It would lead to (probably harmless in practice) different behavior after a systemctl daemon-reload for non-root-filesystem datasets, which isn't ideal.

We could check to see if the mountpoint of the encryption root is /, but there again, that wouldn't work for the case in #10356. If we allowed / or none, then it would.

We could check for datasets with a mountpoint of / and then write a drop-in unit for their encryption root's zfs-load-key-*.service. That also seems a little goofy, and I'm not 100% sure if generators writing drop-in units is supported, but it would probably work.

If someone would prefer one of these alternatives or something else, please let me know. Otherwise, I'm fine with the simple solution. I don't see much value in the output of the zfs-load-key-* services anyway.

@codecov-commenter

codecov-commenter commented May 31, 2020

Codecov Report

Merging #10388 into master will decrease coverage by 0.26%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #10388      +/-   ##
==========================================
- Coverage   79.75%   79.48%   -0.27%     
==========================================
  Files         394      394              
  Lines      124662   124662              
==========================================
- Hits        99422    99092     -330     
- Misses      25240    25570     +330     
Flag Coverage Δ
#kernel 80.30% <ø> (-0.12%) ⬇️
#user 64.52% <ø> (-1.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
module/os/linux/spl/spl-kmem-cache.c 78.44% <0.00%> (-11.43%) ⬇️
module/zcommon/zfs_fletcher.c 68.09% <0.00%> (-10.20%) ⬇️
cmd/ztest/ztest.c 75.02% <0.00%> (-5.91%) ⬇️
cmd/zvol_id/zvol_id_main.c 76.31% <0.00%> (-5.27%) ⬇️
module/zfs/vdev_rebuild.c 93.69% <0.00%> (-3.05%) ⬇️
module/zcommon/zfs_fletcher_superscalar.c 97.05% <0.00%> (-2.95%) ⬇️
lib/libzpool/kernel.c 64.53% <0.00%> (-2.29%) ⬇️
module/zfs/vdev_raidz_math.c 76.57% <0.00%> (-2.26%) ⬇️
module/zcommon/zfs_fletcher_superscalar4.c 98.43% <0.00%> (-1.57%) ⬇️
module/zcommon/zfs_fletcher_sse.c 98.66% <0.00%> (-1.34%) ⬇️
... and 60 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d60c0db...1e6ae42.

@InsanePrawn
Contributor

InsanePrawn commented Jun 1, 2020

I'll have to test this on my fringe Arch setup tomorrow-ish when I return home.

Does nulling stdout in the unit affect/break systemd-ask-password during init/from a tty/from an X terminal?

Note to future self:

Master:

v0.8.0-789_gfb822260b

-- Logs begin at Sun 2019-08-18 22:07:33 CEST, end at Mon 2020-06-01 03:04:02 CEST. --
May 31 01:13:04 archlinux systemd[1]: Starting Load ZFS key for zroot...
May 31 01:13:06 archlinux systemd[1]: Finished Load ZFS key for zroot.
May 31 01:13:07 hostname systemd[1]: zfs-load-key-zroot.service: Found ordering cycle on cryptsetup.target/start
May 31 01:13:07 hostname systemd[1]: zfs-load-key-zroot.service: Found dependency on [email protected]/start
May 31 01:13:07 hostname systemd[1]: zfs-load-key-zroot.service: Found dependency on -.mount/start
May 31 01:13:07 hostname systemd[1]: zfs-load-key-zroot.service: Found dependency on zfs-load-key-zroot.service/start
May 31 01:13:07 hostname systemd[1]: zfs-load-key-zroot.service: Job cryptsetup.target/start deleted to break ordering cycle starting with zfs-load-key-zroot.service/start
May 31 01:13:07 hostname systemd[1]: systemd-udevd-control.socket: Found ordering cycle on -.mount/start
May 31 01:13:07 hostname systemd[1]: systemd-udevd-control.socket: Found dependency on zfs-load-key-zroot.service/start
May 31 01:13:07 hostname systemd[1]: systemd-udevd-control.socket: Found dependency on systemd-udev-trigger.service/start
May 31 01:13:07 hostname systemd[1]: systemd-udevd-control.socket: Found dependency on systemd-udevd-control.socket/start
May 31 01:13:07 hostname systemd[1]: systemd-udevd-control.socket: Job zfs-load-key-zroot.service/start deleted to break ordering cycle starting with systemd-udevd-control.socket/start
May 31 01:13:09 hostname systemd[1]: -.mount: Found ordering cycle on zfs-load-key-zroot.service/start
May 31 01:13:09 hostname systemd[1]: -.mount: Found dependency on cryptsetup.target/start
May 31 01:13:09 hostname systemd[1]: -.mount: Found dependency on [email protected]/start
May 31 01:13:09 hostname systemd[1]: -.mount: Found dependency on -.mount/start
May 31 01:13:09 hostname systemd[1]: -.mount: Job cryptsetup.target/start deleted to break ordering cycle starting with -.mount/start
May 31 01:13:10 hostname systemd[1]: [email protected]: Found ordering cycle on -.mount/start
May 31 01:13:10 hostname systemd[1]: [email protected]: Found dependency on zfs-load-key-zroot.service/start
May 31 01:13:10 hostname systemd[1]: [email protected]: Found dependency on cryptsetup.target/start
May 31 01:13:10 hostname systemd[1]: [email protected]: Found dependency on [email protected]/start
May 31 01:13:10 hostname systemd[1]: [email protected]: Job -.mount/start deleted to break ordering cycle starting with [email protected]/start
May 31 01:13:10 hostname systemd[1]: -.mount: Found ordering cycle on zfs-load-key-zroot.service/start
May 31 01:13:10 hostname systemd[1]: -.mount: Found dependency on cryptsetup.target/start
May 31 01:13:10 hostname systemd[1]: -.mount: Found dependency on [email protected]/start
May 31 01:13:10 hostname systemd[1]: -.mount: Found dependency on -.mount/start
May 31 01:13:10 hostname systemd[1]: -.mount: Job cryptsetup.target/start deleted to break ordering cycle starting with -.mount/start

@didrocks
Contributor

didrocks commented Jun 1, 2020

Thanks Richard for working on this!
I’m fine with the simpler solution as well.

Some notes, though: when the zfs mount generator was written, there was quite a bit of discussion about avoiding calls to any zfs or zpool commands, for latency reasons (this is indeed what Lennart recommends). That is why the zfs-list.cache directory exists.

However, as you saw in our Ubuntu distro patch with the addition of zsys, this is unavoidable in some cases in practice. For instance, if you boot from a snapshot, which the initramfs then clones into a new dataset (rather than rolling back), this doesn’t work properly:

1. You boot from rpool/ROOT/machine@snap1, which is then cloned as rpool/ROOT/machine_snap1. Similarly, rpool/ROOT/machine/opt@snap1 is cloned as rpool/ROOT/machine_snap1/opt.

In that situation there are two cases, one of which accidentally works even though the state is incorrect:

2.a. The initramfs mounts both rpool/ROOT/machine_snap1 and rpool/ROOT/machine_snap1/opt. The ZFS generator then creates both -.mount and opt.mount, which reference (from the cache) rpool/ROOT/machine and rpool/ROOT/machine/opt. However, as / and /opt are already mounted, systemd ignores them and marks the units as successful. This accidentally works (though not with the state systemd believes the system is in). :)
2.b. Worse: the initramfs mounts only rpool/ROOT/machine_snap1. The ZFS generator again creates both -.mount and opt.mount, which reference (from the cache) rpool/ROOT/machine and rpool/ROOT/machine/opt. As before, systemd sees / already mounted and does nothing for it; however, it will mount /opt via opt.mount using rpool/ROOT/machine/opt. We thus end up with a Frankenstein setup:

  • / with rpool/ROOT/machine_snap1
  • /opt with rpool/ROOT/machine/opt

The main goal is to illustrate that, in general, I think you are right to call zfs commands there (we should just minimize them as much as possible).

I can file a bug for this separate issue and start that discussion, while upstreaming a first pass to improve the situation described above (basically, refreshing the cache if we boot from a snapshot).

@behlendorf behlendorf added the Status: Code Review Needed (Ready for review and testing) label on Jun 1, 2020
@rlaager
Member Author

rlaager commented Jun 4, 2020

@didrocks That sounds like a separate issue.

@didrocks
Contributor

didrocks commented Jun 4, 2020

@didrocks That sounds like a separate issue.

Oh, it completely is; I’m just adding arguments for the "anyway, you need to call external commands like zpool or zfs" point, which is something you could hear during review.

@behlendorf behlendorf added the Status: Accepted (Ready to integrate: reviewed, tested) label and removed the Status: Code Review Needed (Ready for review and testing) label on Jun 8, 2020
@didrocks
Contributor

So, I tried this patch on Ubuntu (groovy).
Note that we are back to having the random-seed dependency cycle issue (which was fixed in the initial patch) and some others due to cryptsetup:

juin 10 12:03:21 u-Standard-PC-Q35-ICH9-2009 systemd[1]: systemd-random-seed.service: Found ordering cycle on var-lib.mount/start
juin 10 12:03:21 u-Standard-PC-Q35-ICH9-2009 systemd[1]: systemd-random-seed.service: Job var-lib.mount/start deleted to break ordering cycle starting with systemd-random-seed.service/start
juin 10 12:03:22 u-Standard-PC-Q35-ICH9-2009 systemd[1]: [email protected]: Found ordering cycle on systemd-random-seed.service/start
juin 10 12:03:22 u-Standard-PC-Q35-ICH9-2009 systemd[1]: [email protected]: Job systemd-random-seed.service/start deleted to break ordering cycle starting with [email protected]/start
juin 10 12:03:22 u-Standard-PC-Q35-ICH9-2009 systemd[1]: zfs-import-cache.service: Found ordering cycle on cryptsetup.target/start
juin 10 12:03:22 u-Standard-PC-Q35-ICH9-2009 systemd[1]: zfs-import-cache.service: Job cryptsetup.target/start deleted to break ordering cycle starting with zfs-import-cache.service/start
juin 10 12:03:22 u-Standard-PC-Q35-ICH9-2009 systemd[1]: zfs-import-cache.service: Found ordering cycle on cryptsetup.target/start
juin 10 12:03:22 u-Standard-PC-Q35-ICH9-2009 systemd[1]: zfs-import-cache.service: Job [email protected]/start deleted to break ordering cycle starting with zfs-import-cache.service/start
juin 10 12:03:22 u-Standard-PC-Q35-ICH9-2009 systemd[1]: zfs-import-cache.service: Found ordering cycle on cryptsetup.target/start
juin 10 12:03:22 u-Standard-PC-Q35-ICH9-2009 systemd[1]: zfs-import-cache.service: Job systemd-random-seed.service/start deleted to break ordering cycle starting with zfs-import-cache.service/start

Note that the number of those warnings in the journal isn’t stable (sometimes we get more or fewer of them; only systemd-random-seed.service is always there).

For reference, I tried with and without this distro-patch (no bug referenced, unsure if we should drop it?):

Description: Start zfs-mount.service after zfs-load-module.service, and if zfs
 module was actually loaded. This should allow installation of probert in
 containers, which pulls in zfs-linux utilities.
Author: Dimitri John Ledkov <[email protected]>


--- zfs-linux-0.8.3.orig/etc/systemd/system/zfs-mount.service.in
+++ zfs-linux-0.8.3/etc/systemd/system/zfs-mount.service.in
@@ -7,6 +7,8 @@ After=zfs-import.target
 After=systemd-remount-fs.service
 Before=local-fs.target
+After=zfs-load-module.service
+ConditionPathExists=/sys/module/zfs
 
 [Service]
 Type=oneshot

So, I’m unsure whether this fixes it all or not.

@rlaager rlaager added the Status: Code Review Needed (Ready for review and testing) label and removed the Status: Accepted (Ready to integrate: reviewed, tested) label on Jun 10, 2020
@rlaager
Member Author

rlaager commented Jun 10, 2020

@didrocks I see you merged this into the Ubuntu package already. Did you find more info? Otherwise, can you provide the output from something like systemctl show systemd-random-seed var-lib.mount systemd-cryptsetup@swap zfs-import-cache | grep -E "^(Names|After|RequiresMountsFor)"? That should help us start narrowing down the loop.

Can you also try disabling the cryptoswap to see if that works around the issue? That is, does the problem only occur (with the patch) if encrypted swap is in use?

@didrocks
Contributor

I merged the patch in Ubuntu as we have fewer warnings with it. jibel has a VM ready and will post the results shortly so that we can debug this and get a better idea of the loop.

@jibel
Contributor

jibel commented Jun 11, 2020

Here are the logs with encrypted swap, with @rlaager's patch and without Dimitri's ones.

juin 11 10:29:53 u systemd[1]: Detected architecture x86-64.
juin 11 10:29:53 u systemd[1]: Set hostname to <u-Standard-PC-Q35-ICH9-2009>.
juin 11 10:29:53 u systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
juin 11 10:29:53 u systemd[1]: [email protected]: Found ordering cycle on systemd-random-seed.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on var-lib.mount/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on zfs-import.target/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on zfs-import-cache.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on zfs-load-module.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on cryptsetup.target/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on [email protected]/start
juin 11 10:29:53 u systemd[1]: [email protected]: Job systemd-random-seed.service/start deleted to break ordering cycle starting with [email protected]/start
juin 11 10:29:53 u systemd[1]: Created slice system-modprobe.slice.
juin 11 10:29:53 u systemd[1]: Created slice Cryptsetup Units Slice.
juin 11 10:29:53 u systemd[1]: Created slice system-systemd\x2dfsck.slice.
--
juin 11 10:29:53 u systemd[1]: Started Journal Service.
juin 11 10:29:53 u systemd-udevd[1183]: Using default interface naming scheme 'v245'.
juin 11 10:29:53 u systemd-udevd[1183]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
juin 11 10:29:53 u systemd[1]: zfs-import-cache.service: Found ordering cycle on zfs-load-module.service/start
juin 11 10:29:53 u systemd[1]: zfs-import-cache.service: Found dependency on cryptsetup.target/start
juin 11 10:29:53 u systemd[1]: zfs-import-cache.service: Found dependency on [email protected]/start
juin 11 10:29:53 u systemd[1]: zfs-import-cache.service: Found dependency on systemd-random-seed.service/start
juin 11 10:29:53 u systemd[1]: zfs-import-cache.service: Found dependency on var-lib.mount/start
juin 11 10:29:53 u systemd[1]: zfs-import-cache.service: Found dependency on zfs-import.target/start
juin 11 10:29:53 u systemd[1]: zfs-import-cache.service: Found dependency on zfs-import-cache.service/start
juin 11 10:29:53 u systemd[1]: zfs-import-cache.service: Job zfs-load-module.service/start deleted to break ordering cycle starting with zfs-import-cache.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found ordering cycle on systemd-random-seed.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on var-lib.mount/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on zfs-import.target/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on zfs-import-cache.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on zfs-load-module.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on cryptsetup.target/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on [email protected]/start
juin 11 10:29:53 u systemd[1]: [email protected]: Job systemd-random-seed.service/start deleted to break ordering cycle starting with [email protected]/start
juin 11 10:29:53 u systemd[1]: Condition check resulted in Show Plymouth Boot Screen being skipped.
juin 11 10:29:53 u systemd[1]: Condition check resulted in Forward Password Requests to Plymouth Directory Watch being skipped.
juin 11 10:29:53 u systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
--
juin 11 10:29:53 u kernel: snd_hda_codec_generic hdaudioC0D0:    inputs:
juin 11 10:29:53 u kernel: snd_hda_codec_generic hdaudioC0D0:      Line=0x5
juin 11 10:29:53 u udevadm[1173]: systemd-udev-settle.service is deprecated.
juin 11 10:29:53 u systemd[1]: [email protected]: Found ordering cycle on systemd-random-seed.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on var-lib.mount/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on zfs-import.target/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on zfs-import-cache.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on zfs-load-module.service/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on cryptsetup.target/start
juin 11 10:29:53 u systemd[1]: [email protected]: Found dependency on [email protected]/start
juin 11 10:29:53 u systemd[1]: [email protected]: Job systemd-random-seed.service/start deleted to break ordering cycle starting with [email protected]/start
juin 11 10:29:53 u systemd[1]: Condition check resulted in Show Plymouth Boot Screen being skipped.
juin 11 10:29:53 u systemd[1]: Condition check resulted in Forward Password Requests to Plymouth Directory Watch being skipped.
juin 11 10:29:53 u systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.

And here is the output of the command you requested:

sh $ systemctl show systemd-random-seed var-lib.mount systemd-cryptsetup@swap zfs-import-cache | grep -E "^(Names|After|RequiresMountsFor)"
Names=systemd-random-seed.service
After=systemd-journald.socket var-lib.mount system.slice systemd-remount-fs.service -.mount
RequiresMountsFor=/var/lib/systemd/random-seed
Names=var-lib.mount
After=system.slice systemd-journald.socket local-fs-pre.target -.mount zfs-import.target
RequiresMountsFor=/var
Names=systemd-cryptsetup@swap.service
Names=zfs-import-cache.service
After=systemd-journald.socket cryptsetup.target systemd-remount-fs.service system.slice zfs-load-module.service multipathd.target systemd-udev-settle.service

Disabling encrypted swap doesn't work around the issue. Note that there is still another encrypted partition mounted from initrd.

@rlaager rlaager force-pushed the fix-dependency-loop branch from dc4242b to 13dd589 on July 30, 2020 23:51
@rlaager
Member Author

rlaager commented Jul 30, 2020

I've uploaded the simple fix from @aerusso posted at:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=966565

I still need to get to reviewing the last few comments here.

@aerusso
Contributor

aerusso commented Jul 31, 2020

Sorry @rlaager for the delay looking at this, I completely lost track of this MR. Apparently getting it into Debian was the quickest way to get me to test it!

With the change, I see no harm in the patch. However---and I mentioned this before---I still think these key load dependencies are suspicious, probably contributing to some of these dependency loops.

@behlendorf
Contributor

@rlaager just let me know when you're happy with this and we can get it merged.

@rlaager
Member Author

rlaager commented Aug 1, 2020

For reference, I tried with and without this distro-patch (no bug referenced, unsure if we should drop it?):
...
+ConditionPathExists=/sys/module/zfs

Ha! We (upstream) have now independently arrived at that same idea (though I suggested ConditionPathIsDirectory), through a series of pull requests ending in #10663.

@rlaager
Member Author

rlaager commented Aug 1, 2020

I finally got a chance to look at this in detail again. I believe I've gotten to the bottom of the problem.

  1. I was previously only clearing wants. I also need to clear after. Fixing that means the code needs to be higher up so that p_systemd_after still gets honored.
  2. This was not working on any system with more than one pool. The for p in $pools was not splitting the newline-separated pools because we set $IFS to a tab in process_line(). Unsetting IFS after splitting the line fixes that.

With these two changes, var-lib.mount no longer contains After=zfs-import.target or Wants=zfs-import.target.

I have pushed the fix. At this point, I consider this ready to merge. But ideally, someone else should sanity check this.
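
In outline, the resulting logic looks roughly like this (simplified sketch; see the commit for the real code):

```
# Sketch of the shape of the fix.  This now runs before p_systemd_after is
# appended, so a user-specified "after" property is still honored, and it
# clears both after and wants.
after="zfs-import.target"
wants="zfs-import.target"

# $pools holds the newline-separated output of `zpool list -H -o name` and
# $pool is the current dataset's pool; the IFS handling needed to split
# $pools correctly is discussed below.
for p in $pools ; do
  if [ "$p" = "$pool" ] ; then
    after=""
    wants=""
    break
  fi
done
```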

@aerusso
Contributor

aerusso commented Aug 1, 2020

The correct way to do this is to save OIFS="$IFS" before changing IFS, and then reset IFS="$OIFS" afterwards. IFS is changed in is_known and process_line.

I have never seen unsetting IFS recommended as a way to get back the default IFS behavior.

@rlaager rlaager force-pushed the fix-dependency-loop branch from 93f6da4 to 00e5e69 on August 1, 2020 21:05
@rlaager
Member Author

rlaager commented Aug 1, 2020

The correct way to do this is to save OIFS="$IFS" before changing IFS, and then reset IFS="$OIFS" afterwards. IFS is changed in is_known and process_line.

I have never seen un-setting IFS said to be a recommended way to get back the default IFS behavior.

An unset $IFS is supposed to behave as the default:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

However, in practice maybe that is a problem:
https://stackoverflow.com/a/58511401

So I've made the change you suggested.
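
For reference, the pattern ends up looking like this (sketch):

```
# Save IFS, switch to a literal newline for the pool split, then restore
# the saved value rather than unsetting IFS.
OIFS="$IFS"
IFS="
"
# ... newline-delimited processing (the pool loop above) ...
IFS="$OIFS"
```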

@rlaager rlaager force-pushed the fix-dependency-loop branch 2 times, most recently from 9ad70d0 to b3b3222 on August 1, 2020 22:51
@rlaager rlaager force-pushed the fix-dependency-loop branch from b3b3222 to 352f55a on August 2, 2020 02:59
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service

Signed-off-by: Richard Laager <[email protected]>
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Signed-off-by: Richard Laager <[email protected]>
Closes: openzfs#10356
@rlaager rlaager force-pushed the fix-dependency-loop branch from 352f55a to 1e6ae42 on August 18, 2020 05:38
@rlaager
Member Author

rlaager commented Aug 18, 2020

@aerusso @InsanePrawn I rebased this. What do you think?

@behlendorf
Contributor

@aerusso @InsanePrawn any feedback? How would you like to proceed with this?

@aerusso
Contributor

aerusso commented Aug 28, 2020

@behlendorf @rlaager I've tested this on two machines (one with ZFS on root), and they boot with no problem. LGTM---there may still be things to improve in the generator, but this MR is absolutely a step in the right direction.

Contributor

@InsanePrawn InsanePrawn left a comment


Code LGTM now.

Haven't had time to deploy to my weirdly customized setups yet but I expect no changes.

@behlendorf behlendorf added the Status: Accepted (Ready to integrate: reviewed, tested) and Component: Systemd (Systemd integration) labels and removed the Status: Code Review Needed (Ready for review and testing) label on Aug 28, 2020
behlendorf pushed a commit that referenced this pull request Aug 28, 2020
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: InsanePrawn <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #10356
Closes #10388
behlendorf pushed a commit that referenced this pull request Aug 30, 2020
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: InsanePrawn <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #10388
behlendorf pushed a commit that referenced this pull request Aug 30, 2020
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: InsanePrawn <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes #10356
Closes #10388
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 22, 2020
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: InsanePrawn <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#10388
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 22, 2020
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: InsanePrawn <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#10356
Closes openzfs#10388
OtherJohnGray added a commit to OtherJohnGray/openzfs-docs that referenced this pull request Feb 17, 2021
Hi, I'm not sure if this is correct or not - but this change removes section 4.15, "Patch a dependency loop", as it seems that this is no longer required now that openzfs/zfs#10388 has been merged?

Following the current instructions in 4.15 generates the following console output:

```
root@rescue ~ # curl https://launchpadlibrarian.net/478315221/2150-fix-systemd-dependency-loops.patch | \
>     sed "s|/etc|/lib|;s|\.in$||" | (cd / ; sudo patch -p1)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1487  100  1487    0     0   4260      0 --:--:-- --:--:-- --:--:--  4260
patching file lib/systemd/system-generators/zfs-mount-generator
Hunk #1 FAILED at 42.
Hunk #2 FAILED at 62.
Hunk #3 succeeded at 157 with fuzz 2 (offset 80 lines).
2 out of 3 hunks FAILED -- saving rejects to file lib/systemd/system-generators/zfs-mount-generator.rej
patching file lib/systemd/system/zfs-mount.service
Hunk #1 FAILED at 6.
1 out of 1 hunk FAILED -- saving rejects to file lib/systemd/system/zfs-mount.service.rej
```

/lib/systemd/system-generators/zfs-mount-generator.rej contains the following:

```
root@rescue ~ # cat /lib/systemd/system-generators/zfs-mount-generator.rej
--- lib/systemd/system-generators/zfs-mount-generator
+++ lib/systemd/system-generators/zfs-mount-generator
@@ -42,6 +42,8 @@
   do_fail "zero or three arguments required"
 fi

+pools=$(zpool list -H -o name)
+
 # For ZFSs marked "auto", a dependency is created for local-fs.target. To
 # avoid regressions, this dependency is reduced to "wants" rather than
 # "requires". **THIS MAY CHANGE**
@@ -62,6 +64,7 @@
   set -f
   set -- $1
   dataset="${1}"
+  pool="${dataset%%/*}"
   p_mountpoint="${2}"
   p_canmount="${3}"
   p_atime="${4}"
```

/lib/systemd/system-generators/zfs-mount-generator seems to already contain code equivalent to the rejected hunks, and the accepted hunk seems to generate a code block starting at line 160 that looks like a partial, older version of the code block at line 121?
```
121   # If the pool is already imported, zfs-import.target is not needed.  This
122   # avoids a dependency loop on root-on-ZFS systems:
123   # systemd-random-seed.service After (via RequiresMountsFor) var-lib.mount
124   # After zfs-import.target After zfs-import-{cache,scan}.service After
125   # cryptsetup.service After systemd-random-seed.service.
126   #
127   # Pools are newline-separated and may contain spaces in their names.
128   # There is no better portable way to set IFS to just a newline.  Using
129   # $(printf '\n') doesn't work because $(...) strips trailing newlines.
130   IFS="
131 "
132   for p in $pools ; do
133     if [ "$p" = "$pool" ] ; then
134       after=""
135       wants=""
136       break
137     fi
138   done
. . . . . 
160   # If the pool is already imported, zfs-import.target is not needed.  This
161   # avoids a dependency loop on root-on-ZFS systems:
162   # systemd-random-seed.service After (via RequiresMountsFor) var-lib.mount
163   # After zfs-import.target After zfs-import-{cache,scan}.service After
164   # cryptsetup.service After systemd-random-seed.service.
165   for p in $pools ; do
166     if [ "$p" = "$pool" ] ; then
167       wants=""
168       break
169     fi
170   done
```

/lib/systemd/system/zfs-mount.service.rej contains the following:
```
--- lib/systemd/system/zfs-mount.service
+++ lib/systemd/system/zfs-mount.service
@@ -6,7 +6,6 @@
 After=zfs-import.target
 After=systemd-remount-fs.service
 Before=local-fs.target
-Before=systemd-random-seed.service
 After=zfs-load-module.service
 ConditionPathExists=/sys/module/zfs
```
And /lib/systemd/system/zfs-mount.service does not contain the string "Before=systemd-random-seed.service".


...Is it therefore right to remove section 4.15 from this manual?
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: InsanePrawn <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#10388
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: InsanePrawn <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#10356
Closes openzfs#10388
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: InsanePrawn <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#10388
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <[email protected]>
Reviewed-by: InsanePrawn <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#10356
Closes openzfs#10388
Labels
Component: Systemd (Systemd integration), Status: Accepted (Ready to integrate: reviewed, tested)
7 participants