libzfs_init() should busy-wait on module initialization
libzfs_init()'s JIT load of the module before using it is racy because
Linux kernel module initialization is asynchronous. This causes a
sporadic failure whenever libzfs_init() is required to load the kernel
modules. This happens during the boot process on EPEL systems, Fedora
and likely others such as Ubuntu.
The general mode of failure is that libzfs_init() is expected to load
the module, but module initialization does not complete before /dev/zfs
is opened, so pool import fails. This could explain the infamous mountall
failure on Ubuntu where pools will import, but things fail to mount.
The general explanation is that the userland process expected to mount
things fails because module initialization loses the race with
libzfs_init(); the module then imports the pools by reading the
zpool.cache, and nothing mounts because the userland process expected to
perform the mount has already failed.
A related issue can also manifest itself in initramfs archives that
mount / on ZFS, which affected Gentoo until 2013 when a busy-wait was
implemented to ensure that the module loaded:
https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=c812c35100771bb527f6b03853fa6d8ef66a48fe
https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=a21728ae287e988a1848435ab27f7ab503def784
https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults/initrd.scripts?id=32585f117ffbf6d6a0aa317e6876ae7711a7f307
The busy-wait approach was chosen because it imposed minimal latency and
was implementable in shell code. Unfortunately, it was not known at the
time that libzfs_init() had the same problem, so this went unfixed. It
caused sporadic failures in the Flocker tutorial, which caught our
attention at ClusterHQ:
https://clusterhq.atlassian.net/browse/FLOC-1834
Subsequent analysis following reproduction in a development environment
concluded that the failures were caused by module initialization losing
the race with libzfs_init(). While all Linux kernel modules needed ASAP
during the boot process suffer from this race, the zfs module's
dependence on additional modules makes it particularly vulnerable to this
issue. The solution that has been chosen mirrors the solution chosen for
genkernel with the addition of sched_yield() for greater efficiency.
This fails to close the race in the scenario where system execution in a
virtual machine is paused in the exact window necessary to introduce a
delay between a failure and the subsequent retry greater than the timeout.
Closing the race in that situation would require hooking into udev
and/or the kernel hotplug events. That has been left as a future
improvement because it would require significant development time and it
is quite likely that the busy-wait approach implemented here would be
required as a fallback on exotic systems where neither is
available. The chosen approach should be sufficient for achieving
>99.999% reliability.
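For illustration only, a minimal sketch of the busy-wait described above;
the function name, the modprobe invocation and the timeout value are
assumptions for the sketch, not the exact libzfs_init() code:

    #include <fcntl.h>
    #include <sched.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: request a module load, then busy-wait until
     * module initialization has created /dev/zfs or the timeout expires.
     * Returns an open file descriptor on success or -1 on failure.
     */
    static int
    wait_for_zfs_dev(int timeout_sec)
    {
            time_t start = time(NULL);
            int fd;

            /*
             * Ask for a module load; ignore failure since the module may
             * already be loaded or built into the kernel.
             */
            (void) system("/sbin/modprobe zfs");

            /*
             * Busy-wait, yielding the CPU between attempts so the loop
             * adds minimal latency while the module initializes.
             */
            while ((fd = open("/dev/zfs", O_RDWR)) < 0) {
                    if (time(NULL) - start > timeout_sec)
                            return (-1); /* timed out; caller reports error */
                    (void) sched_yield();
            }
            return (fd);
    }

The actual change lives in libzfs_init(); this sketch only shows the
retry-with-sched_yield() structure that the description above refers to.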
Closes openzfs#2556
Signed-off-by: Richard Yao <[email protected]>
Reviewed-by: Turbo Fredriksson <[email protected]>