Wait in libzfs_init() for the /dev/zfs device #3431

Closed
wants to merge 2 commits

Conversation

behlendorf
Contributor

While module loading itself is synchronous, the creation of the /dev/zfs
device is not. This is because /dev/zfs is typically created by a udev
rule after the module is registered and presented to user space through
sysfs. This small window between module loading and device creation
can result in spurious failures of libzfs_init().

This patch closes that race by extending libzfs_init() so it can detect
that the modules are loaded and only if required wait for the /dev/zfs
device to be created. This allows scripts to reliably use the following
shell construct without the need for additional error handling.

$ /sbin/modprobe zfs && /sbin/zpool import -a

To minimize the potential time waiting in libzfs_init() a strategy
similar to adaptive mutexes is employed. The function will busy-wait
for up to 10ms based on the expectation that the modules were just
loaded and therefore the /dev/zfs device will be created imminently. If it
takes longer than this, it will fall back to polling for up to 10 seconds.
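
Roughly, the two-phase wait could be sketched as follows. This is only an
illustrative sketch, not the actual libzfs code; the function name
wait_for_dev_zfs, the use of clock_gettime(), and the constants are
assumptions based on the description above.

#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Illustrative sketch only; not the libzfs implementation. */
static int
wait_for_dev_zfs(long timeout_sec)
{
	const long busy_timeout_ms = 10;	/* brief busy-wait window */
	struct timespec start, now;
	long elapsed_ms;
	int fd;

	(void) clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		if ((fd = open("/dev/zfs", O_RDWR)) >= 0)
			return (fd);			/* device node exists */
		if (errno != ENOENT)
			return (-1);			/* unexpected failure */

		(void) clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed_ms = (now.tv_sec - start.tv_sec) * 1000 +
		    (now.tv_nsec - start.tv_nsec) / 1000000;
		if (elapsed_ms >= timeout_sec * 1000)
			return (-1);			/* gave up waiting */
		if (elapsed_ms > busy_timeout_ms)
			(void) usleep(10 * 1000);	/* poll every 10 ms */
		/* else: busy-wait; udev should create the node imminently */
	}
}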

This behavior can be customized to some degree by setting the following
new environment variables. This functionality is provided for backwards
compatibility with existing scripts which depend on the module auto-load
behavior. By default module auto-loading is now disabled.

  • ZFS_MODULE_LOADING="YES|yes|ON|on" - Attempt to load modules.
  • ZFS_MODULE_TIMEOUT="<seconds>" - Seconds to wait for /dev/zfs

The following additional small changes were also made.

  • In libzfs_run_process() the 'rc' variable was renamed to 'error' for
    consistency with the rest of the code base.
  • All fprintf() error messages were moved out of the libzfs_init()
    library function where they never belonged in the first place. A
    libzfs_error_init() function was added to provide useful error
    messages for the most common causes of failure.
  • The zfs-import-* systemd service files have been updated to call
    '/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading
    behavior.

Signed-off-by: Brian Behlendorf [email protected]
Issue #2556

			return (0);
		} else if (errno == ENOENT) {
			if (NSEC2MSEC(gethrtime() - start) > busy_timeout)
				usleep(10 * MILLISEC);
Contributor

There should be an else sched_yield(); here.

Contributor Author

I actually intentionally dropped it because we're only going to be spinning briefly. But I can add it back; it does appear to be POSIX, so there aren't compatibility concerns.

@ryao
Contributor

ryao commented May 19, 2015

There is a pre-existing issue where zpool import in the systemd unit files will invoke a label update rather than doing a verbatim import. We should probably use zpool import -F there while we are touching it to fix that.

@ryao
Contributor

ryao commented May 19, 2015

A fix for this issue is something that we would want to backport to 0.6.4, but the library changes and behavior changes are not good candidates for backporting. Let's split it into two patches: one that implements things without restructuring layering/functionality, and another that does. Then we can backport the first patch.

@behlendorf
Contributor Author

That's amusing: the test failures are due to parts of the test suite assuming the auto-loading behavior. When I refresh this I'll update the offending scripts accordingly.

systemd unit files will invoke a label update rather than doing a verbatim import.

Do you have the issue number? I'd like to see the full bug. We should not need to use -F when importing the pool under normal circumstances. If we do change that behavior I'd definitely keep it as a separate patch.

@ryao this doesn't strike me as a critical issue worth applying to a point release.

@behlendorf
Contributor Author

Refreshed:

  • Added a call to sched_yield() when busy-waiting.
  • Correctly handle the case when ZFS_MODULE_TIMEOUT=0
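
With that change, the busy-wait branch from the hunk quoted earlier would read roughly as follows (a sketch based on that hunk plus the suggested else; the exact merged code may differ):

		} else if (errno == ENOENT) {
			if (NSEC2MSEC(gethrtime() - start) > busy_timeout)
				usleep(10 * MILLISEC);	/* slow polling phase */
			else
				sched_yield();		/* yield while busy-waiting */

With ZFS_MODULE_TIMEOUT=0 the loop presumably gives up on the first ENOENT rather than waiting at all.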

-	ASSERT(g_zfs != NULL);
+	if ((g_zfs = libzfs_init()) == NULL) {
+		(void) fprintf(stderr, "%s", libzfs_error_init(errno));
+		return;
Contributor

Shouldn't a libzfs_init() failure here be fatal, especially since you've removed the ASSERT?

Now if libzfs_init() fails, you return early from import_pool(). That function returns void, so you can't return the error to the caller (zhack_spa_open()), which obliviously rumbles on.

Contributor Author

Yes, yes it should be. I'll fix it by reverting this hunk so it matches upstream. I clearly just replaced all these calls without fully inspecting them.

@behlendorf
Contributor Author

Refreshed with the zhack chunk dropped.

	timeout_str = getenv("ZFS_MODULE_TIMEOUT");
	if (timeout_str) {
		timeout = strtol(timeout_str, NULL, 0);
		timeout = MIN(MAX(timeout, (10 * 60)), 0); /* 0 <= N <= 600 */
Contributor

timeout here is always 0. I think you mean to swap MIN and MAX:

timeout = MAX(MIN(timeout, (10 * 60)), 0); /* 0 <= N <= 600 */

Contributor Author

That's embarrassing, and exactly what I deserve for rushing this small tweak to the original patch. Originally it didn't have a minimum bound.

@ryao
Contributor

ryao commented May 20, 2015

@behlendorf There is no issue number for the misuse of zpool import in the start scripts (yet). I noticed it when debugging this problem on Friday. When the pool is imported by the module reading zpool.cache, a verbatim import is done, but when the script does it, a non-verbatim import is done. This poses problems for txg rollback if something goes wrong, and a non-verbatim import is only necessary when a change has been made to the pool vdev configuration.

My employer considers backporting a fix for this to be important because this issue affects the flocker tutorial in anywhere from 1/3 to 2/3 of all runs. I spent yesterday backporting my original patch to 0.6.4.1 (as 0.6.4.1.flocker.0). I did it in a manner roughly equivalent to what you do when you tag releases, such that there are source tarballs that can be used to build packages using the generic rpm/deb instructions. They have autotools products included, so no ./autogen.sh is necessary. I pushed the branches to a public git repository earlier today:

https://github.com/ClusterHQ/spl/tree/flocker-backports
https://github.com/ClusterHQ/zfs/tree/flocker-backports

The tarballs are available at a temporary location. If anyone wants them, I can provide the link, but I am going to avoid posting the temporary link to the issue tracker so that people in the future do not encounter a broken link. If anyone asks for the tarballs, I'll post the official link after it is up.

@behlendorf
Contributor Author

Refreshed to include the restructured do-while loop and the fix for the MIN/MAX inversion.

I've also added an else clause to the ZFS_MODULE_LOADING environment variable handling. This way the patch can be cherry-picked into the 0.6.4-release branch without changing the default behavior, by just inverting the default value for load in libzfs_load_module(). This should address @ryao's concerns about backporting this to the release version, and we can apply it to 0.6.4.2.
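
For illustration, the environment variable handling described above might look something like the following sketch; the helper names, the default timeout of 10 seconds, and the explicit clamping are assumptions rather than the merged code.

#include <stdlib.h>
#include <strings.h>

#define	ZFS_MODULE_TIMEOUT_MAX	(10 * 60)	/* 600 seconds */

/*
 * Should the zfs modules be loaded automatically?  The default passed in
 * is the only piece that would differ between master and a 0.6.4 backport.
 */
static int
module_loading_enabled(int default_load)
{
	const char *val = getenv("ZFS_MODULE_LOADING");

	if (val == NULL)
		return (default_load);
	return (strncasecmp(val, "YES", 3) == 0 ||
	    strncasecmp(val, "ON", 2) == 0);
}

/* Seconds to wait for /dev/zfs, clamped to the range 0..600. */
static long
module_timeout(void)
{
	const char *val = getenv("ZFS_MODULE_TIMEOUT");
	long timeout = 10;			/* assumed default wait */

	if (val != NULL) {
		timeout = strtol(val, NULL, 0);
		if (timeout < 0)
			timeout = 0;
		if (timeout > ZFS_MODULE_TIMEOUT_MAX)
			timeout = ZFS_MODULE_TIMEOUT_MAX;
	}
	return (timeout);
}

Clamping with explicit comparisons also sidesteps the MIN/MAX ordering mix-up discussed earlier in the review.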

@dun
Contributor

dun commented May 20, 2015

Looks good to me.

@@ -9,4 +9,4 @@ ConditionPathExists=!@sysconfdir@/zfs/zpool.cache
 [Service]
 Type=oneshot
 RemainAfterExit=yes
-ExecStart=@sbindir@/zpool import -d /dev/disk/by-id -aN
+ExecStart=/sbin/modprobe zfs && @sbindir@/zpool import -d /dev/disk/by-id -aN
Contributor

I suggest adding a -F here to match the behavior when the module auto-imports the pool itself to fix #3434.

Contributor Author

Rather than sneak that in here, let's tackle it in a separate patch; why that's desirable is going to need a fair bit of explanation, and it would be a change in existing behavior.

Contributor

That is fine.

@ryao
Contributor

ryao commented May 20, 2015

@behlendorf I am not a fan of backporting the changes to the libzfs API because they ideally shouldn't change across point releases, but it is your call. You will have no further complaints from me as long as you are aware of that concern.

As a side note, library API changes are not a problem for Gentoo because Gentoo's packaging keeps the userland binaries in a single package. They are more likely to be an issue for other distributions because their packaging breaks the tools and libraries into discrete packages.

@ryao
Contributor

ryao commented May 20, 2015

@behlendorf Those comments aside, this looks good to me.

@behlendorf
Contributor Author

because they ideally shouldn't change across point releases,

That's a very good point. This patch does remove one public function and add another, which could potentially cause breakage. I suspect the odds of this causing breakage are quite low, but it's definitely possible. I've split this patch, as you originally suggested, into two parts; we'll only cherry-pick the first part to the release branch to avoid any API changes. Thanks for keeping me honest about this.

While module loading itself is synchronous the creation of the /dev/zfs
device is not.  This is because /dev/zfs is typically created by a udev
rule after the module is registered and presented to user space through
sysfs.  This small window between module loading and device creation
can result in spurious failures of libzfs_init().

This patch closes that race by extending libzfs_init() so it can detect
that the modules are loaded and only if required wait for the /dev/zfs
device to be created.  This allows scripts to reliably use the following
shell construct without the need for additional error handling.

$ /sbin/modprobe zfs && /sbin/zpool import -a

To minimize the potential time waiting in libzfs_init() a strategy
similar to adaptive mutexes is employed.  The function will busy-wait
for up to 10ms based on the expectation that the modules were just
loaded and therefore the /dev/zfs will be created imminently.  If it
takes longer than this it will fall back to polling for up to 10 seconds.

This behavior can be customized to some degree by setting the following
new environment variables.  This functionality is provided for backwards
compatibility with existing scripts which depend on the module auto-load
behavior.  By default module auto-loading is now disabled.

* ZFS_MODULE_LOADING="YES|yes|ON|on" - Attempt to load modules.
* ZFS_MODULE_TIMEOUT="<seconds>"     - Seconds to wait for /dev/zfs

The zfs-import-* systemd service files have been updated to call
'/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading
behavior.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2556

All fprintf() error messages are moved out of the libzfs_init()
library function where they never belonged in the first place.  A
libzfs_error_init() function is added to provide useful error
messages for the most common causes of failure.

Additionally, in libzfs_run_process() the 'rc' variable was renamed
to 'error' for consistency with the rest of the code base.

Signed-off-by: Brian Behlendorf <[email protected]>
@behlendorf
Contributor Author

The failures here were due to the test suite needing an update to explicitly load the zfs modules for ztest. I'll update the tests and get this merged today.

@behlendorf
Contributor Author

Merged as:

65037d9 Add libzfs_error_init() function
87abfcb Wait in libzfs_init() for the /dev/zfs device
