Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zvol_disk_open() may spin on CPU #15658

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

zvol_disk_open() may spin on CPU #15658

wants to merge 1 commit into from

Conversation

ahrens
Copy link
Member

@ahrens ahrens commented Dec 10, 2023

Motivation and Context

zvol_disk_open() waits for up to zfs_vdev_open_timeout_ms (1 second by default) (e.g. if the block device does not exist). While in this loop, it calls schedule_timeout().

Description

The problem is that schedule_timeout() may not actually cause the thread to go off-CPU. Per the "documentation" (comment in the source code):

 * The function behavior depends on the current task state:
 * %TASK_RUNNING - the scheduler is called, but the task does not sleep
 * at all. That happens because sched_submit_work() does nothing for
 * tasks in %TASK_RUNNING state.

In my experience, schedule_timeout() never sleeps from this code path. This is especially noticeable if zfs_vdev_open_timeout_ms has been increased from its default.

This commit uses msleep() to actually sleep. Note that this is how it was before #7629.

How Has This Been Tested?

increasing zfs_vdev_open_timeout_ms and then importing a pool whose disks are no longer present on the system (/dev/... links no longer exist)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

`zvol_disk_open()` waits for up to `zfs_vdev_open_timeout_ms` (1 second by
default) (e.g. if the block device does not exist).  While in this loop,
it calls `schedule_timeout()`.

The problem is that `schedule_timeout()` may not actually cause the
thread to go off-CPU.  Per the "documentation" (comment in the source
code):
```
 * The function behavior depends on the current task state:
 * %TASK_RUNNING - the scheduler is called, but the task does not sleep
 * at all. That happens because sched_submit_work() does nothing for
 * tasks in %TASK_RUNNING state.
```

In my experience, `schedule_timeout()` never sleeps from this code path.
This is especially noticeable if `zfs_vdev_open_timeout_ms` has been
increased from its default.

This commit uses `msleep()` to actually sleep.  Note that this is how it
was before openzfs#7629.

Signed-off-by: Matthew Ahrens <[email protected]>
@ahrens ahrens requested a review from behlendorf December 10, 2023 04:42
@ahrens
Copy link
Member Author

ahrens commented Dec 10, 2023

@behlendorf do you know why it was changed from msleep() to schedule_timeout() in #7629? Will changing it back cause any problems?

@ahrens
Copy link
Member Author

ahrens commented Dec 11, 2023

Should this use delay() instead (which uses schedule_timeout_uninterruptible())?

@behlendorf
Copy link
Contributor

I don't recall why this was changed. My hunch would be to reduce the number of kernel interfaces we depend on. I see msleep() isn't used anywhere else in the Linux code. It's unlikely msleep() would change, but you never know.

If we do want to use msleep() here we'll want to add a ./configure test for it. My preference would be to use the native schedule_timeout_uninterruptible() interface here. This is in the Linux specific code so that's not an issue, it avoids adding a new dependency, and it makes it crystal clear how we intended to sleep.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants