autoreplace not working correctly with ZFS 0.8.x #10577
Comments
shodanshok added a commit to shodanshok/zfs that referenced this issue on Jul 20, 2020
Due to commit bd813523 a removed device is now explicitly offlined by zed if no spare is available, rather than letting ZFS detect it as UNAVAIL. This however broke autoreplacing of whole-disk devices, as described here: openzfs#10577. In short: when a new device is reinserted in the same slot, zed will try to ONLINE it without letting ZFS recreate the necessary partition table. The following change simply avoids setting the removed device OFFLINE if no spare is available (or if `spare_on_remove` is false). Signed-off-by: Gionatan Danti <[email protected]>
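Since the fix keys off whether a hot spare is actually available, a quick way to check what the pool has configured is sketched below; the pool name `tank` and the device path are hypothetical:

```sh
# Hypothetical pool/device names. The "spares" section of `zpool status`
# lists any configured hot spares; with the change above, the ZED only sets
# a removed device OFFLINE when such a spare can take over.
zpool status tank

# Optionally add a hot spare so the spare-activation path still applies:
zpool add tank spare /dev/disk/by-id/wwn-0x5000c500aabbccdd
```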
@loli10K @behlendorf Upon further debugging, it turned out that the main issue was not related to device attach, but rather to device removal. In short: when a new device is reinserted in the same slot, zed will try to ONLINE it without letting ZFS recreate the necessary partition table. For further info, please have a look here: #10599
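As a quick way to see the symptom on the reinserted disk (device names below are hypothetical): a whole-disk vdev normally carries a large *-part1 data partition plus a small *-part9 reserved partition, and a factory-blank replacement has neither:

```sh
# Hypothetical device names. On a healthy whole-disk vdev you would see a
# large -part1 (data) and a small -part9 (reserved) partition; on a freshly
# inserted blank disk neither exists, so the -part1 node the pool expects is
# missing and the vdev stays UNAVAIL after the bare ONLINE attempt.
lsblk -o NAME,SIZE,TYPE /dev/sdc
ls -l /dev/disk/by-id/ | grep wwn-0x5000c500aabbccdd
```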
behlendorf added a commit to behlendorf/zfs that referenced this issue on Aug 18, 2020
Due to commit d48091d a removed device is now explicitly offlined by the ZED if no spare is available, rather than letting ZFS detect it as UNAVAIL. This broke auto-replacing of whole-disk devices, as described in issue openzfs#10577. In short, when a new device is reinserted in the same slot, the ZED will try to ONLINE it without letting ZFS recreate the necessary partition table. This change simply avoids setting the device OFFLINE when removed if no spare is available (or if spare_on_remove is false). The change has been kept minimal so it can be backported to the 0.8.x release. The auto_offline_001_pos ZTS test has been updated accordingly. Some follow-up work is planned to update the ZED so it transitions the vdev to a REMOVED state. That state has always existed, but there is currently no interface the ZED can use to set it, so it is being left to a follow-up PR. Co-authored-by: Gionatan Danti <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#10577
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue on Sep 16, 2020
Due to commit d48091d a removed device is now explicitly offlined by the ZED if no spare is available, rather than letting ZFS detect it as UNAVAIL. This broke auto-replacing of whole-disk devices, as described in issue openzfs#10577. In short, when a new device is reinserted in the same slot, the ZED will try to ONLINE it without letting ZFS recreate the necessary partition table. This change simply avoids setting the device OFFLINE when removed if no spare is available (or if spare_on_remove is false). The change has been kept minimal so it can be backported to the 0.8.x release. The auto_offline_001_pos ZTS test has been updated accordingly. Some follow-up work is planned to update the ZED so it transitions the vdev to a REMOVED state. That state has always existed, but there is currently no interface the ZED can use to set it, so it is being left to a follow-up PR. Reviewed-by: Gionatan Danti <[email protected]> Co-authored-by: Gionatan Danti <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#10577 Closes openzfs#10730
jsai20 pushed a commit to jsai20/zfs that referenced this issue on Mar 30, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this issue on May 31, 2021
System information
Describe the problem you're observing
Autoreplacing is not working correctly on ZFS 0.8.x: when inserting a new blank disk, `zpool status` shows the device as UNAVAIL. `zed` is running and `autoreplace=on` is set. To start the resilver, one has to export/import the pool or manually run a `zpool replace`. Using WWN/id/sd* names makes no difference.

Debugging the issue, I found the following zfs events:
ZED debug:
It seems that the problem lies in the missing partition (part1) on the newly inserted disk. In other words, ZED tries to bring the vdev online before creating the required *-part1 block device. In turn, ZFS detects the missing partition/block device and reports it as UNAVAIL.

By comparison, these are the zpool events logged on an otherwise identical VM with CentOS 7.7 and ZFS 0.7.13:
As you can see, there are many more events (and vdev_attach sounds like an obvious thing missing from the reported ZFS 0.8.4 events).
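For reference, a minimal sketch of how the two event streams can be captured for comparison on each machine; the pool name `tank` is hypothetical:

```sh
# Clear the existing event backlog, then follow verbose events while
# hot-swapping the disk, on both the 0.7.13 and 0.8.4 systems:
zpool events -c
zpool events -vf tank
```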
Any suggestions?
Thanks.
Describe how to reproduce the problem
1. Physically remove a disk from the pool: `zpool status` shows the disk as OFFLINE (it was `zed` which offlined the removed disk).
2. Insert a new blank disk in the same slot: `zpool status` now shows the disk as UNAVAIL.
3. To start the resilver, one has to export/import the pool or manually run a `zpool replace` (see the sketch below).
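A minimal sketch of the manual workaround mentioned in the last step; the pool and device names are hypothetical:

```sh
# Hypothetical pool/device names. Replacing the UNAVAIL vdev with itself
# makes ZFS relabel the blank disk (recreating the -part1/-part9 GPT
# partitions) and starts the resilver:
zpool replace tank /dev/disk/by-id/wwn-0x5000c500aabbccdd

# Per the report above, exporting and re-importing the pool also gets the
# resilver going:
zpool export tank
zpool import tank

zpool status tank    # the disk should now be resilvering
```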