ZFS Corrective Resilvering Functions #15917

Open
dm17 opened this issue Feb 20, 2024 · 2 comments
Labels
Type: Feature (Feature request or new feature)

Comments


dm17 commented Feb 20, 2024

Describe the feature you would like to see added to OpenZFS

Find usable data on FAULTED or OFFLINE devices in order to repair "permanent errors".

For example:

  1. a device in a raidz1 pool is FAULTED
  2. that device is replaced by a good device
  3. during and/or before the resilver completes, another device causes some permanent errors
  4. it seems likely that the FAULTED device could be used to recover at least some of those "permanent errors", unless it is completely unreadable. In my view, the user should be able to force ZFS to try, one way or another.

I realize there is some necessary threshold for triggering a device FAULT. However, the user should be able to manually put a FAULTED or OFFLINE device into a state like "ex-FAULTED," designated for corrective functions only. From ZFS's point of view, resilvering takes place in the absence of the devices being replaced, right? That opens up unnecessary exposure to permanent errors, which could either be prevented or, as I'm suggesting here, later corrected using the remaining good sectors of the FAULTED device(s).

If ZFS's default behavior is to exclude (FAULT) known-bad devices as quickly as possible, then don't we want a way to invert that tendency at exactly the times when we most want to reduce the chance of permanent errors?
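
To make this concrete, here is a rough sketch of the sequence above. Pool and device names are made up, the first commands are the ordinary existing ones, and the last line is purely hypothetical shorthand for the proposed "corrective-only" state:

    # 1. a raidz1 member faults and gets replaced (existing behavior)
    zpool status -v tank                      # shows /dev/sdb as FAULTED
    zpool replace tank /dev/sdb /dev/sdd

    # 2. during and/or before that resilver, another member throws errors,
    #    and zpool status -v ends up listing "permanent errors"

    # 3. proposed (hypothetical, no such flag exists today): re-admit the old,
    #    partially readable disk in a read-only, corrective-only role
    # zpool online --corrective tank /dev/sdb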

How will this feature improve OpenZFS?

It reduces and/or repairs permanent errors that arise during or before a resilver, and it helps form a complete set of tools and processes for users recovering from broken hardware.

Additional context

This was inspired by the small number of permanent errors I ran into while resilvering a raidz2 pool. I could only see three ways out:

  1. A corrective receive, which would take days (many TBs over a small pipe) to repair a tiny number of permanent errors (see the sketch just after this list)
  2. A minimal corrective receive, which I don't yet know how to do (detailed in Minimal corrective receive stream generation #15916)
  3. What this feature request describes. I had two partially working (yet FAULTed) HDDs that had already been replaced, while a third (non-faulted) HDD caused permanent data errors during and/or before that replacement process. I believe those two partially working drives should hold enough good data for ZFS to repair the small number of permanent errors. Realizing that this should be possible made my mind spin with the potential data-recovery tools one could build with ZFS :)
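
For reference, option 1 looks roughly like the following, assuming an intact replica of the affected dataset exists on a backup pool and that the snapshot being received already exists on the damaged dataset (corrective receive requires OpenZFS 2.2 or newer; names are made up):

    # heal corrupted blocks in tank/data using the same snapshot held on a backup
    zfs send backup/data@snap-2024-02-01 | zfs receive -c tank/data@snap-2024-02-01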

Providing a decision tree (or flow chart) would be invaluable for guiding users toward optimal recovery paths. ddrescue isn't officially covered in the ZFS documentation (though it's very useful in some cases), and I see no way to use a ddrescue'd drive to recover permanent data errors after its replacement in the pool has already begun. The feature proposed above could help a user find and use the available data to recover from "permanent errors" in mirrors and all kinds of raidz configurations!
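
For anyone finding this later, the ddrescue step I'm referring to is roughly the following (device names are only examples - triple-check which is source and which is target before running anything like this):

    # first pass: copy everything easily readable, skip the slow scraping phase
    ddrescue -f -n /dev/sdb /dev/sdd rescue.map
    # second pass: retry the bad areas a few times with direct disc access
    ddrescue -f -d -r3 /dev/sdb /dev/sdd rescue.map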

dm17 added the Type: Feature label on Feb 20, 2024

i3v commented Feb 26, 2024

I'm not a zfs expert (I'm just another user in a somewhat similar situation, currently trying to understand my options), but I've seen devs making statements like "ZFS should already automatically try to recover the data using available redundancy and, if it succeeds, issue recovery writes to the corrupted blocks".

I'm not 100% sure whether that applies to the replacing/resilvering situation, though. Maybe that statement was only meant for "normal reads from a normally functioning vdev".

This makes me think that, if the data is actually there on device1 (the device that went FAULTED first), it would be enough to read the affected file for ZFS to restore it, right after a zpool clear pool1 device1 brings that device back online.
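
In other words, something like this (pool, device, and file names are placeholders, and I may well be wrong about whether this is sufficient):

    zpool clear pool1 device1        # bring the faulted device back online
    zpool status -v pool1            # lists the files with permanent errors
    cat /pool1/path/to/affected-file > /dev/null   # re-read so ZFS can self-heal it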

However, zpool replace might have already automatically detached your device1 from the pool. If so, it looks like there's no way to make use of the data on it. That would make zpool replace with a non-spare disk much more dangerous than with a spare disk, simply because a normal zpool replace ends in a zpool detach, which is irreversible (and can be outright dangerous if there isn't enough redundancy).

rincebrain (Contributor) commented Feb 26, 2024

From mirrors, ZFS will basically try to read one leg at random[1], and if that fails, try the others and write a replacement if it finds another copy or can reconstruct it from parity. The probable exception is replace: if the "old" disk throws a read error, I don't think I'd expect ZFS to try writing a replacement to the old disk, only the new one, though I haven't checked or tested that. (After all, when you're replacing a failing disk, trying to write to it might cause it to error out or hang and fall off the bus...)

A "replace" is basically making a temporary mirror vdev with old and the new and doing an attach, wait on that, then detach, except the CLI isn't going to let you manually do that explicitly with raidz/draid leaves. I don't think I expect it to kick the old disk out if it fails to copy everything, but I could be wrong, I don't use raidz for my personal pools atm, and it's been a long time since someone paid me to maintain raidz pools.

So you shouldn't be losing any redundancy from the detach at the end - it should refuse to detach if it didn't write everything from the old disk, I think - and then you might be able to clear the faulted state on the old disk and try a corrective scrub, or whatever it got named, for "scrub only the blocks in the error list" to trigger the repair.
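
Concretely, assuming your build has the error scrub that landed in 2.2, that would be something like:

    zpool clear tank olddisk    # clear the FAULTED state on the old disk
    zpool scrub -e tank         # error scrub: only revisit blocks in the error log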

And it detaches at the end because the end goal of zpool replace pool diskA diskB is never "keep both diskA and diskB in the pool"; if you wanted that, you'd use zpool attach, notwithstanding the CLI not letting you do that readily in raidz/draid configurations.

[1] - not actually randomly but don't worry about it
