Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more control/visibility and speedup spa_load_verify(). #13022

Merged
merged 1 commit into from
Feb 4, 2022

Conversation

amotin
Copy link
Member

@amotin amotin commented Jan 26, 2022

Use error thresholds from the policy to control whether to verify data and/or metadata. If threshold is set to UINT64_MAX, then caller probably does not care about the number of errors and we may skip that part to import pool faster. By default import neither set the data error threshold nor read the error counter. It was only reported to dbgmsg, that is not very useful in everyday life. Metadata are still verified and fail if even single error found.

While there, just for symmetry, return number of metadata errors in case threshold is not set to zero and we haven't reched it.

How Has This Been Tested?

Importing pool on FreeBSD after system crash during active write I see reduction of time spent inside spa_load_verify() from 6.5s to 1.5s due to skipped data verify.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reched it.

Signed-off-by: Alexander Motin <[email protected]>
@amotin amotin requested a review from behlendorf January 26, 2022 21:39
@amotin
Copy link
Member Author

amotin commented Jan 26, 2022

Does anybody know good motivation to scrub last few TXGs during a normal import? Is it more than a time waste?

@amotin amotin added Status: Code Review Needed Ready for review and testing Status: Design Review Needed Architecture or design is under discussion labels Jan 26, 2022
@amotin
Copy link
Member Author

amotin commented Jan 27, 2022

My colleague measured pretty large pool import time after crash during active write with dedup enabled. And he found that disabling metadata verify in that case reduces import time from ~75 seconds to ~20. It is too tasty to ignore. Does anybody know why we do all this scrub and why we do it up to TXG_DEFER_SIZE TXGs back? I don't see any relation to that number at least.

@amotin amotin requested review from ahrens and pzakha January 27, 2022 19:19
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach taken here, I think this makes good sense.

Given the size of many modern pools, I think it probably makes sense to go even farther. For many pools it's completely impractical to scrub the data even during a rewind. That's an operation which could take weeks, or longer. I wouldn't be against changing the default spa_load_verify_data value to B_FALSE. Or if we wanted to be clever, maybe only perform the data scrub on rewind when the pool is all SSD? Though I'm not sure it's worth the complexity.

Copy link
Contributor

@pzakha pzakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable. I'm curious, are you changing the policy by doing modifications in libzfs directly, as I do not recall if we can set those via cli?
I'm not sure I'd go as far as turning off spa_load_verify by default, as I do recall it catching errors in some, albeit rare circumstances and then importing the previous txg successfully.

@amotin
Copy link
Member Author

amotin commented Jan 28, 2022

@pzakha At this point I just disabled data scan when the result is not used and completed the policy API. There indeed no user-space part for it now. If somebody have preferences how it could be set on the command line -- please speak up or feel free to take over. I'd prefer it to be done on the command line via using the policy API rather than via global tunables as it is now.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing Status: Design Review Needed Architecture or design is under discussion labels Feb 4, 2022
@behlendorf behlendorf merged commit f2c5bc1 into openzfs:master Feb 4, 2022
amotin added a commit to truenas/zfs that referenced this pull request Feb 22, 2022
Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#13022
(cherry picked from commit f2c5bc1)
@amotin amotin deleted the verify_data branch June 29, 2022 17:24
amotin added a commit to amotin/zfs that referenced this pull request Jul 20, 2022
Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#13022
behlendorf pushed a commit that referenced this pull request Jul 26, 2022
Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes #13022
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#13022
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#13022
geoffamey pushed a commit to BlueArchive/storage-zfs-wasabi that referenced this pull request Sep 27, 2023
Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#13022
(cherry picked from commit f2c5bc1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants