Add more control/visibility and speedup spa_load_verify(). #13022

amotin · 2022-01-26T21:39:38Z

Use the error thresholds from the policy to control whether data and/or metadata are verified. If a threshold is set to UINT64_MAX, the caller probably does not care about the number of errors, so we can skip that part and import the pool faster. By default, import neither sets the data error threshold nor reads the error counter. The count was only reported to dbgmsg, which is not very useful in everyday life. Metadata are still verified, and the import fails if even a single error is found.

While there, just for symmetry, return the number of metadata errors when the threshold is not set to zero and we haven't reached it.
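The pass/fail decision and the reported metadata count can be sketched as follows. This is an illustration only: the names (`verify_outcome_t`, `judge_load_verify`) are hypothetical, and the use of `<=` for "within the threshold" is an assumption rather than something stated in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of a verification pass; names are illustrative. */
typedef struct verify_outcome {
	bool ok;		/* both counts are within their thresholds */
	uint64_t meta_errors;	/* count handed back to the caller */
} verify_outcome_t;

/*
 * Assumption: a count "within the threshold" means count <= threshold.
 * With a threshold of UINT64_MAX the comparison can never fail, which
 * is why counting errors for that part is wasted work.
 */
static verify_outcome_t
judge_load_verify(uint64_t meta_errors, uint64_t data_errors,
    uint64_t max_meta, uint64_t max_data)
{
	verify_outcome_t out;

	out.ok = (meta_errors <= max_meta && data_errors <= max_data);
	out.meta_errors = meta_errors;
	return (out);
}
```

With a metadata threshold of zero, a single metadata error makes the outcome fail. With a nonzero threshold, a successful outcome can still carry a nonzero metadata error count back to the caller.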

How Has This Been Tested?

Importing a pool on FreeBSD after a system crash during active writes, I see the time spent inside spa_load_verify() drop from 6.5s to 1.5s because data verification is skipped.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

Use error thresholds from policy to control whether to scrub data and/or metadata. If threshold is set to UINT64_MAX, then caller probably does not care about result and we may skip that part. By default import neither set the data error threshold nor read the error counter, so skip the data scrub for faster import. Metadata are still scrubbed and fail if even single error found. While there just for symmetry return number of metadata errors in case threshold is not set to zero and we haven't reached it. Signed-off-by: Alexander Motin <[email protected]>

amotin · 2022-01-26T21:40:57Z

Does anybody know a good motivation for scrubbing the last few TXGs during a normal import? Is it more than a waste of time?

amotin · 2022-01-27T19:09:52Z

My colleague measured a pretty large pool import time after a crash during active writes with dedup enabled. He found that disabling metadata verification in that case reduces the import time from ~75 seconds to ~20. That is too tasty to ignore. Does anybody know why we do all this scrubbing, and why we do it up to TXG_DEFER_SIZE TXGs back? I don't see any relation to that number, at least.

behlendorf

I like the approach taken here, I think this makes good sense.

Given the size of many modern pools, I think it probably makes sense to go even farther. For many pools it's completely impractical to scrub the data even during a rewind. That's an operation which could take weeks, or longer. I wouldn't be against changing the default spa_load_verify_data value to B_FALSE. Or if we wanted to be clever, maybe only perform the data scrub on rewind when the pool is all SSD? Though I'm not sure it's worth the complexity.

pzakha

This looks reasonable. I'm curious, are you changing the policy by making modifications in libzfs directly? I do not recall whether we can set those via the CLI.
I'm not sure I'd go as far as turning off spa_load_verify by default, as I do recall it catching errors in some, albeit rare, circumstances and then importing the previous txg successfully.

amotin · 2022-01-28T15:08:45Z

@pzakha At this point I just disabled the data scan when the result is not used and completed the policy API. There is indeed no user-space part for it now. If somebody has preferences for how it could be set on the command line, please speak up or feel free to take over. I'd prefer it to be done on the command line using the policy API rather than via global tunables as it is now.

Use error thresholds from policy to control whether to scrub data and/or metadata. If threshold is set to UINT64_MAX, then caller probably does not care about result and we may skip that part. By default import neither set the data error threshold nor read the error counter, so skip the data scrub for faster import. Metadata are still scrubbed and fail if even single error found. While there just for symmetry return number of metadata errors in case threshold is not set to zero and we haven't reached it. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Closes openzfs#13022 (cherry picked from commit f2c5bc1)


amotin requested a review from behlendorf January 26, 2022 21:39

amotin added the "Status: Code Review Needed" and "Status: Design Review Needed" labels Jan 26, 2022

amotin requested review from ahrens and pzakha January 27, 2022 19:19

behlendorf approved these changes Jan 27, 2022


pzakha approved these changes Jan 28, 2022


behlendorf added the "Status: Accepted" label and removed the "Status: Code Review Needed" and "Status: Design Review Needed" labels Feb 4, 2022

behlendorf merged commit f2c5bc1 into openzfs:master Feb 4, 2022

amotin mentioned this pull request Feb 22, 2022

NAS-113231 / Add more control/visibility to spa_load_verify(). truenas/zfs#45

Merged

amotin deleted the verify_data branch June 29, 2022 17:24
