Linux: Fix zfs_prune panics #16770
Conversation
@behlendorf Any chance of getting this reviewed for #16760 2.3.0-rc4?
Force-pushed from 38893b6 to 649c441 (compare)
The free_inode change makes sense, thanks for spotting it. Could you just move it into its own PR?
by protecting against sb->s_shrink eviction on umount

With newer kernels, deactivate_locked_super() calls shrinker_free() and only then the sops->kill_sb callback, resulting in a UAF on umount when zpl_prune_sb() of an in-umount dataset tries to reach for the shrinker functions.

Signed-off-by: Pavel Snajdr <[email protected]>
@tonyhutter Maybe it'd be good to have this fix in 2.2.7?
by protecting against sb->s_shrink eviction on umount

With newer kernels, deactivate_locked_super() calls shrinker_free() and only then the sops->kill_sb callback, resulting in a UAF on umount when zpl_prune_sb() of an in-umount dataset tries to reach for the shrinker functions.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Signed-off-by: Pavel Snajdr <[email protected]>
Closes openzfs#16770
This PR may be the culprit of an issue we observed on one of our servers. When trying to destroy a dataset, it failed with 'dataset busy':
The dataset was unmounted; when trying to mount it with 'zfs mount {zfs dataset}', it got completely stuck with the following stack trace:
It got stuck here: Observing the super block of the dataset, it has s_active == 0:
The s_flags = SB_ACTIVE | SB_BORN | SB_POSIXACL | SB_NOATIME. It seems that when zpl_prune_sb() messes with sb->s_active while a 'zfs destroy' is in progress, we can get into this weird situation. The normal cleanup procedure is cleanup_mnt() -> deactivate_super() -> deactivate_locked_super() -> zpl_kill_sb(); sb->s_active is atomically detected-and-changed in deactivate_super() and deactivate_locked_super(), so in theory zpl_prune_sb() could get in the way and alter the normal procedure by manipulating the sb->s_active counter. I am working on reproducing the issue, but it doesn't seem to be easy: the timing has to be perfect for it to happen. I apologize if it's inappropriate to post in a closed PR.
I was able to reproduce the issue by simulating what zfs_prune_cb() does in a dummy driver. Here is the code snippet:
@snajpa - may I have your attention? It seems this PR fixes one issue but introduces another.
@youzhongyang that is weird, I used
@youzhongyang can you try to change the condition in

to

and try to reproduce with that? This was something I was unsure was worth doing (or, if that didn't work, worth the effort to make work; I can't remember which). Maybe skipping it altogether wasn't the best idea, though it didn't show up in any testing, nor in our prod so far.
@snajpa I added (sb->s_flags & SB_ACTIVE); the issue is still reproducible.
Here is my explanation. In the Linux kernel, deactivate_super() and deactivate_locked_super() are as follows:

deactivate_super():

deactivate_locked_super():

If an outsider (such as zpl_prune_sb()) tries to manipulate the sb->s_active counter, it can defeat this atomic detect-and-change protocol.
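For reference, the kernel-side logic being discussed looks roughly like this. This is simplified pseudocode paraphrased from memory of Linux fs/super.c on recent kernels, not the exact source; details vary by kernel version:

```c
/* Simplified sketch of Linux fs/super.c; NOT the exact kernel source. */
void deactivate_super(struct super_block *s)
{
	/*
	 * Fast path: if s_active > 1, just drop one reference and
	 * return. kill_sb is NOT called on this path.
	 */
	if (!atomic_add_unless(&s->s_active, -1, 1)) {
		down_write(&s->s_umount);
		deactivate_locked_super(s);
	}
}

void deactivate_locked_super(struct super_block *s)
{
	if (atomic_dec_and_test(&s->s_active)) {
		shrinker_free(s->s_shrink);	/* shrinker freed first... */
		s->s_type->kill_sb(s);		/* ...then fs teardown runs */
	} else {
		up_write(&s->s_umount);
	}
}
```

If zpl_prune_sb() bumps s_active from 1 to 2 at the wrong moment, the fast path in deactivate_super() fires, the count later drains to 0, and kill_sb() is never called, which matches the stuck-mount symptom reported above.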
@youzhongyang well, if you have any ideas how to fix it, please go ahead and try :) If not, I should be able to get to this in about a week, +-. We'll probably need to figure out yet another way to avoid the shrinkers going away from under us.
@snajpa - I thought about it but haven't been able to figure out a better way. Relying on the info provided by the super_block seems inadequate. We never had a panic caused by zfs_prune, so we reverted this commit on our servers. Please go ahead and try fixing it.
@youzhongyang if you have some time to spare, can I ask you to try to reproduce with this patch?

```diff
diff --git a/module/os/linux/zfs/zpl_super.c b/module/os/linux/zfs/zpl_super.c
index e27e9fd02..3bc2e587e 100644
--- a/module/os/linux/zfs/zpl_super.c
+++ b/module/os/linux/zfs/zpl_super.c
@@ -385,17 +385,15 @@ zpl_prune_sb(uint64_t nr_to_scan, void *arg)
 	int objects = 0;
 
 	/*
-	 * deactivate_locked_super calls shrinker_free and only then
-	 * sops->kill_sb cb, resulting in UAF on umount when trying to reach
-	 * for the shrinker functions in zpl_prune_sb of in-umount dataset.
-	 * Increment if s_active is not zero, but don't prune if it is -
-	 * umount could be underway.
+	 * Ensure the superblock is not in the process of being torn down.
 	 */
-	if (atomic_inc_not_zero(&sb->s_active)) {
-		(void) -zfs_prune(sb, nr_to_scan, &objects);
-		atomic_dec(&sb->s_active);
+	if (down_read_trylock(&sb->s_umount)) {
+		if (!(sb->s_flags & SB_DYING) && sb->s_root &&
+		    (sb->s_flags & SB_BORN)) {
+			(void) -zfs_prune(sb, nr_to_scan, &objects);
+		}
+		up_read(&sb->s_umount);
 	}
-
 }
 
 const struct super_operations zpl_super_operations = {
```
@snajpa - I tested your patch; the issue is no longer reproducible. Thanks.
@youzhongyang awesome, thank you very much! I'm going to make a PR with that patch now. Thanks again for all your time spent on this, very much appreciated!
Motivation and Context
Linux: Fix zfs_prune panics:
#16324
Description
Linux: Fix zfs_prune panics
How Has This Been Tested?
A low-memory-scenario docker pull with ZFS as the storage backend now passes. The template build at vpsFree also passes.