resilvering continually restarts on read error and cannot offline bad drive #6613
Comments
There seems to be a bug here, or am I missing something? Is the answer to just pull both drives?
@behlendorf I'm sorry to pull you in directly, but I'm at a loss right now. I asked the zfs-discuss group for advice, but the responses didn't show enough knowledge to make me feel comfortable just pulling drives on a large system that we depend on, so I have asked for help here. Is there a better place I should go to try to debug this situation? Thanks, Steve
@cousins my suggestion would be to pull the
Once the device is offline it should rebuild cleanly as usual. The recent 0.7.x releases make handling this situation a little more straightforward by adding the ability to force fault a device by running
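For reference, the 0.7.x force-fault option referred to here is presumably `zpool offline -f`; a minimal sketch, assuming a placeholder pool named `tank` and the failing device `mpathbx`:

```sh
# Assumption: "tank" and "mpathbx" are placeholders for your pool and device.
# ZFS 0.7.x added a -f flag to `zpool offline` that puts the device into a
# faulted state instead of merely offlining it.
zpool offline -f tank mpathbx

# Once the replacement has resilvered, the administrative fault can be
# cleared (if ever needed) with:
zpool clear tank mpathbx
```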
Thanks very much @behlendorf. I take it I'd need to do this for each of the four (in my case) multipath paths? "multipath -l" shows this for mpathbx:
I first thought you meant to use dm-74, but there is no /sys/block/dm-74/device directory. The sd?? devices are there, but if I offline one (as seen above with sdhc), the other three are still Running. Do you know if
would be equivalent or better? That is what I use after I (usually) "zpool offline ..." devices from the ZFS pool. Thanks again.
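For illustration only, offlining every SCSI path that backs a single multipath device via sysfs could look like the following; the sd* names are hypothetical and would be taken from the `multipath -l` output for the device in question:

```sh
# Hypothetical path names; substitute the sd* devices listed by
# `multipath -l mpathbx` for the device you want to take down.
for dev in sdgz sdhc sdhd sdhe; do
    echo offline > /sys/block/$dev/device/state
done
```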
You'll want to look in
Sorry, I'm not sure if
Thanks. FWIW the command I ended up with to get them all at the same time was:
It faulted that drive and degraded that raidz2. Resilvering continued without restarting, which is great. So, is/was this a bug? I'm not sure what the problem was with trying to offline the drive after the "zpool replace" command had been issued.
@cousins well I definitely think it could have been handled more gracefully, but it's kind of a subtle case. The reason it wouldn't let you detach or offline either device was because it was only considering
I see. Thanks for the explanation. Would spare-? ever be a top-level vdev? Maybe an exception could key off of that. My original reason for not offlining mpathbx was that I thought it would be safer and faster because it could get the data directly from that drive; I assumed the READ errors were drive-correctable (since ZFS didn't offline/fault the drive) and that ZFS was just reporting them so I'd be aware of them. If they really are significant errors, shouldn't ZFS have faulted the drive and brought the spare in automatically?
Yes, and I expect the updated ZED logic in 0.7.x would have done exactly this. It would have automatically faulted the drive, the resilver would have carried on, and this would have been a non-event. I'd suggest updating to 0.7.x eventually, but you may want to wait for one or two more point releases until a few minor issues get squashed.
Thanks. I'll keep that in mind. The resilver finished and now it looks like (after a "zpool clear"):
I'm not certain why it didn't get rid of spare-3 when done. At this point, I'm planning on replacing 1-44 with a drive that I'll put in the slot that mpathbx was in. At that point I hope 1-44 will go back to being a "spare" (for the future, when using 0.7.x) and the new drive will just be part of raidz2-3 like all the other drives there. Is that what you expect will happen? If not, is there something else I should do, either now or afterwards, to get things back to where they should be? Thanks, Steve
Nevermind:
has done the trick. Sorry for the noise.
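The command that did the trick was stripped from the comment above. Cleaning up a lingering spare-N vdev is normally done with `zpool detach`, either detaching the failed original (which promotes the spare) or detaching the spare once a proper replacement has resilvered, so it was presumably something along these lines (pool and device names are placeholders):

```sh
# Placeholders: pool "tank", failed disk "mpathbx", in-use spare "1-44".
# Detach the failed device so the spare becomes a permanent member:
zpool detach tank mpathbx
# ...or, once a new disk has resilvered in, return the spare to the spare list:
zpool detach tank 1-44
```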
I am on a system with v0.7.0-128_gc616dcf8b. A resilver would restart every 25 seconds (without any I/O or read errors forcing that) as long as zed is running. Killing zed allowed the resilver to make forward progress. What patch do I need to be able to use zed together with resilvering? The git log up to HEAD (17 October 2017) didn't bring anything to mind.
I have the same issue with the following setup:
errors: No known data errors
Resilvering never gets further than the first 250MB scanned. zpool events -v shows this hundreds of times:
Stopping zed fixes the problem. Starting it again doesn't make it fall back into the loop.
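As an aside, a convenient way to watch these events arrive live rather than re-running the command is the follow flag; `tank` is a placeholder pool name:

```sh
# -v prints the full event payload, -f follows new events like `tail -f`.
zpool events -vf tank

# Optionally clear the accumulated event backlog once done:
zpool events -c
```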
I had a problem with resilvering continually restarting after a zpool replace. I tried stopping zed as @fatalwarning mentioned and the resilvering continued without problem after that. I could reproduce the problem on a second disk in the same pool. I'm on zfs version 0.7.3-1 (Proxmox 5.1-41).
How can I stop zed?
service zfs-zed stop
On Ubuntu 16.04 it said that daemon is not loaded, but if I do service zed stop something stops; I hope it's the same :D
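On systemd-based distributions the event daemon is usually managed as `zfs-zed.service`, so the equivalent should be:

```sh
systemctl stop zfs-zed.service    # stop the ZFS event daemon
systemctl start zfs-zed.service   # start it again once the resilver completes
```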
FYI to anyone at home, I just encountered resilver restarting over and over without obvious cause until I stopped zed on 0.7.6. No read errors, just every 10-15s the resilver would restart and zpool history -i on the pool contains at the end, over and over:
until I shot zed, at which point it starts and does not abort.
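To check whether the same thing is happening elsewhere, the internal pool history mentioned above can be inspected while the resilver runs; `tank` is a placeholder:

```sh
# -i includes internally logged events such as scan starts and aborts.
# A run of repeated scan start/abort entries at the end of the output
# is the signature of this restart loop.
zpool history -i tank | tail -n 20
```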
By the way, in the end my problem was write errors on sda caused by a USB hub not working properly. I had LUKS + ZFS, and at the LUKS level the kernel reported write errors because the USB port didn't supply enough power; after I changed the hub there were no more restarts. Maybe that helps somebody. I found the write and read errors in /var/log/kernel.log; ZFS did not give me any information.
In my case none of my drives threw errors: nothing in kern.log, messages, or dmesg, and no change in the drives' own SMART counters.
I got the same behaviour as @rincebrain running 0.7.6 on Fedora 28. As soon as I turned off zed the speed of the resilver picked up considerably and it started to make real progress.
Well, I was advised not to open a new ticket, so I might as well dump what I have here in case there's some interest in actually getting to the root of it. This is on Fedora 29 + 0.8.0-rc3. I have what appears to be a flaky power connection to one of the drives in my zpool, so at times it just powers off. Typically I just shut down the node, replug the connector, turn it back on, and on restart all is fine. But today I decided to replug the power live. It worked, but understandably(?) the drive was not re-added to the pool automatically. So I issued a zpool online command; it re-added the drive and started resilvering, but progress was slow and at times went backwards. Looking at it closer, it turned out that the resilver was restarting all the time from zero. This is without any read errors or anything like that. Also, throughout this process my kernel log was filled with partition-table rereading messages from the just-onlined drive:
On advice from IRC I stopped zed, and both the partition-table rereading and the resilver restarts stopped. The resilver completed and did not restart. Of note, when the rereading was happening it was usually accompanied by two zed messages in /var/log/messages:
Restarting zed did not bring it back to this state. But to perform a controlled experiment, I did zpool offline on the same drive, wrote a bit of data to the pool, and then issued the zpool online command. This restarted the constant partition-table rereading, but instead of a resilver a scrub started, and it was not really progressing much. Stopping zed again stopped the cycle of partition-table rereading and let the scrub complete reasonably fast (I guess the rereading throws away all drive caches or some such, as all accesses are then slow, including zpool status, which was taking several seconds). Running zed -F -v to get the actual events output, here's what it told me, and this is 100% reproducible here:
This continues for a couple hundred times and then stops (unlike the case with the drive falling off the array and a resilver happening, or maybe I was just not patient enough). Also, in case it matters, here's how my zpool is set up:
@verygreen try disabling zed, does it still do that?
@rincebrain yeah, it does stop when I disable zed. Sorry, I just had an incomplete comment posted by mistake.
I was able to reproduce this by attaching new disks to two mirror vdevs within a single pool at the same time. The following workarounds did NOT help:
The following workaround DID help:
All of the "wait" steps above were required. I tried, for example, keeping the first wait step and omitting the second, and the bug was still provoked: the resilver start time kept resetting to $now every 1-2 minutes or so. Most of this is with zfs-zed stopped. For example, I did not try the "DID work" workaround with zfs-zed running; maybe it would still work. A connectivity problem is unlikely: there are no warnings about that in dmesg, and no USB nonsense involved. I'm afraid I don't know the ZFS version or how to get it, but it is from git: One thing which may be relevant: the pool is at version 22. I noticed there is a resilver_defer feature in 'zpool-features', but it sounded like a performance improvement analogous to single-flighting. In this case the issue is not performance as described by that feature: the resilver is restarting even though no new device addition has provoked one, and there has not been an error, either.
It doesn't matter whether it was analogous; if the pool is at v22 you don't get any feature flags without upgrading it.
Yes, I thought that was implied: I tested with the feature off. I assume there is regression testing for bugs like this one, but I don't know if those tests run with the resilver_defer feature on or off. I speculated that maybe the tests need to cover both cases and don't. The reason it doesn't matter is that the pool is supposed to eventually become healthy whether the feature flag is on or off, and it doesn't. The feature setting can't be the cause of the bug; it can at most be related, but it could also be a total red herring.
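Regarding the earlier note about not knowing the ZFS version on a git build, these are the usual places to look (availability may vary slightly between versions):

```sh
cat /sys/module/zfs/version        # version string of the loaded kernel module
modinfo zfs | grep -i '^version'   # same information from the installed module
dmesg | grep -i 'ZFS:'             # the version is also printed at module load
```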
I have this issue while trying to replace a bunch of 4TB drives with 8TB drives. The CKSUM errors appeared after I forcefully offlined those two disks.
All zed -F -vvv reports is
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
I have a pool made up of 60 drives, with six 10-disk RAIDZ2s concatenated together. It is on version 0.6.5.11-1 (CentOS 6.9). The last update I did was in July, and it is using kmod with kernel 2.6.32-696.6.3.el6.x86_64.
A drive started getting READ and CKSUM errors during a recent scrub. I usually "zpool offline" the drive, pull it, put a new one in, and "zpool replace" it with the help of a script I wrote that deals with drive blinking, multipath, updating vdev_id.conf, udev, and unblinking the drive. In this case, however, I was away for a while and wanted to replace it with one that was set as a Hot-Spare. I think I should have taken the drive offline before doing the replace, but instead I ran "zpool replace" with the drive that was set up as a Hot-Spare.
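For context, the usual offline-then-replace sequence described above looks roughly like the following sketch; the pool and device names are placeholders, and the physical pull/insert plus vdev_id.conf and udev bookkeeping happen outside of ZFS:

```sh
# Placeholders: pool "tank", failing disk "mpathbx", hot spare "1-44".
zpool offline tank mpathbx        # take the failing disk out of service first
zpool replace tank mpathbx 1-44   # resilver onto the spare (or onto a new disk)
zpool status -v tank              # watch resilver progress

# Once the replacement has finished resilvering, detach the old device if it
# is still listed, which collapses the temporary spare-N vdev:
zpool detach tank mpathbx
```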
It looks like this now:
The mix of mpath?? device names and 1-44 names has to do with this being an earlier system when I was just starting to use /etc/zfs/vdev_id.conf. I should re-import it with the consistent vdev names but my scripts work either way so it hasn't been a high priority.
What is happening is that it will resilver for a day or more, then get a READ error on that bad drive (mpathbx), and the resilvering starts over from the beginning.
So what should have taken a long but manageable time may now never finish; it has been going on for two weeks already.
I have tried to offline and detach mpathbx but get:
I thought zed was causing the resilvering to restart, so I tried stopping zed, but that has not helped. It continues to restart the resilvering process every time it gets a read error on mpathbx.
It seems there is either a bug here or something simple that I have overlooked. I sent this to the zfs-discuss list but got nothing much; the last suggestion was to just pull mpathbx and the spare and see what happens. As this is a large system (I do have an off-site replica of the data), it would take a long time to recover from a mistake, so I'd like to hear what others think.
Any ideas?
Thanks,
Steve