resilvering continually restarts on read error and cannot offline bad drive #6613

Closed · cousins opened this issue Sep 7, 2017 · 29 comments
Labels: Status: Stale (no recent activity for issue)

@cousins commented Sep 7, 2017

System information

Type                  Version/Name
Distribution Name     CentOS
Distribution Version  6.9
Linux Kernel          2.6.32-696.6.3
Architecture          x86_64
ZFS Version           0.6.5.11-1
SPL Version           0.6.5.11-1

I have a pool made up of 60 drives: six 10-disk RAIDZ2 vdevs concatenated together. It is on version 0.6.5.11-1 (CentOS 6.9). The last update I did was in July and it is using the kmod packages with kernel 2.6.32-696.6.3.el6.x86_64.

A drive started getting READ and CKSUM errors during a recent scrub. I usually "zpool offline" the drive, pull it, put a new one in, and "zpool replace" it, with the help of a script I wrote that handles drive blinking, multipath, updating vdev_id.conf, udev, and unblinking the drive. In this case, however, I was away for a while and wanted to replace it with a drive that was configured as a hot spare. I think I should have taken the failing drive offline before doing the replace, but instead I just ran "zpool replace" against the hot spare.
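
The usual sequence, stripped of the blinking/multipath bookkeeping and with placeholder device names, looks roughly like:

zpool offline pool2 mpathXX           # take the failing drive out of service
# physically swap the drive, update multipath / vdev_id.conf / udev
zpool replace pool2 mpathXX mpathYY   # resilver onto the replacement
zpool status pool2                    # watch the resilver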

It looks like this now:

	  raidz2-3     ONLINE       0     0     0
	    mpathbd    ONLINE       0     0     0
	    mpathbf    ONLINE       0     0     0
	    mpathbg    ONLINE       0     0     0
	    spare-3    ONLINE       0     0    46
	      mpathbx  ONLINE      79     0     6  (resilvering)
	      1-44     ONLINE       0     0     0  (resilvering)
	    0-31       ONLINE       0     0     0
	    mpathl     ONLINE       0     0     0
	    mpathm     ONLINE       0     0     0
	    mpathn     ONLINE       0     0     0
	    mpathas    ONLINE       0     0     0
	    1-31       ONLINE       0     0     0

The mix of mpath?? device names and 1-44 names comes from this being an older system, set up when I was just starting to use /etc/zfs/vdev_id.conf. I should re-import it with the consistent vdev names, but my scripts work either way, so it hasn't been a high priority.
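
For reference, the consistent names are just alias entries in /etc/zfs/vdev_id.conf, along the lines of the following (the by-id paths here are made up for illustration):

# /etc/zfs/vdev_id.conf -- illustrative entries only
alias 0-31 /dev/disk/by-id/dm-uuid-mpath-35000c50055aaaaaa
alias 1-44 /dev/disk/by-id/dm-uuid-mpath-35000c50055bbbbbb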

What happens is that it resilvers for a day or more, then hits a READ error on the bad drive (mpathbx), and the resilvering starts over from the beginning.

So what should have taken a long but manageable time may now never finish. It has been going on for two weeks at this point.

I have tried to offline and detach mpathbx but get:

[root@nfs1 etc]#  zpool offline pool2 mpathbx
cannot offline mpathbx: no valid replicas

[root@nfs1 etc]#  zpool detach pool2 mpathbx
cannot detach mpathbx: no valid replicas

I thought zed was causing the resilvering to restart so I tried stopping zed but that has not helped. It continues to restart the resilvering process every time it gets a read error on mpathbx.

It seems that there is either a bug here or something simple that I have overlooked. I sent this to the zfs-discuss list but got nothing much; the last suggestion was to just pull mpathbx and the spare and see what happens. As this is a large system (I do have an off-site replica of the data), it would take a long time to recover from a mistake, so I'd like to hear what others think.

Any ideas?

Thanks,

Steve

@cousins (Author) commented Sep 8, 2017

There seems to be a bug here. Or am I missing something?

Is the answer to just pull both drives?

@cousins (Author) commented Sep 12, 2017

@behlendorf I'm sorry to pull you in directly, but I'm at a loss right now. I asked the zfs-discuss list and got some advice, but the responses didn't show enough knowledge to make me comfortable just pulling drives on a large system that we depend on, so I'm asking for help here. Is there a better place I should go to try to debug this situation?

Thanks,

Steve

@behlendorf (Contributor):

@cousins my suggestion would be to pull the mpathbx drive to force it to go offline. This should allow the rebuild to complete successfully to the hot spare device. This is pretty safe since you'll only be down a single device in your raidz2 array, but it assumes your hardware will cleanly handle an active drive being removed. If you're unsure about that you can manually offline the device at the scsi layer instead by running:

echo "offline" >/sys/block/<slave multipath device>/device/state

Once the device is offline it should rebuild cleanly as usual. The recent 0.7.x releases make handling this situation a little more straightforward by adding the ability to force fault a device by running zpool offline -f <vdev>.
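
On 0.7.x that would look something like this (using the vdev name as it appears in zpool status):

zpool offline -f pool2 mpathbx   # force the misbehaving vdev into a faulted state
zpool status pool2               # the resilver to the spare should then carry on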

@cousins (Author) commented Sep 12, 2017

Thanks very much @behlendorf . I take it I'd need to do this for each of the four (in my case) multipath paths? "multipath -l" shows this for mpathbx:

mpathbx (35000c50055fe8b5b) dm-74 SEAGATE,ST4000NM0023
size=3.6T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 9:0:31:0  sdhc 133:32  failed undef unknown
|-+- policy='round-robin 0' prio=0 status=active
| `- 9:0:44:0  sdho 133:224 active undef unknown
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 10:0:6:0  sdiy 8:288   active undef unknown
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 10:0:69:0 sdlg 67:480  active undef unknown

I first thought you meant to use dm-74, but there is no /sys/block/dm-74/device directory. The sd?? devices are there, but if I offline one (as seen above with sdhc), the other three still show as running.

Do you know if

multipath -f mpathbx

would be equivalent or better? That is what I use after I (usually) "zpool offline ..." devices from the zfs pool.

Thanks again.

@behlendorf (Contributor):

You'll want to look in /sys/block/dm-74/slaves/, which lists the individual component devices. Then you should be able to do something like this to offline them all:

echo "offline" >/sys/block/dm-74/slaves/*/device/state

Sorry, I'm not sure if multipath -f mpathbx would be any better. But setting the individual device to offline will effectively simulate a complete device failure at the ZFS level.

@cousins (Author) commented Sep 12, 2017

Thanks. FWIW the command I ended up with to get them all at the same time was:

for i in /sys/block/dm-74/slaves/*/device/state; do echo "offline" > $i; done

It faulted that drive and degraded that raidz2. Resilvering continued without restarting which is great.

So, is/was this a bug? I'm not sure what the problem was with trying to offline the drive after the "zfs replace" command had been issued.

@behlendorf (Contributor):

@cousins well I definitely think it could have been handled more gracefully, but it's kind of a subtle case. The reason it wouldn't let you detach or offline either device is that it was considering only spare-3, which consists of the two devices, as if it were a top-level vdev. Neither drive was fully rebuilt and there were known missing sections, so if spare-3 really were a top-level vdev, taking either drive offline would have been very bad; hence the "no valid replicas" error. But obviously this isn't really a top-level vdev, and the data could be reconstructed via RAIDZ. The fix here would be to update the tools so they recognize this case.

@cousins (Author) commented Sep 12, 2017

I see. Thanks for the explanation. Would spare-? ever be a top-level vdev? Maybe an exception would key off of that.

My original reason for not offlining mpathbx was that I thought it would be safer and faster because it could get the data directly from that drive, thinking that the READ errors were drive-correctable (since ZFS didn't offline/fault the drive) and ZFS was just reporting them so I'd be aware of them.

If they really are significant errors, shouldn't ZFS have faulted the drive and brought the spare in automatically?

@behlendorf (Contributor):

If they really are significant errors, shouldn't ZFS have faulted the drive and brought the spare in automatically?

Yes, and I expect the updated ZED logic in 0.7.x would have done exactly this. It would have automatically faulted the drive, the resilver would have carried on, and this would have been a non-event. I'd suggest updating to 0.7.x eventually, but you may want to wait for one or two more point releases until a few minor issues get squashed.

@cousins (Author) commented Sep 18, 2017

Thanks. I'll keep that in mind.

The resilver finished and now it looks like (after a "zpool clear"):

	  raidz2-3     DEGRADED     0     0     0
	    mpathbd    ONLINE       0     0     0
	    mpathbf    ONLINE       0     0     0
	    mpathbg    ONLINE       0     0     0
	    spare-3    DEGRADED     0     0     0
	      mpathbx  OFFLINE      0     0     0
	      1-44     ONLINE       0     0     0
	    0-31       ONLINE       0     0     0
	    mpathl     ONLINE       0     0     0
	    mpathm     ONLINE       0     0     0
	    mpathn     ONLINE       0     0     0
	    mpathas    ONLINE       0     0     0
	    1-31       ONLINE       0     0     0

I'm not certain why it didn't get rid of spare-3 when it finished. At this point I'm planning to replace 1-44 with a drive that I'll put in the slot mpathbx was in. I hope 1-44 will then go back to being a spare (for the future, when using 0.7.x) and the new drive will just be part of raidz2-3 like all of the other drives there. Is that what you expect will happen? If not, is there something else I should do, either now or afterwards, to get things back to where they should be?

Thanks,

Steve

@cousins (Author) commented Sep 18, 2017

Nevermind:

zpool detach pool2 mpathbx
zpool replace pool2 1-44 0-30

has done the trick. Sorry for the noise.

@bknl commented Oct 17, 2017

I am on a system with v0.7.0-128_gc616dcf8b. A resilver would restart every 25 seconds (without any I/O or read errors forcing it) as long as zed is running. Killing zed allowed the resilver to make forward progress. What patch do I need to be able to use zed together with resilvering? The git log up to HEAD (17 October 2017) didn't bring anything to mind.
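
In the meantime the workaround amounts to stopping zed for the duration of the resilver and starting it again afterwards; on a systemd distro, roughly:

systemctl stop zfs-zed           # let the resilver make progress
watch -n 60 zpool status <pool>  # wait until the resilver completes
systemctl start zfs-zed          # re-enable event handling afterwards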

@fatalwarning:

I have the same issue with the following setup:

  • Debian Stretch with Proxmox VE kernel version 4.13.8-3-pve

  • ZFS on Linux 0.7.3-pve1~bpo9

  • GlusterFS on top of ZFS

  • My zpool status while the problem was running:
    pool: data
    state: DEGRADED
    status: One or more devices is currently being resilvered. The pool will
    continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
    scan: resilver in progress since Wed Dec 13 12:06:38 2017
    126M scanned out of 9,34T at 4,67M/s, 582h2m to go
    25,4M resilvered, 0,00% done
    config:

     NAME                                            STATE     READ WRITE CKSUM
     data                                            DEGRADED     0     0     0
       raidz1-0                                      DEGRADED     0     0     0
         replacing-0                                 DEGRADED     0     0     0
           10131834357385193914                      UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST3000VN000-1H4167_W300MDYP-part1
           ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6PJ1EAA  ONLINE       0     0     0  (resilvering)
         ata-ST3000VN000-1HJ166_W7300LM8             ONLINE       0     0     0
         ata-ST3000VN000-1HJ166_W7304A8E             ONLINE       0     0     0
         ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N2372432    ONLINE       0     0     0
         ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0H9634K    ONLINE       0     0     0
    

errors: No known data errors

Resilvering never gets further than the first ~250 MB scanned.

zpool events -v shows this hundreds of times:
Dec 13 2017 12:15:57.141912480 sysevent.fs.zfs.config_sync
        version = 0x0
        class = "sysevent.fs.zfs.config_sync"
        pool = "data"
        pool_guid = 0x88f0a6e452b9caeb
        pool_state = 0x0
        pool_context = 0x0
        time = 0x5a310bed 0x87569a0
        eid = 0x35f0

Dec 13 2017 12:16:04.645873127 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "data"
        pool_guid = 0x88f0a6e452b9caeb
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "ragang"
        history_internal_str = "errors=0"
        history_internal_name = "scan aborted, restarting"
        history_txg = 0x122964
        history_time = 0x5a310bf4
        time = 0x5a310bf4 0x267f3de7
        eid = 0x35f1
 
Dec 13 2017 12:16:04.645873127 sysevent.fs.zfs.resilver_start
        version = 0x0
        class = "sysevent.fs.zfs.resilver_start"
        pool = "data"
        pool_guid = 0x88f0a6e452b9caeb
        pool_state = 0x0
        pool_context = 0x0
        time = 0x5a310bf4 0x267f3de7
        eid = 0x35f2
 
Dec 13 2017 12:16:04.649873106 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "data"
        pool_guid = 0x88f0a6e452b9caeb
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "ragang"
        history_internal_str = "func=2 mintxg=3 maxtxg=1175620"
        history_internal_name = "scan setup"
        history_txg = 0x122964
        history_time = 0x5a310bf4
        time = 0x5a310bf4 0x26bc46d2
        eid = 0x35f3

Stopping zed fixed the problem. Starting it again didn't make it fall back into the loop.
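
To watch for the restart loop as it happens, the event stream can be followed live and the internal history checked afterwards (a minimal sketch):

zpool events -f data                 # follow new events; look for resilver_start repeating
zpool history -i data | tail -n 20   # the internal log shows "scan aborted, restarting"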

@StefanCohen:

I had a problem with resilvering continually restarting after a zpool replace. I tried stopping zed as @fatalwarning mentioned and the resilvering continued without problem after that. I could reproduce the problem on a second disk in the same pool. I'm on zfs version 0.7.3-1 (Proxmox 5.1-41).

@felisucoibi:

How can I stop zed?

@StefanCohen:

service zfs-zed stop

@felisucoibi:

On Ubuntu 16.04 it said that daemon is not loaded, but if I do service zed stop something stops; I hope it's the same :D

@rincebrain (Contributor):

FYI to anyone at home, I just encountered resilver restarting over and over without obvious cause until I stopped zed on 0.7.6.

No read errors, just every 10-15s the resilver would restart and zpool history -i on the pool contains at the end, over and over:

2018-03-16.17:28:12 [txg:670520264] scan aborted, restarting errors=0
2018-03-16.17:28:12 [txg:670520264] scan setup func=2 mintxg=3 maxtxg=670519912
2018-03-16.17:28:33 [txg:670520269] scan aborted, restarting errors=0
2018-03-16.17:28:33 [txg:670520269] scan setup func=2 mintxg=3 maxtxg=670519912

until I shot zed, at which point it starts and does not abort.

@felisucoibi:

Btw, in the end my problem was write errors on sda because of a USB hub not working properly.

I had LUKS + ZFS, and at the LUKS level the kernel had write errors because the USB hub didn't supply enough power; I changed the hub and there were no more restarts. Maybe this can help somebody.

I found the write (and read) errors in /var/log/kernel.log.

ZFS did not give me any information.

@rincebrain (Contributor):

In my case, none of my drives threw errors: nothing in kern.log, messages, or dmesg, and no change in the drives' own SMART counters.

@behlendorf behlendorf added this to the 0.8.0 milestone Mar 19, 2018
@behlendorf behlendorf removed this from the 0.8.0 milestone Oct 12, 2018
@Teddy-Schmitz commented Feb 1, 2019

I got the same behaviour as @rincebrain running 0.7.6 on Fedora 28. As soon as I turned off zed, the resilver speed picked up considerably and it started to make real progress.

@verygreen (Contributor) commented Feb 12, 2019

Well, I was advised not to open a new ticket so I might as well dump what I have here in case there's some interest in actually getting to the root of it. This is on Fedora 29 + 0.8.0-rc3.

I have what appears to be a flaky power connection to one of the drives in my zpool, so at times it just powers off. Typically I just shut down the node, reseat the connector, turn it back on, and on restart all is fine.

But today I decided to replug the power live. It worked, but understandably(?) the drive was not re-added to the pool automatically.

So I issued a zpool online command; it re-added the drive and started resilvering, but progress was slow and at times went backwards. Looking closer, it turned out that the resilver was restarting from zero all the time, without any read errors or anything like that. Throughout all of this my kernel log was being filled with partition-table rereading messages from the just-onlined drive:

...
[64760.364090]  sdb: sdb1 sdb9
[64767.305447]  sdb: sdb1 sdb9
[64774.555451]  sdb: sdb1 sdb9
[64785.461825]  sdb: sdb1 sdb9

On advice from IRC I stopped zed, and both the partition-table rereading and the resilver restarts stopped. The resilver completed and did not restart.

Of note, when the rereading was happening it was usually accompanied by two zed messages in /var/log/messages:

Feb 11 20:47:39 fatbox1 kernel: sdb: sdb1 sdb9
Feb 11 20:47:45 fatbox1 zed[1360]: eid=305 class=config_sync pool_guid=0xFF5475C4B248E2F8
Feb 11 20:47:45 fatbox1 zed[1479]: eid=306 class=config_sync pool_guid=0xFF5475C4B248E2F8

Restarting zed did not bring it back into that state. But to perform a controlled experiment, I did zpool offline on the same drive, wrote a bit of data to the pool, and then issued the zpool online command.

This restarted the constant partition-table rereading, but instead of a resilver, a scrub started and was not really progressing much. Stopping zed again stopped the cycle of partition-table rereading and let the scrub complete fairly quickly (I guess the rereading throws away all drive caches or some such, since all accesses become slow, including zpool status, which was taking several seconds).
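
Spelled out, the controlled experiment was roughly the following (pool and device names as in my setup; the file path is just an example):

zpool offline zstorage sdb                            # drop the drive out of the pool
dd if=/dev/urandom of=/zstorage/junk bs=1M count=512  # dirty some data while it is out
zpool online zstorage sdb                             # with zed running, the scan keeps restarting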

Running zed -F -v to get the actual event output, here's what it told me; this is 100% reproducible here:

[root@fatbox1 build-and-test]# zed -F -v
Ignoring "zed.rc": not executable by user
Registered zedlet "vdev_clear-led.sh"
Ignoring "zed-functions.sh": not executable by user
Registered zedlet "resilver_finish-notify.sh"
Registered zedlet "pool_import-led.sh"
Registered zedlet "statechange-led.sh"
Registered zedlet "statechange-notify.sh"
Registered zedlet "vdev_attach-led.sh"
Registered zedlet "data-notify.sh"
Registered zedlet "all-syslog.sh"
Registered zedlet "scrub_finish-notify.sh"
Registered zedlet "resilver_finish-start-scrub.sh"
ZFS Event Daemon 0.8.0-rc3 (PID 30159)
Add Agent: init
Diagnosis Engine: register module
Retire Agent: register module
zfs_unavail_pool: examining 'zstorage' (state 7)
zed_disk_event_init
Waiting for new udev disk events...
Processing events since eid=266
Invoking "all-syslog.sh" eid=267 pid=11760
Finished "all-syslog.sh" eid=267 pid=11760 exit=0
Invoking "all-syslog.sh" eid=268 pid=11815
Finished "all-syslog.sh" eid=268 pid=11815 exit=0
Invoking "scrub_finish-notify.sh" eid=268 pid=11817
Finished "scrub_finish-notify.sh" eid=268 pid=11817 exit=3
Invoking "all-syslog.sh" eid=269 pid=11839
Finished "all-syslog.sh" eid=269 pid=11839 exit=0
Invoking "statechange-led.sh" eid=269 pid=11842
Finished "statechange-led.sh" eid=269 pid=11842 exit=1
Invoking "statechange-notify.sh" eid=269 pid=11843
Finished "statechange-notify.sh" eid=269 pid=11843 exit=3
Diagnosis Engine: discarding 'resource.fs.zfs.statechange for vdev 13402720709604261192
Retire Agent: zfs_retire_recv: 'resource.fs.zfs.statechange'
Invoking "all-syslog.sh" eid=270 pid=11896
Finished "all-syslog.sh" eid=270 pid=11896 exit=0
Invoking "all-syslog.sh" eid=271 pid=12077
Finished "all-syslog.sh" eid=271 pid=12077 exit=0
Invoking "statechange-led.sh" eid=271 pid=12081
Finished "statechange-led.sh" eid=271 pid=12081 exit=1
Invoking "statechange-notify.sh" eid=271 pid=12082
Finished "statechange-notify.sh" eid=271 pid=12082 exit=3
Invoking "all-syslog.sh" eid=272 pid=12083
Diagnosis Engine: discarding 'resource.fs.zfs.statechange for vdev 13402720709604261192
Retire Agent: zfs_retire_recv: 'resource.fs.zfs.statechange'
Retire Agent: marking repaired vdev 13402720709604261192 on pool 18398459865380479736
Finished "all-syslog.sh" eid=272 pid=12083 exit=0
zed_disk_event:
	class: EC_dev_status
	subclass: dev_dle
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
zfs_slm_event: EC_dev_status.dev_dle
zfsdle_vdev_online: searching for '13402720709604261192' in 'zstorage'
Invoking "all-syslog.sh" eid=273 pid=12215
Finished "all-syslog.sh" eid=273 pid=12215 exit=0
Invoking "all-syslog.sh" eid=274 pid=12226
Finished "all-syslog.sh" eid=274 pid=12226 exit=0
zed_disk_event:
	class: EC_dev_remove
	subclass: disk
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
agent post event: ignoring 'EC_dev_remove' for recently expanded device 'ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1'
Invoking "all-syslog.sh" eid=275 pid=12234
Finished "all-syslog.sh" eid=275 pid=12234 exit=0
Invoking "all-syslog.sh" eid=276 pid=12236
Finished "all-syslog.sh" eid=276 pid=12236 exit=0
zed_disk_event:
	class: EC_dev_add
	subclass: disk
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
zed_disk_event:
	class: EC_dev_status
	subclass: dev_dle
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
Invoking "all-syslog.sh" eid=277 pid=12349
Finished "all-syslog.sh" eid=277 pid=12349 exit=0
Invoking "all-syslog.sh" eid=278 pid=12395
Finished "all-syslog.sh" eid=278 pid=12395 exit=0
Invoking "resilver_finish-notify.sh" eid=278 pid=12397
Finished "resilver_finish-notify.sh" eid=278 pid=12397 exit=2
zfsdle_vdev_online: setting device 'sdb' to ONLINE state in pool 'zstorage': 0
Invoking "resilver_finish-start-scrub.sh" eid=278 pid=12403
zfs_slm_event: EC_dev_add.disk
zfs_deliver_add: adding ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1 (pci-0000:00:1f.2-ata-2) (is_slice 1)
zfs_iter_pool: evaluating vdevs on zstorage (by devid)
  zfs_iter_vdev: matched devid on ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
zfs_process_add: /dev/sdb1 is already healthy, skip it.
zfs_slm_event: EC_dev_status.dev_dle
zfsdle_vdev_online: searching for '13402720709604261192' in 'zstorage'
Finished "resilver_finish-start-scrub.sh" eid=278 pid=12403 exit=2
Invoking "all-syslog.sh" eid=279 pid=12416
Finished "all-syslog.sh" eid=279 pid=12416 exit=0
Invoking "all-syslog.sh" eid=280 pid=12418
Finished "all-syslog.sh" eid=280 pid=12418 exit=0
zed_disk_event:
	class: EC_dev_remove
	subclass: disk
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
agent post event: ignoring 'EC_dev_remove' for recently expanded device 'ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1'
Invoking "all-syslog.sh" eid=281 pid=12435
Finished "all-syslog.sh" eid=281 pid=12435 exit=0
Invoking "all-syslog.sh" eid=282 pid=12437
Finished "all-syslog.sh" eid=282 pid=12437 exit=0
zed_disk_event:
	class: EC_dev_add
	subclass: disk
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
zed_disk_event:
	class: EC_dev_status
	subclass: dev_dle
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
Invoking "all-syslog.sh" eid=283 pid=12625
Finished "all-syslog.sh" eid=283 pid=12625 exit=0
zfsdle_vdev_online: setting device 'sdb' to ONLINE state in pool 'zstorage': 0
Invoking "all-syslog.sh" eid=284 pid=12627
zfs_slm_event: EC_dev_add.disk
zfs_deliver_add: adding ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1 (pci-0000:00:1f.2-ata-2) (is_slice 1)
Finished "all-syslog.sh" eid=284 pid=12627 exit=0
zfs_iter_pool: evaluating vdevs on zstorage (by devid)
  zfs_iter_vdev: matched devid on ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
zfs_process_add: /dev/sdb1 is already healthy, skip it.
zfs_slm_event: EC_dev_status.dev_dle
zfsdle_vdev_online: searching for '13402720709604261192' in 'zstorage'
zed_disk_event:
	class: EC_dev_status
	subclass: dev_dle
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
zed_disk_event:
	class: EC_dev_remove
	subclass: disk
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
agent post event: ignoring 'EC_dev_remove' for recently expanded device 'ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1'
zed_disk_event:
	class: EC_dev_add
	subclass: disk
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
Invoking "all-syslog.sh" eid=285 pid=2324
zfsdle_vdev_online: setting device 'sdb' to ONLINE state in pool 'zstorage': 0
zfs_slm_event: EC_dev_status.dev_dle
zfsdle_vdev_online: searching for '13402720709604261192' in 'zstorage'
Finished "all-syslog.sh" eid=285 pid=2324 exit=0
Invoking "all-syslog.sh" eid=286 pid=2362
Finished "all-syslog.sh" eid=286 pid=2362 exit=0
zed_disk_event:
	class: EC_dev_status
	subclass: dev_dle
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
zed_disk_event:
	class: EC_dev_remove
	subclass: disk
	dev_name: /dev/sdb1
	path: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
	devid: ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1
	phys_path: pci-0000:00:1f.2-ata-2
	dev_size: 1000194703360
	pool_guid: 18398459865380479736
	vdev_guid: 13402720709604261192
agent post event: ignoring 'EC_dev_remove' for recently expanded device 'ata-Hitachi_HDT721010SLA360_STF607MH1GUHWW-part1'

This continues for a couple hundred iterations and then stops (unlike the case where the drive falls off the array and a resilver happens, or maybe I was just not patient enough).

Also, in case it matters, here's how my zpool is set up:

[root@fatbox1 ~]# zpool status
  pool: zstorage
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:06:22 with 0 errors on Mon Feb 11 20:52:55 2019
config:

	NAME                                                STATE     READ WRITE CKSUM
	zstorage                                            ONLINE       0     0     0
	  raidz1-0                                          ONLINE       0     0     0
	    sde                                             ONLINE       0     0     0
	    sdd                                             ONLINE       0     0     0
	    sdc                                             ONLINE       0     0     0
	    sdb                                             ONLINE       0     0     0
	    sda                                             ONLINE       0     0     0
	logs	
	  ata-INTEL_SSDSC2KW512G8_PHLA822100ST512DGN-part2  ONLINE       0     0     0

errors: No known data errors

@rincebrain (Contributor):

@verygreen try disabling zed, does it still do that?

@verygreen (Contributor):

@rincebrain yeah, it does stop when I disable zed. Sorry, I just had an incomplete comment posted by mistake.

@caev commented Feb 20, 2019

I was able to reproduce this by attaching new disks to two mirror vdevs within a single pool at the same time.

The following workarounds did NOT help:

  • sudo systemctl stop zfs-zed
  • zpool detach thepool newdisk; fdisk newdisk ('g' to write a new GPT label UUID); zpool attach thepool existingdev newdisk
    (this had other problems: fdisk hanging forever, and uncertainty about whether the new label was really accepted)
  • zpool detach thepool newdisk; dd if=/dev/zero of=newdisk; zpool attach thepool existingdev newdisk

The following workaround DID help:

  • 'zpool .. attach ..' both new disks
  • 'zpool .. offline ..' both new disks
  • wait for resilver to finish
  • 'zpool .. online ..' one new disk
  • wait for resilver to finish
  • 'zpool .. online ..' other new disk

All of the "wait" steps above were required. I tried, for example, keeping the first wait step and omitting the second, and the bug was still provoked: the resilver start time kept resetting to $now every 1-2 minutes or so.
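
Spelled out with placeholder names, the sequence that worked was roughly:

zpool attach thepool existingdevA newdiskA
zpool attach thepool existingdevB newdiskB
zpool offline thepool newdiskA
zpool offline thepool newdiskB
# wait for the resilver to finish
zpool online thepool newdiskA
# wait for the resilver to finish
zpool online thepool newdiskB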

Most of this is with zfs-zed stopped. For example, I did not try the "DID work" workaround with zfs-zed running. Maybe it would still work.

A connectivity problem is unlikely. There are no warnings about that in dmesg, and no USB nonsense involved.

I'm afraid I don't know the ZFS version or how to get it, but it is from git:
$ pacman -Qs zfs
local/zfs-dkms-git 2019.02.08.r4946.gd8d418ff0-1 (archzfs-dkms-git)
Kernel modules for the Zettabyte File System.
local/zfs-utils-git 2019.02.08.r4946.gd8d418ff0-1 (archzfs-linux-git)
Kernel module support files for the Zettabyte File System.

One thing which may be relevant: the pool is at version 22. I noticed there is a resilver_defer feature in zpool-features, but it sounded like a performance improvement analogous to single-flighting. In this case the issue is not performance as described by that feature: the resilver is restarting even though no new device addition has provoked one, and there has not been an error, either.

@rincebrain (Contributor):

It doesn't matter whether it was analogous; if the pool is at v22 you don't get any feature flags without upgrading it.

@caev commented Feb 21, 2019

Yes, I thought that was implied: I tested with the feature off.

I assume there is regression testing for bugs like this one, but I don't know if those tests run with resilver_defer feature on or off. I speculated that maybe tests need to cover both cases and don't.

The reason it doesn't matter is that the pool is supposed to eventually become healthy whether the feature flag is on or off, and it doesn't. The feature setting can't be the cause of the bug. It can at most be related, but it could also be a total red herring.

@thisisnotmyrealname commented Jul 9, 2019

I have this issue while trying to replace a bunch of 4TB drives with 8TB drives.
I did 12 drives in one pass without issue, then went to do the remaining 4 (which had to be specified by GUID; the WWN wouldn't work and said the drive wasn't in the pool), and the resilver restarts after issuing 225MB to the pool. ZFS 0.8.1, Ubuntu 16.10 on 4.18.0-25-generic.
Stopped zed/zfs-zed with no change. No SMART errors; I tried offlining 2 of the 4 drives in the RAIDZ3 and it still won't advance past 225MB issued.
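
For reference, the by-GUID replacement looks roughly like this (the GUID and target path are placeholders):

zpool status -g tv                                # print vdev GUIDs instead of device names
zpool replace tv <old-vdev-guid> /dev/disk/by-id/<new-disk>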

The CKSUM errors appeared after I forcefully offlined those two disks.
scan: resilver in progress since Tue Jul 9 14:33:54 2019
396G scanned at 56.6G/s, 224M issued at 31.9M/s, 54.4T total
0B resilvered, 0.00% done, 20 days 16:04:10 to go

    NAME                          STATE     READ WRITE CKSUM
    tv                            DEGRADED     0     0     0
      raidz3-0                    DEGRADED     0     0     0
        wwn-0x5000cca261c9c282    ONLINE       0     0     0
        replacing-1               ONLINE       0     0     0
          wwn-0x50014ee2b874c92e  ONLINE       0     0     0
          wwn-0x5000cca261c9bc3d  ONLINE       0     0     0
        wwn-0x5000c500b2e78f6e    ONLINE       0     0     0
        replacing-3               ONLINE       0     0     0
          wwn-0x50014ee2631f030c  ONLINE       0     0     0
          wwn-0x5000cca261c9a642  ONLINE       0     0     0
        replacing-4               DEGRADED     0     0    26
          wwn-0x50014ee2ba3794d8  UNAVAIL      0     0     0
          wwn-0x5000cca261c9079e  ONLINE       0     0     0
        wwn-0x5000c500b5b16651    ONLINE       0     0     0
        wwn-0x5000c500b2e71d11    ONLINE       0     0     0
        wwn-0x5000c500b5b15fdc    ONLINE       0     0     0
        wwn-0x5000c500b5b15d85    ONLINE       0     0     0
        wwn-0x5000c500b2e67d21    ONLINE       0     0     0
        wwn-0x5000c500b4e0ac49    ONLINE       0     0     0
        wwn-0x5000c500b5b05687    ONLINE       0     0     0
        replacing-12              DEGRADED     0     0    25
          wwn-0x50014ee20ba86ab8  UNAVAIL      0     0     0
          wwn-0x5000cca261c9bdc0  ONLINE       0     0     0
        wwn-0x5000cca261c9ad25    ONLINE       0     0     0
        wwn-0x5000cca261c9ac0c    ONLINE       0     0     0
        wwn-0x5000cca261c8662b    ONLINE       0     0     0
    cache
      wwn-0x5002538500204a4f      ONLINE       0     0     0

All zed -F -vvv reports is
Invoking "all-syslog.sh" eid=5131 pid=17656
Finished "all-syslog.sh" eid=5131 pid=17656 exit=0
Invoking "all-syslog.sh" eid=5132 pid=17658
Finished "all-syslog.sh" eid=5132 pid=17658 exit=0
Invoking "all-syslog.sh" eid=5133 pid=18642
Finished "all-syslog.sh" eid=5133 pid=18642 exit=0
Invoking "all-syslog.sh" eid=5134 pid=18644
Finished "all-syslog.sh" eid=5134 pid=18644 exit=0
Invoking "all-syslog.sh" eid=5135 pid=18646
Finished "all-syslog.sh" eid=5135 pid=18646 exit=0
Invoking "all-syslog.sh" eid=5136 pid=18649
Finished "all-syslog.sh" eid=5136 pid=18649 exit=0

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 24, 2020
@stale stale bot closed this as completed Nov 26, 2020