Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with incremental replication on ZFS 2.0.5 - cannot receive: invalid record type #12476

Closed
shodanshok opened this issue Aug 13, 2021 · 11 comments
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@shodanshok
Copy link
Contributor

shodanshok commented Aug 13, 2021

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 7.9
Kernel Version 3.10.0-1160.31.1.el7.x86_64
Architecture x86_64
OpenZFS Version 2.0.5

Describe the problem you're observing

After upgrading source & destination machine to ZFS 2.0.5, sometime I can not receive the last incremental snapshot due to the following error:

[root@nas ~]# cat sendI.out | zfs recv -v tank/kvm
receiving incremental stream of tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02 into tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02
snap tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02 already exists; ignoring
cannot receive: invalid record type
cannot restore to tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02: destination already exists

The cannot receive: invalid record type seems strange: from here one can see that this error should be returned only when a unknow record type is processed, but with zstreamdump I can't see anything strange:

[root@nas ~]# cat sendI.out | zstreamdump
BEGIN record
        hdrtype = 2
        features = 0
        magic = 2f5bacbac
        creation_time = 0
        type = 0
        flags = 0x0
        toguid = 0
        fromguid = 0
        toname = tank/kvm@autosnap_2021-08-13_09:00:01_hourly
        payloadlen = 0
END checksum = 48b5c5d11/11d3a102010/240696f57663/31ffbdcb0d618
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 6115fc52
        type = 2
        flags = 0xc
        toguid = c9860a231e0c273e
        fromguid = 3e54c8ff6210d198
        toname = tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02
        payloadlen = 0
END checksum = 320c74be0b095/d3d1940148faf2bf/d54ea9e67b9f2835/34da481d7230cdbd
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 61160a61
        type = 2
        flags = 0xc
        toguid = 81d457ea9b564a62
        fromguid = c9860a231e0c273e
        toname = tank/kvm@autosnap_2021-08-13_08:00:01_hourly
        payloadlen = 0
END checksum = 2f3a8eb0ddd23bb/b5af1cc96ba19238/bdc078eee7e98de0/5f1cb642d243f317
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 61161871
        type = 2
        flags = 0xc
        toguid = 6abeb2c3ed523590
        fromguid = 81d457ea9b564a62
        toname = tank/kvm@autosnap_2021-08-13_09:00:01_hourly
        payloadlen = 0
END checksum = 39a67322b9c786f/1538fafeb55604f3/e7f57388908e8ae9/eed307bd0a90f5f9
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 4 (0 bytes)
        Total DRR_END records = 5 (0 bytes)
        Total DRR_OBJECT records = 674 (189616 bytes)
        Total DRR_FREEOBJECTS records = 61 (0 bytes)
        Total DRR_WRITE records = 18320 (2401239040 bytes)
        Total DRR_WRITE_BYREF records = 0 (0 bytes)
        Total DRR_WRITE_EMBEDDED records = 0 (0 bytes)
        Total DRR_FREE records = 1665 (0 bytes)
        Total DRR_SPILL records = 0 (0 bytes)
        Total records = 20729
        Total payload size = 2401428656 (0x8f22e4b0)
        Total header overhead = 6467448 (0x62af78)
        Total stream length = 2407896104 (0x8f859428)

Try to forcing with -F does not produce better results:

[root@nas ~]# zfs rollback tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02
[root@nas ~]# cat sendI.out | zfs recv -F -v tank/kvm
receiving incremental stream of tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02 into tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02
snap tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02 already exists; ignoring
cannot receive: invalid record type
cannot restore to tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02: destination already exists

I also tried restoring an incremental stream generated with -i (ie: not using a replication stream). Without forcing anything it also fails, but no reference to an invalid record exists:

[root@nas ~]# zfs rollback tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02
[root@nas ~]# cat sendi.out | zfs recv -v tank/kvm
receiving incremental stream of tank/kvm@autosnap_2021-08-13_09:00:01_hourly into tank/kvm@autosnap_2021-08-13_09:00:01_hourly
cannot receive incremental stream: destination tank/kvm has been modified
since most recent snapshot

Forcing the very same -i stream does works indeed:

[root@nas ~]# zfs rollback tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02
[root@nas ~]# cat sendi.out | zfs recv -F -v tank/kvm
receiving incremental stream of tank/kvm@autosnap_2021-08-13_09:00:01_hourly into tank/kvm@autosnap_2021-08-13_09:00:01_hourly
received 1.64G stream in 40 seconds (42.1M/sec)

It seems as if the destination dataset was modified; however, it is in readonly mode and no files were deleted (so ZFS delete queue should be empty). Moreover, I never encountered the invalid record type issue before, and I don't know why a forced receive (-F) does work with an incremental send (-i) but not for an incremental+replication stream (-I).

Describe how to reproduce the problem

No specific reproducer; replication fails each 7-10 days.

Include any warning/errors/backtraces from the system logs

None relevant in system logs.

@shodanshok shodanshok added the Type: Defect Incorrect behavior (e.g. crash, hang) label Aug 13, 2021
@shodanshok
Copy link
Contributor Author

I partially tracked down the issue to two snapshots having the same creation time and confusing my replication application (syncoid), which tried to re-transfer and already-sent snapshot. It means I can simply reproduce the issue by re-sending an already-sent snapshot.

However, sending for a second time the same snapshot should cause no issue, as zfs recv should simply discard the already-present snapshot (ie: snap tank/kvm@syncoid_source.localdomain_2021-08-13:07:00:02 already exists; ignoring). I am going to spin two virtual machines to further debug the issue, incorporating @scineram suggestion here

@shodanshok
Copy link
Contributor Author

shodanshok commented Aug 13, 2021

After having patched recv_skip as suggested by @scineram, this is the zfs recv output:

cannot receive: invalid record type: 0x3100762

It is my understanding that drr_type is an ENUM with only an handful of types, so the reported value sounds very strange and let me think that something is not parsed correctly. Strange enough, I can not replicated the issue on the virtual machines I spun up - it seems to only affect a specific physical machine. Trying the userspace component (ie: zfs binary) bundled with ZFS 2.1.0 does not change anything.

Any clue?

@shodanshok
Copy link
Contributor Author

I continued investigating the issue. I tried to:

  • zfs send and zfs recv directly on the target machine - ie: sending some snapshots to a file and receiving them on the same machine. Result is the same cannot receive: invalid record type: 0x3100762 issue.
  • zfs send and zfs recv of a newly-created dataset and snapshots directly on the target machine. No issue arises.
  • zfs send and zfs recv of a newly-created dataset and snapshots directly on the source machine. No issue arises.

I plan to send the problematic dataset+snapshots on a new target, I'll update the issue with new details when possible.

@shodanshok
Copy link
Contributor Author

Ok, it seems the failing stream has some unexpected DRR records. I will try to make a step-by-step comparison between the failing stream and a good one. Including @behlendorf because I am not sure how to proceed forward in debugging.

Recap of the main issue: when re-receiving an already received snapshot, ZFS should simply ignore it and go ahead with other snapshots included in the stream. On contrary, the failing stream causes ZFS to abruptly abort the receive and to exit with an error code. I strongly suspect it is the same issue reported here (or a close variant): #11081

Let see the issue in action:

# good stream
[root@nas ~]# cat good.img | zfs recv -v tank/kvm 2>&1
receiving incremental stream of tank/kvm@snap3 into tank/kvm@snap3
snap tank/kvm@snap3 already exists; ignoring
# failing stream
[root@nas ~]# cat bad.img | zfs recv -v tank/kvm 2>&1
receiving incremental stream of tank/kvm@snap2 into tank/kvm@snap2
cannot receive: invalid record type: 2f505a
cannot restore to tank/kvm@snap2: destination already exists
snap tank/kvm@snap2 already exists; ignoring

Notice how receiving the bad stream exits with an invalid record type: 2f505a error which is very strange, as DRR_TYPE is an enum with only a few entries. The key issue is that a data payload is parsed as an DRR_OBJECT entry. To go deeper, I patched the function recv_read in libzfs_sendrecv.c to dump the content of the buffer it reads.

Notice how on the good stream the first "data packet" (DRR_OTHER) appears as the 8th dump (ie: recv_read invocation). I am using a "dry-run" receive in order to exclude any zfs kernel module processing (the module consumes some records, preventing them to be saw here):

[root@nas ~]# cat good.img | zfs recv -vn tank/kvm 2>&1 | less
RECV_READ DRR_TYPE: 0 (DRR_BEGIN)
00 00 00 00 00 00 00 00  AC CB BA F5 02 00 00 00  |  ................
02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  74 61 6E 6B 2F 6B 76 6D  |  ........tank/kvm
40 73 6E 61 70 33 00 00  00 00 00 00 00 00 00 00  |  @snap3..........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00                           |  ........
RECV_READ DRR_TYPE: 5 (DRR_END)
05 00 00 00 00 00 00 00  03 3F 0E 30 02 00 00 00  |  .........?.0....
FB 60 FD 57 96 00 00 00  0A 24 81 CF A6 14 00 00  |  .`.W.....$......
34 DB 15 69 29 EF 01 00  00 00 00 00 00 00 00 00  |  4..i)...........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00                           |  ........
RECV_READ DRR_TYPE: 0 (DRR_BEGIN)
00 00 00 00 00 00 00 00  AC CB BA F5 02 00 00 00  |  ................
11 00 00 00 00 00 00 00  61 F7 2D 61 00 00 00 00  |  ........a.-a....
02 00 00 00 0C 00 00 00  B0 99 A2 76 8B 90 AF 24  |  ...........v...$
CA 3B 84 59 49 26 37 23  74 61 6E 6B 2F 6B 76 6D  |  .;.YI&7#tank/kvm
40 73 6E 61 70 33 00 00  00 00 00 00 00 00 00 00  |  @snap3..........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00                           |  ........
would receive incremental stream of tank/kvm@snap3 into tank/kvm@snap3
ZFS_RECEIVE_ONE DRYRUN
RECV_READ DRR_TYPE: 0 (DRR_BEGIN)
RECV_READ DRR_TYPE: 1 (DRR_OBJECT)
01 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |  ................    <----- this would be consumed by zfs.ko
15 00 00 00 00 00 00 00  00 02 00 00 00 00 00 00  |  ................           in a non-dry run to setup the receive
00 00 01 00 00 00 00 00  B0 99 A2 76 8B 90 AF 24  |  ...........v...$            
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  21 EF 9C 44 04 00 00 00  |  ........!..D....
F6 E5 22 5C 1F 02 00 00  4F CB 5E E7 AA 8E 00 00  |  .."\....O.^.....
9C EE EA 34 1C BB 19 00                           |  ...4....
RECV_READ DRR_TYPE: 1 (DRR_OBJECT)                                      <----- other metadata, they *need* to be here
RECV_READ DRR_TYPE: 4 (DRR_FREE)
04 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |  ................
00 02 00 00 00 00 00 00  FF FF FF FF FF FF FF FF  |  ................
B0 99 A2 76 8B 90 AF 24  00 00 00 00 00 00 00 00  |  ...v...$........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  4A F6 12 9D 08 00 00 00  |  ........J.......
F3 25 1E CA 92 04 00 00  BD 34 DE EC 71 8F 01 00  |  .%.......4..q...
09 32 8C B6 FF C4 67 00                           |  .2....g.
RECV_READ DRR_TYPE: 4 (DRR_FREE)
04 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |  ................
00 02 00 00 00 00 00 00  00 04 00 00 00 00 00 00  |  ................
B0 99 A2 76 8B 90 AF 24  00 00 00 00 00 00 00 00  |  ...v...$........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  97 02 6A 43 0C 00 00 00  |  ..........jC....
94 E2 BB E8 3B 08 00 00  70 C3 2C 35 34 82 03 00  |  ....;...p.,54...
F0 AB B2 6E FF 14 27 01                           |  ...n..'.
RECV_READ DRR_TYPE: 1 (DRR_OBJECT)
01 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |  ................
14 00 00 00 2C 00 00 00  00 02 00 00 08 01 00 00  |  ....,...........
00 00 01 00 00 00 00 00  B0 99 A2 76 8B 90 AF 24  |  ...........v...$
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  22 24 ED AF 0E 00 00 00  |  ........"$......
3D 1C F8 2B A5 0C 00 00  6F 91 83 0D A6 B0 06 00  |  =..+....o.......
90 FA D6 78 9E 9C AD 02                           |  ...x....
RECV_READ DRR_TYPE: 3100762 (DRR_OTHER)
5A 50 2F 00 03 08 18 00  58 00 00 00 00 00 00 00  |  ZP/.....X.......    <----- data begin here
ED 41 00 00 00 00 00 00  03 00 00 00 00 00 00 00  |  .A..............
35 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  5...............
00 00 00 00 00 00 00 00  22 00 00 00 00 00 00 00  |  ........".......
44 01 00 00 08 04 00 00  84 ED 2D 61 00 00 00 00  |  D.........-a....
85 FC A0 0A 00 00 00 00  3E A4 32 5A 00 00 00 00  |  ........>.2Z....
EF A0 08 37 00 00 00 00  15 E4 38 5A 00 00 00 00  |  ...7......8Z....
DC 43 58 39 00 00 00 00  E6 E1 38 5A 00 00 00 00  |  .CX9......8Z....
9B 7B 20 0D 00 00 00 00  03 00 00 00 00 00 00 00  |  .{ .............
03 00 00 00 00 00 00 00  00 00 00 10 BF 01 1E 00  |  ................
00 00 40 20 A9 00 12 00  00 00 00 40 A9 00 12 00  |  ..@ .......@....
01 01 00 00 00 00 00 00  00 00 00 01 00 00 00 44  |  ...............D
00 00 00 48 00 00 00 10  73 65 63 75 72 69 74 79  |  ...H....security
2E 73 65 6C 69 6E 75 78  00 00 00 0A 00 00 00 1F  |  .selinux........
73 79 73 74 65 6D 5F 75  3A 6F 62 6A 65 63 74 5F  |  system_u:object_
72 3A 64 65 66 61 75 6C  74 5F 74 3A 73 30 00 00  |  r:default_t:s0..
00 00 00 00 00 00 00 00

The bad stream shows data much earlier, at 5th dump/invocation of recv_read:

[root@nas ~]# cat bad.img | zfs recv -vn tank/kvm 2>&1 | less
RECV_READ DRR_TYPE: 0 (DRR_BEGIN)
00 00 00 00 00 00 00 00  AC CB BA F5 02 00 00 00  |  ................
02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  74 61 6E 6B 2F 6B 76 6D  |  ........tank/kvm
40 73 6E 61 70 32 00 00  00 00 00 00 00 00 00 00  |  @snap2..........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00                           |  ........
RECV_READ DRR_TYPE: 5 (DRR_END)
05 00 00 00 00 00 00 00  03 3E 0E 30 02 00 00 00  |  .........>.0....
FB 23 FD 57 96 00 00 00  0A C1 79 CF A6 14 00 00  |  .#.W......y.....
34 BC 7A 68 29 EF 01 00  00 00 00 00 00 00 00 00  |  4.zh)...........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00                           |  ........
RECV_READ DRR_TYPE: 0 (DRR_BEGIN)
00 00 00 00 00 00 00 00  AC CB BA F5 02 00 00 00  |  ................
11 00 00 00 00 00 00 00  6B 5F 2B 61 00 00 00 00  |  ........k_+a....
02 00 00 00 0C 00 00 00  CA 3B 84 59 49 26 37 23  |  .........;.YI&7#
33 DB B7 C9 4B C2 44 96  74 61 6E 6B 2F 6B 76 6D  |  3...K.D.tank/kvm
40 73 6E 61 70 32 00 00  00 00 00 00 00 00 00 00  |  @snap2..........
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00                           |  ........
would receive incremental stream of tank/kvm@snap2 into tank/kvm@snap2
ZFS_RECEIVE_ONE DRYRUN
RECV_READ DRR_TYPE: 0 (DRR_BEGIN)
RECV_READ DRR_TYPE: 1 (DRR_OBJECT)
01 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |  ........@.......    <----- this would be consumed by zfs.ko
14 00 00 00 2C 00 00 00  00 02 00 00 18 01 00 00  |  ....,...........           in a non-dry run to setup the receive
00 00 01 00 00 00 00 00  CA 3B 84 59 49 26 37 23  |  .........;.YI&7#
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |  ................
00 00 00 00 00 00 00 00  C8 02 AE EA 04 00 00 00  |  ................
7A 73 9E FC 7F 02 00 00  88 6F 4A C9 67 A9 00 00  |  zs.......oJ.g...
2A 3E 8A 80 61 96 1E 00                           |  *>..a...
RECV_READ DRR_TYPE: 3100762 (DRR_OTHER)                                  <----- *no metadata* here!
5A 50 2F 00 03 08 18 00  68 00 00 00 00 00 00 00  |  ZP/.....h.......    <----- data begin here
E8 41 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |  .A..............           libzfs_sendrecv.c wronly parses data
24 3A 15 01 00 00 00 00  6B 00 00 00 00 00 00 00  |  $:......k.......           as metadata, resulting in the
6B 00 00 00 00 00 00 00  06 00 00 00 00 00 00 00  |  k...............           "invalid record type" message
44 00 00 00 08 04 00 00  FF 8E 06 5E 00 00 00 00  |  D..........^....
EA 9B E5 1F 00 00 00 00  B7 E6 F2 5F 00 00 00 00  |  ..........._....
2A CA 20 07 00 00 00 00  B7 E6 F2 5F 00 00 00 00  |  *. ........_....
2A CA 20 07 00 00 00 00  29 29 AA 5D 00 00 00 00  |  *. .....)).]....
EE 09 F4 05 00 00 00 00  02 00 00 00 00 00 00 00  |  ................
03 00 00 00 00 00 00 00  00 00 00 10 BF 01 1E 00  |  ................
00 00 40 20 A9 00 12 00  00 00 00 40 88 00 12 00  |  ..@ .......@....
01 01 00 00 00 00 00 00  00 00 00 01 00 00 00 54  |  ...............T
00 00 00 58 00 00 00 10  73 65 63 75 72 69 74 79  |  ...X....security
2E 73 65 6C 69 6E 75 78  00 00 00 0A 00 00 00 2D  |  .selinux.......-
73 79 73 74 65 6D 5F 75  3A 6F 62 6A 65 63 74 5F  |  system_u:object_
72 3A 73 76 69 72 74 5F  69 6D 61 67 65 5F 74 3A  |  r:svirt_image_t:
73 30 3A 63 36 34 35 2C  63 39 31 35 00 00 00 00  |  s0:c645,c915....
00 00 00 00 00 00 00 00                           |  ........

The data packet comes too early (ie: unexpectedly) and the zfs kernel module tries to parse it as a DRR_OBJECT record, which clearly fails.

Finally, let see what zstreamdump reports for both streams:

# good stream
[root@nas ~]# zstreamdump good.img
BEGIN record
        hdrtype = 2
        features = 0
        magic = 2f5bacbac
        creation_time = 0
        type = 0
        flags = 0x0
        toguid = 0
        fromguid = 0
        toname = tank/kvm@snap3
        payloadlen = 0
END checksum = 2300e3f03/9657fd60fb/14a6cf81240a/1ef296915db34
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 612df761
        type = 2
        flags = 0xc
        toguid = 24af908b76a299b0
        fromguid = 2337264959843bca
        toname = tank/kvm@snap3
        payloadlen = 0
END checksum = 4fffd120bdb5/9e4f29dd4bfa46b2/49ad71d2ef327f18/cce8f7adcb830fde
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2 (0 bytes)
        Total DRR_END records = 3 (0 bytes)
        Total DRR_OBJECT records = 1024 (286528 bytes)
        Total DRR_FREEOBJECTS records = 105 (0 bytes)
        Total DRR_WRITE records = 1 (512 bytes)
        Total DRR_WRITE_BYREF records = 0 (0 bytes)
        Total DRR_WRITE_EMBEDDED records = 0 (0 bytes)
        Total DRR_FREE records = 1295 (0 bytes)
        Total DRR_SPILL records = 0 (0 bytes)
        Total records = 2430
        Total payload size = 287040 (0x46140)
        Total header overhead = 758160 (0xb9190)
        Total stream length = 1045200 (0xff2d0)
# bad stream
[root@nas ~]# zstreamdump bad.img
BEGIN record
        hdrtype = 2
        features = 0
        magic = 2f5bacbac
        creation_time = 0
        type = 0
        flags = 0x0
        toguid = 0
        fromguid = 0
        toname = tank/kvm@snap2
        payloadlen = 0
END checksum = 2300e3e03/9657fd23fb/14a6cf79c10a/1ef29687abc34
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 612b5f6b
        type = 2
        flags = 0xc
        toguid = 2337264959843bca
        fromguid = 9644c24bc9b7db33
        toname = tank/kvm@snap2
        payloadlen = 0
END checksum = 212577b5eeff2c0/dd107c889d28861e/7e53fd2168cf600b/b4692a8d9349f653
END checksum = 0/0/0/0
SUMMARY:
        Total DRR_BEGIN records = 2 (0 bytes)
        Total DRR_END records = 3 (0 bytes)
        Total DRR_OBJECT records = 324 (91152 bytes)
        Total DRR_FREEOBJECTS records = 33 (0 bytes)
        Total DRR_WRITE records = 5529 (724697088 bytes)
        Total DRR_WRITE_BYREF records = 0 (0 bytes)
        Total DRR_WRITE_EMBEDDED records = 0 (0 bytes)
        Total DRR_FREE records = 855 (0 bytes)
        Total DRR_SPILL records = 0 (0 bytes)
        Total records = 6746
        Total payload size = 724788240 (0x2b336410)
        Total header overhead = 2104752 (0x201db0)
        Total stream length = 726892992 (0x2b5381c0)

I can't see anything wrong with the output of zstreamdump, still the two steams receive very differently. Any guess on the root cause and/or how to debug it further?

Thanks.

@shodanshok
Copy link
Contributor Author

Executing zstreamdump -d (dumping block info) reveals something interesting...

# good stream
BEGIN record
        hdrtype = 2
        features = 0
        magic = 2f5bacbac
        creation_time = 0
        type = 0
        flags = 0x0
        toguid = 0
        fromguid = 0
        toname = tank/kvm@snap3
        payloadlen = 0

END checksum = 2300e3f03/9657fd60fb/14a6cf81240a/1ef296915db34
    checksum = 0/0/0/0
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 612df761
        type = 2
        flags = 0xc
        toguid = 24af908b76a299b0
        fromguid = 2337264959843bca
        toname = tank/kvm@snap3
        payloadlen = 0

OBJECT object = 1 type = 21 bonustype = 0 blksz = 512 bonuslen = 0 dn_slots = 1 raw_bonuslen = 0 flags = 0 maxblkid = 0 indblkshift = 0 nlevels = 0 nblkptr = 0
    checksum = 4449cef21/21f5c22e5f6/8eaae75ecb4f/19bb1c34eaee9c
FREE object = 1 offset = 512 length = -1
    checksum = 89d12f64a/492ca1e25f3/18f71ecde34bd/67c4ffb68c3209
FREE object = 1 offset = 512 length = 1024
    checksum = c436a0297/83be8bbe294/38234352cc370/12714ff6eb2abf0
OBJECT object = 2 type = 20 bonustype = 44 blksz = 512 bonuslen = 264 dn_slots = 1 raw_bonuslen = 0 flags = 0 maxblkid = 0 indblkshift = 0 nlevels = 0 nblkptr = 0
 5a 50 2f 00  03 08 18 00  58 00 00 00  00 00 00 00   ZP/. .... X... ....
 ed 41 00 00  00 00 00 00  03 00 00 00  00 00 00 00   .A.. .... .... ....
 35 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00   5... .... .... ....
 00 00 00 00  00 00 00 00  22 00 00 00  00 00 00 00   .... .... "... ....
 44 01 00 00  08 04 00 00  84 ed 2d 61  00 00 00 00   D... .... ..-a ....
 85 fc a0 0a  00 00 00 00  3e a4 32 5a  00 00 00 00   .... .... >.2Z ....
 ef a0 08 37  00 00 00 00  15 e4 38 5a  00 00 00 00   ...7 .... ..8Z ....
 dc 43 58 39  00 00 00 00  e6 e1 38 5a  00 00 00 00   .CX9 .... ..8Z ....
 9b 7b 20 0d  00 00 00 00  03 00 00 00  00 00 00 00   .{ . .... .... ....
 03 00 00 00  00 00 00 00  00 00 00 10  bf 01 1e 00   .... .... .... ....
 00 00 40 20  a9 00 12 00  00 00 00 40  a9 00 12 00   ..@  .... ...@ ....
 01 01 00 00  00 00 00 00  00 00 00 01  00 00 00 44   .... .... .... ...D
 00 00 00 48  00 00 00 10  73 65 63 75  72 69 74 79   ...H .... secu rity
 2e 73 65 6c  69 6e 75 78  00 00 00 0a  00 00 00 1f   .sel inux .... ....
 73 79 73 74  65 6d 5f 75  3a 6f 62 6a  65 63 74 5f   syst em_u :obj ect_
 72 3a 64 65  66 61 75 6c  74 5f 74 3a  73 30 00 00   r:de faul t_t: s0..
 00 00 00 00  00 00 00 00                             .... ....
    checksum = eafed2422/ca52bf81c3d/6b0a60d83916f/2ad9c9e78d6fa90
FREE object = 2 offset = 512 length = -1
    checksum = 1a729c6570/19097b317ac0/10ee19c2e19118/90250e0e27560da
OBJECT object = 3 type = 20 bonustype = 44 blksz = 512 bonuslen = 264 dn_slots = 1 raw_bonuslen = 0 flags = 0 maxblkid = 0 indblkshift = 0 nlevels = 0 nblkptr = 0
 5a 50 2f 00  03 08 18 00  58 00 00 00  00 00 00 00   ZP/. .... X... ....
 ed 41 00 00  00 00 00 00  03 00 00 00  00 00 00 00   .A.. .... .... ....
 35 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00   5... .... .... ....
 00 00 00 00  00 00 00 00  02 00 00 00  00 00 00 00   .... .... .... ....
 44 01 00 00  08 04 00 00  84 ed 2d 61  00 00 00 00   D... .... ..-a ....
 cf 3e b0 0a  00 00 00 00  4c a4 32 5a  00 00 00 00   .>.. .... L.2Z ....
 72 30 9c 2e  00 00 00 00  15 e4 38 5a  00 00 00 00   r0.. .... ..8Z ....
 dc 43 58 39  00 00 00 00  e6 e1 38 5a  00 00 00 00   .CX9 .... ..8Z ....
 d0 bd 2f 0d  00 00 00 00  03 00 00 00  00 00 00 00   ../. .... .... ....
 03 00 00 00  00 00 00 00  00 00 00 10  bf 01 1e 00   .... .... .... ....
 00 00 40 20  a9 00 12 00  00 00 00 40  a9 00 12 00   ..@  .... ...@ ....
 01 01 00 00  00 00 00 00  00 00 00 01  00 00 00 44   .... .... .... ...D
 00 00 00 48  00 00 00 10  73 65 63 75  72 69 74 79   ...H .... secu rity
 2e 73 65 6c  69 6e 75 78  00 00 00 0a  00 00 00 1f   .sel inux .... ....
 73 79 73 74  65 6d 5f 75  3a 6f 62 6a  65 63 74 5f   syst em_u :obj ect_
 72 3a 64 65  66 61 75 6c  74 5f 74 3a  73 30 00 00   r:de faul t_t: s0..
 00 00 00 00  00 00 00 00                             .... ....
    checksum = 1daa27bd35/21feb2b7c29d/19ee2baed8c546/f7c88dc128b5c3d
FREE object = 3 offset = 512 length = -1
    checksum = 2a2d6f2696/373b9ae6ef01/32af1ebef646ab/24845939f5062686
OBJECT object = 4 type = 20 bonustype = 44 blksz = 1024 bonuslen = 268 dn_slots = 1 raw_bonuslen = 0 flags = 0 maxblkid = 0 indblkshift = 0 nlevels = 0 nblkptr = 0
 5a 50 2f 00  03 08 18 00  5c 00 00 00  00 00 00 00   ZP/. .... \... ....
 ed 41 00 00  00 00 00 00  0c 00 00 00  00 00 00 00   .A.. .... .... ....
 35 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00   5... .... .... ....
 00 00 00 00  00 00 00 00  03 00 00 00  00 00 00 00   .... .... .... ....
 44 01 00 00  08 04 00 00  84 ed 2d 61  00 00 00 00   D... .... ..-a ....
 cf 3e b0 0a  00 00 00 00  f0 64 89 60  00 00 00 00   .>.. .... .d.` ....
 00 00 00 00  00 00 00 00  bb de d9 60  00 00 00 00   .... .... ...` ....
 96 87 f8 06  00 00 00 00  e6 e1 38 5a  00 00 00 00   .... .... ..8Z ....
 d0 bd 2f 0d  00 00 00 00  0c 00 00 00  00 00 00 00   ../. .... .... ....
 03 00 00 00  00 00 00 00  00 00 00 10  bf 01 1e 00   .... .... .... ....
 00 00 40 20  a9 00 12 00  00 00 00 40  a9 00 12 00   ..@  .... ...@ ....
 01 01 00 00  00 00 00 00  00 00 00 01  00 00 00 48   .... .... .... ...H
 00 00 00 50  00 00 00 10  73 65 63 75  72 69 74 79   ...P .... secu rity
 2e 73 65 6c  69 6e 75 78  00 00 00 0a  00 00 00 24   .sel inux .... ...$
 73 79 73 74  65 6d 5f 75  3a 6f 62 6a  65 63 74 5f   syst em_u :obj ect_
 72 3a 76 69  72 74 5f 76  61 72 5f 6c  69 62 5f 74   r:vi rt_v ar_l ib_t
 3a 73 30 00  00 00 00 00  00 00 00 00  00 00 00 00   :s0. .... .... ....
    checksum = 2d69cc18a6/44fbd6273c02/459f97cb9267c1/36c678bcaf9e7c29
...
# bad stream
[root@nas ~]# zstreamdump -d bad.img | less
BEGIN record
        hdrtype = 2
        features = 0
        magic = 2f5bacbac
        creation_time = 0
        type = 0
        flags = 0x0
        toguid = 0
        fromguid = 0
        toname = tank/kvm@snap2
        payloadlen = 0

END checksum = 2300e3e03/9657fd23fb/14a6cf79c10a/1ef29687abc34
    checksum = 0/0/0/0
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 612b5f6b
        type = 2
        flags = 0xc
        toguid = 2337264959843bca
        fromguid = 9644c24bc9b7db33
        toname = tank/kvm@snap2
        payloadlen = 0

OBJECT object = 64 type = 20 bonustype = 44 blksz = 512 bonuslen = 280 dn_slots = 1 raw_bonuslen = 0 flags = 0 maxblkid = 0 indblkshift = 0 nlevels = 0 nblkptr = 0
 5a 50 2f 00  03 08 18 00  68 00 00 00  00 00 00 00   ZP/. .... h... ....
 e8 41 00 00  00 00 00 00  04 00 00 00  00 00 00 00   .A.. .... .... ....
 24 3a 15 01  00 00 00 00  6b 00 00 00  00 00 00 00   $:.. .... k... ....
 6b 00 00 00  00 00 00 00  06 00 00 00  00 00 00 00   k... .... .... ....
 44 00 00 00  08 04 00 00  ff 8e 06 5e  00 00 00 00   D... .... ...^ ....
 ea 9b e5 1f  00 00 00 00  b7 e6 f2 5f  00 00 00 00   .... .... ..._ ....
 2a ca 20 07  00 00 00 00  b7 e6 f2 5f  00 00 00 00   *. . .... ..._ ....
 2a ca 20 07  00 00 00 00  29 29 aa 5d  00 00 00 00   *. . .... )).] ....
 ee 09 f4 05  00 00 00 00  02 00 00 00  00 00 00 00   .... .... .... ....
 03 00 00 00  00 00 00 00  00 00 00 10  bf 01 1e 00   .... .... .... ....
 00 00 40 20  a9 00 12 00  00 00 00 40  88 00 12 00   ..@  .... ...@ ....
 01 01 00 00  00 00 00 00  00 00 00 01  00 00 00 54   .... .... .... ...T
 00 00 00 58  00 00 00 10  73 65 63 75  72 69 74 79   ...X .... secu rity
 2e 73 65 6c  69 6e 75 78  00 00 00 0a  00 00 00 2d   .sel inux .... ...-
 73 79 73 74  65 6d 5f 75  3a 6f 62 6a  65 63 74 5f   syst em_u :obj ect_
 72 3a 73 76  69 72 74 5f  69 6d 61 67  65 5f 74 3a   r:sv irt_ imag e_t:
 73 30 3a 63  36 34 35 2c  63 39 31 35  00 00 00 00   s0:c 645, c915 ....
 00 00 00 00  00 00 00 00                             .... ....
    checksum = 4eaae02c8/27ffc9e737a/a967c94a6f88/1e9661808a3e2a
FREE object = 64 offset = 512 length = -1
    checksum = 1371f4ecb9/ae73f074a4b/41a9a3d5d5df0/1435164dbb72f6c
OBJECT object = 65 type = 20 bonustype = 44 blksz = 512 bonuslen = 280 dn_slots = 1 raw_bonuslen = 0 flags = 0 maxblkid = 0 indblkshift = 0 nlevels = 0 nblkptr = 0
 5a 50 2f 00  03 08 18 00  68 00 00 00  00 00 00 00   ZP/. .... h... ....
 e8 41 00 00  00 00 00 00  02 00 00 00  00 00 00 00   .A.. .... .... ....
 24 3a 15 01  00 00 00 00  6b 00 00 00  00 00 00 00   $:.. .... k... ....
 6b 00 00 00  00 00 00 00  09 00 00 00  00 00 00 00   k... .... .... ....
 44 00 00 00  08 04 00 00  ff 8e 06 5e  00 00 00 00   D... .... ...^ ....
 59 20 04 20  00 00 00 00  29 29 aa 5d  00 00 00 00   Y .  .... )).] ....
 ee 09 f4 05  00 00 00 00  b7 e6 f2 5f  00 00 00 00   .... .... ..._ ....
 ba 45 02 07  00 00 00 00  29 29 aa 5d  00 00 00 00   .E.. .... )).] ....
 ee 09 f4 05  00 00 00 00  02 00 00 00  00 00 00 00   .... .... .... ....
 03 00 00 00  00 00 00 00  00 00 00 10  bf 01 1e 00   .... .... .... ....
 00 00 40 20  a9 00 12 00  00 00 00 40  88 00 12 00   ..@  .... ...@ ....
 01 01 00 00  00 00 00 00  00 00 00 01  00 00 00 54   .... .... .... ...T
 00 00 00 58  00 00 00 10  73 65 63 75  72 69 74 79   ...X .... secu rity
 2e 73 65 6c  69 6e 75 78  00 00 00 0a  00 00 00 2d   .sel inux .... ...-
 73 79 73 74  65 6d 5f 75  3a 6f 62 6a  65 63 74 5f   syst em_u :obj ect_
 72 3a 73 76  69 72 74 5f  69 6d 61 67  65 5f 74 3a   r:sv irt_ imag e_t:
 73 30 3a 63  36 34 35 2c  63 39 31 35  00 00 00 00   s0:c 645, c915 ....
 00 00 00 00  00 00 00 00                             .... ....
    checksum = 15ba098dbe/117662c13406/86e2ab6233694/321418c09944e22
...

Notice how the bad stream object id does not start at 1, and how its type is 20 (DMU_OT_PLAIN_FILE_CONTENTS) rather then 21 (DMU_OT_DIRECTORY_CONTENTS). Does it means the send stream is corrupted? If so, how can I proceed further? Please note that I can replicate the same issue on another machine pair (sending/receiving) both running ZFS 2.0.5 and used for the same purpose (running virtual machines and sending the VM disk files to another stand-by host), but I have no reproducer at hand.

@shodanshok
Copy link
Contributor Author

The same machine which show the issue on many snapshots does not show the problem on another, more recent, snapshot. I achieved the "new" correct behavior by creating and removing a single file in between two snapshots. Let see how the dump file differs:

# bad stream

[root@nas ~]# zfs send -I tank/kvm@autosnap_2021-09-02_16:00:02_hourly tank/kvm@autosnap_2021-09-02_17:00:01_hourly > bad.img
[root@nas ~]# hexdump -C bad.img | less
00000000  00 00 00 00 00 00 00 00  ac cb ba f5 02 00 00 00  |................|
00000010  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  74 61 6e 6b 2f 6b 76 6d  |........tank/kvm|
00000040  40 61 75 74 6f 73 6e 61  70 5f 32 30 32 31 2d 30  |@autosnap_2021-0|
00000050  39 2d 30 32 5f 31 37 3a  30 30 3a 30 31 5f 68 6f  |9-02_17:00:01_ho|
00000060  75 72 6c 79 00 00 00 00  00 00 00 00 00 00 00 00  |urly............|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000130  00 00 00 00 00 00 00 00  05 00 00 00 00 00 00 00  |................|
00000140  12 5e 59 8a 04 00 00 00  4a 59 64 ff 1c 01 00 00  |.^Y.....JYd.....|
00000150  12 f2 62 d4 ff 23 00 00  c4 58 8a ad 74 1f 03 00  |..b..#...X..t...|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000270  00 00 00 00 00 00 00 00  ac cb ba f5 02 00 00 00  |................|
00000280  11 00 00 00 00 00 00 00  f1 e6 30 61 00 00 00 00  |..........0a....|
00000290  02 00 00 00 0c 00 00 00  4e 4f 00 89 e1 09 5e 83  |........NO....^.|
000002a0  47 25 81 97 dd 90 6d ea  74 61 6e 6b 2f 6b 76 6d  |G%....m.tank/kvm|
000002b0  40 61 75 74 6f 73 6e 61  70 5f 32 30 32 31 2d 30  |@autosnap_2021-0|
000002c0  39 2d 30 32 5f 31 37 3a  30 30 3a 30 31 5f 68 6f  |9-02_17:00:01_ho|
000002d0  75 72 6c 79 00 00 00 00  00 00 00 00 00 00 00 00  |urly............|
000002e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000003a0  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000003b0  40 00 00 00 00 00 00 00  14 00 00 00 2c 00 00 00  |@...........,...|
000003c0  00 02 00 00 18 01 00 00  00 00 01 00 00 00 00 00  |................|
000003d0  4e 4f 00 89 e1 09 5e 83  00 00 00 00 00 00 00 00  |NO....^.........|
000003e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000004c0  3b b1 36 86 08 00 00 00  4d 3b d3 63 2b 04 00 00  |;.6.....M;.c+...|
000004d0  1f d7 a7 3e ce 11 01 00  9d fb 5a f5 7b 21 30 00  |...>......Z.{!0.|
000004e0  5a 50 2f 00 03 08 18 00  68 00 00 00 00 00 00 00  |ZP/.....h.......|     <---- data block here (too early)
000004f0  e8 41 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |.A..............|
00000500  24 3a 15 01 00 00 00 00  6b 00 00 00 00 00 00 00  |$:......k.......|
00000510  6b 00 00 00 00 00 00 00  06 00 00 00 00 00 00 00  |k...............|
00000520  44 00 00 00 08 04 00 00  ff 8e 06 5e 00 00 00 00  |D..........^....|
00000530  ea 9b e5 1f 00 00 00 00  b7 e6 f2 5f 00 00 00 00  |..........._....|
00000540  2a ca 20 07 00 00 00 00  b7 e6 f2 5f 00 00 00 00  |*. ........_....|
00000550  2a ca 20 07 00 00 00 00  29 29 aa 5d 00 00 00 00  |*. .....)).]....|
00000560  ee 09 f4 05 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00000570  03 00 00 00 00 00 00 00  00 00 00 10 bf 01 1e 00  |................|
00000580  00 00 40 20 a9 00 12 00  00 00 00 40 88 00 12 00  |..@ .......@....|
00000590  01 01 00 00 00 00 00 00  00 00 00 01 00 00 00 54  |...............T|
000005a0  00 00 00 58 00 00 00 10  73 65 63 75 72 69 74 79  |...X....security|
000005b0  2e 73 65 6c 69 6e 75 78  00 00 00 0a 00 00 00 2d  |.selinux.......-|
000005c0  73 79 73 74 65 6d 5f 75  3a 6f 62 6a 65 63 74 5f  |system_u:object_|
000005d0  72 3a 73 76 69 72 74 5f  69 6d 61 67 65 5f 74 3a  |r:svirt_image_t:|
000005e0  73 30 3a 63 36 34 35 2c  63 39 31 35 00 00 00 00  |s0:c645,c915....|
000005f0  00 00 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |................|
00000600  40 00 00 00 00 00 00 00  00 02 00 00 00 00 00 00  |@...............|
00000610  ff ff ff ff ff ff ff ff  4e 4f 00 89 e1 09 5e 83  |........NO....^.|
00000620  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

[root@nas ~]# zstreamdump -v bad.img | less
BEGIN record
        hdrtype = 2
        features = 0
        magic = 2f5bacbac
        creation_time = 0
        type = 0
        flags = 0x0
        toguid = 0
        fromguid = 0
        toname = tank/kvm@autosnap_2021-09-02_17:00:01_hourly
        payloadlen = 0

END checksum = 48a595e12/11cff64594a/23ffd462f212/31f74ad8a58c4
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 6130e6f1
        type = 2
        flags = 0xc
        toguid = 835e09e189004f4e
        fromguid = ea6d90dd97812547
        toname = tank/kvm@autosnap_2021-09-02_17:00:01_hourly
        payloadlen = 0

OBJECT object = 64 type = 20 bonustype = 44 blksz = 512 bonuslen = 280 dn_slots = 1 raw_bonuslen = 0 flags = 0 maxblkid = 0 indblkshift = 0 nlevels = 0 nblkptr = 0     <---- first object id is 64
# good stream

[root@nas ~]# zfs send -I tank/kvm@autosnap_2021-09-02_17:00:01_hourly tank/kvm@autosnap_2021-09-02_18:00:01_hourly > good2.img
[root@nas ~]# hexdump -C good2.img | less
00000000  00 00 00 00 00 00 00 00  ac cb ba f5 02 00 00 00  |................|
00000010  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  74 61 6e 6b 2f 6b 76 6d  |........tank/kvm|
00000040  40 61 75 74 6f 73 6e 61  70 5f 32 30 32 31 2d 30  |@autosnap_2021-0|
00000050  39 2d 30 32 5f 31 38 3a  30 30 3a 30 31 5f 68 6f  |9-02_18:00:01_ho|
00000060  75 72 6c 79 00 00 00 00  00 00 00 00 00 00 00 00  |urly............|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000130  00 00 00 00 00 00 00 00  05 00 00 00 00 00 00 00  |................|
00000140  12 5e 5a 8a 04 00 00 00  4a 59 9d ff 1c 01 00 00  |.^Z.....JY......|
00000150  12 f2 d7 da ff 23 00 00  c4 58 87 2c 75 1f 03 00  |.....#...X.,u...|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000270  00 00 00 00 00 00 00 00  ac cb ba f5 02 00 00 00  |................|
00000280  11 00 00 00 00 00 00 00  01 f5 30 61 00 00 00 00  |..........0a....|
00000290  02 00 00 00 0c 00 00 00  42 80 de f0 cb 92 47 34  |........B.....G4|
000002a0  4e 4f 00 89 e1 09 5e 83  74 61 6e 6b 2f 6b 76 6d  |NO....^.tank/kvm|
000002b0  40 61 75 74 6f 73 6e 61  70 5f 32 30 32 31 2d 30  |@autosnap_2021-0|
000002c0  39 2d 30 32 5f 31 38 3a  30 30 3a 30 31 5f 68 6f  |9-02_18:00:01_ho|
000002d0  75 72 6c 79 00 00 00 00  00 00 00 00 00 00 00 00  |urly............|
000002e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000003a0  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000003b0  01 00 00 00 00 00 00 00  15 00 00 00 00 00 00 00  |................|
000003c0  00 02 00 00 00 00 00 00  00 00 01 00 00 00 00 00  |................|
000003d0  42 80 de f0 cb 92 47 34  00 00 00 00 00 00 00 00  |B.....G4........|
000003e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000004c0  90 d4 36 42 08 00 00 00  53 7c 5e 1f 01 04 00 00  |..6B....S|^.....|
000004d0  a6 7b 21 99 dd 05 01 00  ef 5f 60 b8 bb f4 2d 00  |.{!......_`...-.|
000004e0  04 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000004f0  00 02 00 00 00 00 00 00  ff ff ff ff ff ff ff ff  |................|
00000500  42 80 de f0 cb 92 47 34  00 00 00 00 00 00 00 00  |B.....G4........|
00000510  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000005f0  00 00 00 00 00 00 00 00  b9 14 a3 1a 0d 00 00 00  |................|
00000600  b8 ef 99 bf c8 07 00 00  68 69 f3 b8 af cc 02 00  |........hi......|
00000610  ce 57 a5 af b0 32 bc 00  04 00 00 00 00 00 00 00  |.W...2..........|
00000620  01 00 00 00 00 00 00 00  00 02 00 00 00 00 00 00  |................|
00000630  00 04 00 00 00 00 00 00  42 80 de f0 cb 92 47 34  |........B.....G4|
00000640  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000730  a6 fa 5d 83 10 00 00 00  03 2d 7e b9 b5 0c 00 00  |..]......-~.....|
00000740  4d be d8 5b f5 ea 05 00  45 a0 19 e1 4f da 07 02  |M..[....E...O...|
00000750  01 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00000760  14 00 00 00 2c 00 00 00  00 02 00 00 08 01 00 00  |....,...........|
00000770  00 00 01 00 00 00 00 00  42 80 de f0 cb 92 47 34  |........B.....G4|
00000780  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000860  00 00 00 00 00 00 00 00  42 69 61 24 14 00 00 00  |........Bia$....|
00000870  b1 5c d9 c6 bb 12 00 00  66 70 9b 4c a8 b4 0a 00  |.\......fp.L....|
00000880  3f 41 85 ed b3 82 86 04  5a 50 2f 00 03 08 18 00  |?A......ZP/.....|     <---- data begin here (later)
00000890  58 00 00 00 00 00 00 00  ed 41 00 00 00 00 00 00  |X........A......|
000008a0  03 00 00 00 00 00 00 00  35 00 00 00 00 00 00 00  |........5.......|

[root@nas ~]# zstreamdump -v good2.img | less
BEGIN record
        hdrtype = 2
        features = 0
        magic = 2f5bacbac
        creation_time = 0
        type = 0
        flags = 0x0
        toguid = 0
        fromguid = 0
        toname = tank/kvm@autosnap_2021-09-02_18:00:01_hourly
        payloadlen = 0

END checksum = 48a5a5e12/11cff9d594a/23ffdad7f212/31f752c8758c4
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 6130f501
        type = 2
        flags = 0xc
        toguid = 344792cbf0de8042
        fromguid = 835e09e189004f4e
        toname = tank/kvm@autosnap_2021-09-02_18:00:01_hourly
        payloadlen = 0

OBJECT object = 1 type = 21 bonustype = 0 blksz = 512 bonuslen = 0 dn_slots = 1 raw_bonuslen = 0 flags = 0 maxblkid = 0 indblkshift = 0 nlevels = 0 nblkptr = 0     <---- first object id is 1

Notice how the zstreamdump having object id = 1 show the first data record much later than the one (failing) having an higher object id (64 in this case) as first entry. Is this difference an expected behavior? If so, why zfs recv fails rather than simply skipping the duplicate snapshot? I never saw this behavior on zfs send done from a 0.8.x machine, so I suppose this is a regression from the recent send/recv changes included in the 2.0.x branch. Including @ahrens who authored most of the commits regarding the new send/recv behavior.

@tandersn
Copy link

Is there any news here? Is there a workaround?

@shodanshok
Copy link
Contributor Author

@tandersn I worked around the issue by using a newer syncoid version which does not erroneously re-send the same snapshot. However, I think the key issue (inside the zfs kernel modules) remains.

@tandersn
Copy link

Thanks. This issue is actually harder for me to work around. I've developed my own snapshot mechanism that is based on renaming the oldest snapshot of one type, to the newest of the next type. It takes daily snapshots normally. then at the weekly snapshot point, it renames the oldest daily to a weekly, and at the bi-monthly snapshot point, renames the oldest weekly to the bi-monthly, and at the monthly point, renames the oldest bi-monthly to monthly. To get the replicant sync'd to the primary, i must include the range of all possibly renamed snapshots. It works fine on some of my servers (the ones that don't have this error), but this error is preventing sync on about half my servers. My point is to illustrate (to whomever) that there is a legitimate case here, for use -I and a range of snapshots that includes snapshots that already exist on the target (on the systems this works on, the snapshots that have been renamed on the primary, get updated / renamed on the target).

@montanaviking
Copy link

I have the same problem as described above.
I am using zfs version:
2.1.4-0york0~18.04
on Ubuntu 18.04 (64-bit machine)
I am sending autosnapshots to USB drives (RAIDZ and mirror backups) using the command as a form e.g.:
sudo zfs send -I datatank/carbonics@zfs-auto-snap_daily-2022-04-25-0609 datatank/carbonics@zfs-auto-snap_frequent-2022-05-05-1501 | sudo zfs receive -vduF zbackupc1

where zbackupc1 is the backup pool (RAIDZ with three USB drives)
The result is sometimes:

receiving incremental stream of datatank/carbonics@zfs-auto-snap_weekly-2022-04-25-0611 into zbackupc1/carbonics@zfs-auto-snap_weekly-2022-04-25-0611
snap zbackupc1/carbonics@zfs-auto-snap_weekly-2022-04-25-0611 already exists; ignoring
cannot receive: invalid record type
cannot restore to zbackupc1/carbonics@zfs-auto-snap_weekly-2022-04-25-0611: destination already exists

My workaround is to delete all snapshots on the destination pool which have dates including (or earlier) and later than that of the snapshot which gave the "invalid record type" error.

I'd like to automate backups but this has significantly complicated that as I will have to detect the problem and implement the workaround in the auto backup application.

Thanks and best,
Phil

@stale
Copy link

stale bot commented May 8, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label May 8, 2023
@stale stale bot closed this as completed Aug 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

3 participants