Issue with incremental replication on ZFS 2.0.5 - cannot receive: invalid record type #12476
Comments
I partially tracked down the issue to two snapshots having the same […]. However, sending the same snapshot a second time should cause no issue, as […]
After having patched […]
It is my understanding that […]. Any clue?
I continued investigating the issue. I tried to: […]

I plan to send the problematic dataset+snapshots to a new target; I'll update the issue with new details when possible.
Ok, it seems the failing stream has some unexpected DRR records. I will try to make a step-by-step comparison between the failing stream and a good one. Including @behlendorf because I am not sure how to proceed forward in debugging. Recap of the main issue: when re-receiving an already received snapshot, ZFS should simply ignore it and go ahead with the other snapshots included in the stream. On the contrary, the failing stream causes ZFS to abruptly abort the receive and exit with an error code. I strongly suspect it is the same issue reported here (or a close variant): #11081. Let's see the issue in action:
Notice how receiving the bad stream exits with an […] error. Notice how on the good stream the first "data packet" (DRR_OTHER) appears as the 8th dump (ie: […]). The bad stream shows data much earlier, at the 5th dump/invocation of […]. The data packet comes too early (ie: unexpectedly) and the zfs kernel module tries to parse it as a DRR_OBJECT record, which clearly fails. Finally, let's see what zstreamdump reports for both streams:
I can't see anything wrong with the output of […]. Thanks.
Executing […]:

Notice how the bad stream object id does not start at 1, and how its type is 20 ([…]).
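Since the actual dumps are elided above, here is an illustrative way to make that kind of comparison mechanical: a small awk helper that numbers the DRR records in a stream dump, so the position of the first data record in a good and a bad stream can be diffed side by side. The record-type names and the sample input are assumptions for illustration, not output captured from this thread; a real run would pipe `zfs send … | zstream dump -v` into it.

```sh
# Hypothetical triage helper: number the DRR record types appearing in a
# `zstream dump -v` / `zstreamdump -v` style dump, so two streams can be
# compared with `diff`. The matched names are an assumption about the format.
summarize_records() {
    awk '/^(BEGIN|END|OBJECT|FREEOBJECTS|WRITE|FREE|SPILL|OBJECT_RANGE)/ { print ++n ": " $1 }'
}

# Fabricated sample standing in for the (elided) dumps from the thread:
printf '%s\n' \
    'BEGIN record' \
    'OBJECT object = 1 type = 20' \
    'WRITE object = 1 offset = 0' \
    'END checksum = deadbeef' \
| summarize_records
# prints:
# 1: BEGIN
# 2: OBJECT
# 3: WRITE
# 4: END
```

Running this over both the good and the bad stream and diffing the two summaries would show directly at which record index the streams diverge.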
The same machine which shows the issue on many snapshots does not show the problem on another, more recent, snapshot. I achieved the "new" correct behavior by creating and removing a single file in between two snapshots. Let's see how the dump file differs:
Notice how the […]
Is there any news here? Is there a workaround?
@tandersn I worked around the issue by using a newer […]
Thanks. This issue is actually harder for me to work around. I've developed my own snapshot mechanism that is based on renaming the oldest snapshot of one type to the newest of the next type. It takes daily snapshots normally. Then at the weekly snapshot point it renames the oldest daily to a weekly; at the bi-monthly snapshot point it renames the oldest weekly to a bi-monthly; and at the monthly point it renames the oldest bi-monthly to a monthly. To get the replicant synced to the primary, I must include the range of all possibly renamed snapshots. It works fine on some of my servers (the ones that don't have this error), but this error is preventing sync on about half my servers. My point is to illustrate (to whomever) that there is a legitimate use case here for `-I` with a range of snapshots that includes snapshots that already exist on the target (on the systems this works on, the snapshots that have been renamed on the primary get updated/renamed on the target).
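The rotation described above can be sketched as a dry-run script. The dataset name, snapshot naming scheme, and tier prefixes below are all hypothetical; the function only prints the `zfs rename` commands it would run, rather than executing them, and a real version would read the snapshot list from `zfs list -H -o name -t snapshot`.

```sh
#!/bin/sh
# Dry-run sketch of a rename-based rotation (hypothetical names; commands
# are printed, not executed).
DS=tank/data    # assumed dataset name

# Promote the oldest snapshot of one tier to the newest of the next tier,
# preserving its timestamp suffix.
promote_oldest() {    # $1 = old tier prefix, $2 = new tier prefix
    oldest="$(printf '%s\n' $SNAPSHOTS | grep "^$1-" | sort | head -n1)"
    [ -n "$oldest" ] || return 0
    stamp="${oldest#$1-}"
    echo "zfs rename $DS@$oldest $DS@$2-$stamp"
}

# Fabricated snapshot list standing in for `zfs list -t snapshot` output:
SNAPSHOTS='daily-2021-08-01
daily-2021-08-02
weekly-2021-07-04'

promote_oldest daily weekly       # at the weekly point
promote_oldest weekly bimonthly   # at the bi-monthly point
# prints:
# zfs rename tank/data@daily-2021-08-01 tank/data@weekly-2021-08-01
# zfs rename tank/data@weekly-2021-07-04 tank/data@bimonthly-2021-07-04
```

Because renames keep the snapshot's contents and guid, a replication with `-I` over the affected range should still find matching snapshots on the target, which is exactly the scenario that triggers this bug.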
I have the same problem as described above: […] where zbackupc1 is the backup pool (RAIDZ with three USB drives) receiving the incremental stream of datatank/carbonics@zfs-auto-snap_weekly-2022-04-25-0611 into zbackupc1/carbonics@zfs-auto-snap_weekly-2022-04-25-0611. My workaround is to delete all snapshots on the destination pool whose dates are the same as (or earlier than), as well as later than, that of the snapshot which gave the "invalid record type" error. I'd like to automate backups, but this has significantly complicated that, as I will have to detect the problem and implement the workaround in the auto-backup application. Thanks and best,
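That cleanup can be scripted as a dry run. The snapshot names below are fabricated (mirroring the zfs-auto-snap naming in the comment), the function only prints the `zfs destroy` commands instead of running them, and it assumes the failing snapshot is actually present in the (sorted) list fed to it; a real run would take the list from `zfs list -H -o name -t snapshot` on the destination.

```sh
#!/bin/sh
# Dry-run sketch of the workaround above: print a `zfs destroy` command for
# every destination snapshot sorted at or after the one that failed.
destroy_from() {    # $1 = destination dataset, $2 = snapshot that failed
    ds="$1"; cutoff="$2"; seen=0
    for s in $(printf '%s\n' $SNAPS | sort); do
        [ "$s" = "$cutoff" ] && seen=1
        [ "$seen" -eq 1 ] && echo "zfs destroy $ds@$s"
    done
}

# Fabricated snapshot list for the destination dataset:
SNAPS='zfs-auto-snap_weekly-2022-04-18-0611
zfs-auto-snap_weekly-2022-04-25-0611
zfs-auto-snap_weekly-2022-05-02-0611'

destroy_from zbackupc1/carbonics zfs-auto-snap_weekly-2022-04-25-0611
# prints:
# zfs destroy zbackupc1/carbonics@zfs-auto-snap_weekly-2022-04-25-0611
# zfs destroy zbackupc1/carbonics@zfs-auto-snap_weekly-2022-05-02-0611
```

Note this relies on lexical sort order matching chronological order, which holds for zfs-auto-snap style timestamps but not for arbitrary snapshot names.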
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
After upgrading the source & destination machines to ZFS 2.0.5, sometimes I cannot receive the last incremental snapshot due to the following error:

cannot receive: invalid record type

The `cannot receive: invalid record type` error seems strange: from here one can see that this error should be returned only when an unknown record type is processed, but with `zstreamdump` I can't see anything strange. Trying to force with `-F` does not produce better results. I also tried restoring an incremental stream generated with `-i` (ie: not using a replication stream). Without forcing anything it also fails, but no reference to an invalid record exists. Forcing the very same `-i` stream does work indeed. It seems as if the destination dataset was modified; however, it is in readonly mode and no files were deleted (so the ZFS delete queue should be empty). Moreover, I never encountered the `invalid record type` issue before, and I don't know why a forced receive (`-F`) does work with an incremental send (`-i`) but not with an incremental+replication stream (`-I`).

Describe how to reproduce the problem
No specific reproducer; replication fails each 7-10 days.
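Although there is no deterministic reproducer, the failing pattern has this general shape. Pool and snapshot names below are hypothetical, and the script is a dry run (it prints the commands instead of executing them, since a real run needs actual pools and root):

```sh
#!/bin/sh
# Outline of the failing pattern, as a dry run (hypothetical names).
run() { echo "+ $*"; }    # replace with eval/"$@" to actually execute

run zfs snapshot src/data@snap1
run zfs snapshot src/data@snap2
# Initial replication: fine.
run "zfs send -RI src/data@snap1 src/data@snap2 | zfs recv -F dst/data"
# Later replication whose -I range overlaps snapshots the destination
# already has. recv -F should skip the already-received ones, but on the
# affected streams it instead aborts with:
#   cannot receive: invalid record type
run "zfs send -RI src/data@snap1 src/data@snap2 | zfs recv -F dst/data"
```

The key point, matching the recap earlier in the thread, is that the overlap itself is legitimate: re-receiving an already-present snapshot inside a `-RI` replication stream is supposed to be a no-op, not an error.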
Include any warning/errors/backtraces from the system logs
None relevant in system logs.