SEEK_DATA fails upon a file just opened after being rewritten by mmap #11697
It seems like if we don't know if there is data present, we should err on the side of treating it as DATA, not HOLE.
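As an illustration of why that default matters, here is a hedged sketch (my own code, not from the issue or from ZFS) of how a sparse-aware copier typically consumes SEEK_DATA:

```c
/*
 * Hedged sketch, not code from ZFS or the issue: how a typical sparse-aware
 * copier interprets SEEK_DATA.  ENXIO at offset 0 on a non-empty file means
 * "no data extents at all", so the destination is written as nothing but
 * holes -- which is why wrongly reporting HOLE produces all-zero copies,
 * while wrongly reporting DATA only costs some sparseness.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

static bool
file_appears_to_have_data(int fd)
{
	off_t first_data = lseek(fd, 0, SEEK_DATA);
	if (first_data >= 0)
		return true;   /* at least one data extent reported */
	if (errno == ENXIO)
		return false;  /* caller clones the file as all zeros */
	return true;           /* any other error: safer to assume data */
}
```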
That's a bit surprising; I wonder how that fixes it. FYI, you can set the module parameter
That's a good start, but may I suggest something a bit more intelligent than the current behaviour? If you think about it, nobody really cares about sparse copies of small files, so for files sized below some threshold (e.g. 1 MB), if you don't know, return DATA. If, however, the file is larger than that threshold, then forcing a txg sync to find the holes is worth the cost. If you had a spare bit somewhere in the metadata which you could update to 1 or 0 according to whether there is at least one hole in the file, you could avoid forcing a txg sync to find holes except where you really have to: 95% of the time, large files will contain no holes, and wasting time looking for holes in those makes no sense.

However, certainly as an immediate step, returning DATA instead of HOLE when we don't know is an urgent fix. The open source file system abstraction layer we use has file and directory clone routines which always do optimised sparse copies, so right now file and directory clones are completely broken on ZFS wherever a directory tree is undergoing rapid mutation. I can think of a few other things which would also be broken, e.g. rsync, which we use to take backups of a live DB currently in use, and which is also sparse-file aware.
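For what it's worth, a minimal sketch of that suggested policy, assuming a hypothetical choose_hole_policy() helper and a 1 MB threshold; this is not OpenZFS code, just an illustration of the decision logic:

```c
/*
 * Minimal sketch of the heuristic suggested above -- not OpenZFS code.
 * The 1 MB threshold, the enum and the function name are all illustrative.
 */
#include <stdbool.h>
#include <stdint.h>

enum hole_policy {
	REPORT_EXACT,      /* nothing dirty: trust the on-disk block map */
	REPORT_ALL_DATA,   /* small file: just report it as all data */
	SYNC_THEN_REPORT   /* large file: force a txg sync, then look for holes */
};

#define SMALL_FILE_THRESHOLD	(1024 * 1024)	/* 1 MB */

static enum hole_policy
choose_hole_policy(uint64_t file_size, bool has_dirty_data)
{
	if (!has_dirty_data)
		return REPORT_EXACT;
	if (file_size < SMALL_FILE_THRESHOLD)
		return REPORT_ALL_DATA;		/* sparse copies of small files are not worth it */
	return SYNC_THEN_REPORT;		/* only pay the sync cost where sparseness matters */
}
```

The spare "contains at least one hole" bit suggested above would then let the large-file branch skip the forced sync most of the time.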
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
This issue should be fixed by commit de198f2; the fix was included in the 2.0.7 and 2.1.2 releases.
System information
Describe the problem you're observing
We tested a custom database after moving from Ubuntu 18.04 to 20.04 and found a most odd regression in ZFS. I have repeated only the pertinent parts from strace here for brevity; the database does the following (a rough C sketch of the call sequence is given after the list):
1. Opens the metadata file for modification, maps it into memory, and takes an exclusive lock. Reads the existing metadata from the map.
2. Resizes the metadata file to the new metadata size.
3. Remaps the map to accommodate the larger size, and writes the updated metadata into the map.
4. Releases the exclusive lock, unmaps the file, and closes the file.
5. Opens the metadata file just recently updated; however, SEEK_DATA says it contains no valid extents (ENXIO). This causes a sparse file content clone routine to replicate the file with all bits zero, because that is correct if the source file has no valid extents. That, in turn, causes a chain of failures way down the line. It took some hours to discover the originating cause.
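For illustration only, the sequence above might look roughly like this in C; the file name, sizes, flock()-style locking and the missing error handling are my own assumptions rather than details from the real database:

```c
/*
 * Rough, untested C sketch of the call sequence above.  Names, sizes,
 * flock()-style locking and the lack of error handling are assumptions
 * for illustration, not details taken from the real database.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

int
rewrite_metadata(const char *path, const void *meta,
                 size_t old_size, size_t new_size)
{
	/* 1. Open for modification, map into memory, take exclusive lock. */
	int fd = open(path, O_RDWR);
	void *map = mmap(NULL, old_size, PROT_READ | PROT_WRITE,
	                 MAP_SHARED, fd, 0);
	flock(fd, LOCK_EX);
	/* ... read the existing metadata from `map` ... */

	/* 2. Resize the file; 3. remap and write the updated metadata. */
	ftruncate(fd, new_size);
	map = mremap(map, old_size, new_size, MREMAP_MAYMOVE);
	memcpy(map, meta, new_size);

	/* 4. Release the lock, unmap, close. */
	flock(fd, LOCK_UN);
	munmap(map, new_size);
	close(fd);

	/* 5. Reopen and ask where the data is: on the affected releases this
	 *    lseek() fails with ENXIO even though the file clearly has data. */
	fd = open(path, O_RDONLY);
	off_t first_data = lseek(fd, 0, SEEK_DATA);
	close(fd);
	return first_data >= 0 ? 0 : -1;
}
```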
Describe how to reproduce the problem

I can't describe how to reproduce the problem as this is a proprietary database under load. I can tell you this however:

1. … SEEK_DATA returns the truth.
2. If you precede the SEEK_DATA with a single byte read() of that file, suddenly it works.

We've gone with workaround 2 for production, and it seems to work well.
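A tiny sketch of workaround 2, with hypothetical names, just to show where the single-byte read() sits relative to the SEEK_DATA:

```c
/*
 * Sketch of workaround 2 above; names are mine, purely illustrative.
 * A single-byte read() before the SEEK_DATA makes the extent information
 * come back correctly on the affected releases.
 */
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

static off_t
seek_data_with_workaround(int fd)
{
	char byte;
	(void)read(fd, &byte, 1);        /* the single-byte read() described above */
	return lseek(fd, 0, SEEK_DATA);  /* now reports the first data offset correctly */
}
```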
Include any warning/errors/backtraces from the system logs
N/A