Compressed ARC, Compressed Send/Recv, ARC refactoring #5078
Conversation
@@ -104,12 +104,15 @@ export COMPRESSION_PROP=on
export CHECKSUM_PROP=on

# some common variables used by test scripts :
export FIO_SCRIPTS=$STF_SUITE/tests/perf/fio
export export PERF_SCRIPTS=$STF_SUITE/tests/perf/scripts
Just one export
Authored by: George Wilson <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Ported by: David Quigley <[email protected]>

This review covers the reading and writing of compressed arc headers, sharing data between the arc_hdr_t and the arc_buf_t, and the implementation of a new dbuf cache to keep frequently accessed data uncompressed.

I've added a new member to the l1 arc hdr called b_pdata. The b_pdata always hangs off the arc_buf_hdr_t (if an L1 hdr is in use) and points to the physical block for that DVA. The physical block may or may not be compressed. If compressed arc is enabled and the block on disk is compressed, then the b_pdata will match the block on disk and remain compressed in memory. If the block on disk is not compressed, then neither will the b_pdata be. Lastly, if compressed arc is disabled, then b_pdata will always be an uncompressed version of the on-disk block.

Typically the arc will cache only the arc_buf_hdr_t and will aggressively evict any arc_buf_t's that are no longer referenced. This means that the arc will primarily have compressed blocks, as the arc_buf_t's are considered overhead and are always uncompressed.

When a consumer reads a block, we first look to see if the arc_buf_hdr_t is cached. If the hdr is cached, then we allocate a new arc_buf_t and decompress the b_pdata contents into the arc_buf_t's b_data. If the hdr already has an arc_buf_t, then we allocate an additional arc_buf_t and bcopy the uncompressed contents from the first arc_buf_t to the new one.

Writing to the compressed arc requires that we first discard the b_pdata since the physical block is about to be rewritten. The new data contents will be passed in via an arc_buf_t (uncompressed), and during the I/O pipeline stages we will copy the physical block contents to a newly allocated b_pdata.

When an l2arc is in use, it will also take advantage of the b_pdata. The l2arc will now always write the contents of b_pdata to the l2arc. This means that when compressed arc is enabled, the l2arc blocks are identical to those stored in the main data pool. This provides a significant advantage since we can leverage the bp's checksum when reading from the l2arc to determine if the contents are valid. If the compressed arc is disabled, then we must first transform the read block to look like the physical block in the main data pool before comparing the checksum and determining it is valid.

OpenZFS Issue: https://www.illumos.org/issues/6950
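To make the read path described above concrete, here is a minimal standalone C sketch. This is not the actual ZFS implementation: the struct layouts are heavily simplified, and toy_decompress and arc_buf_alloc_sketch are hypothetical stand-ins for the real LZ4 path and arc_buf_alloc_impl().

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct arc_buf {
	struct arc_buf *b_next;   /* headers keep a list of consumer bufs */
	char *b_data;             /* always uncompressed */
} arc_buf_t;

typedef struct arc_buf_hdr {
	char *b_pdata;            /* physical (possibly compressed) block */
	size_t b_psize;           /* physical size of b_pdata */
	size_t b_lsize;           /* logical (uncompressed) size */
	int b_compressed;         /* nonzero if b_pdata is compressed */
	arc_buf_t *b_buf;         /* list of uncompressed arc_buf_t's */
} arc_buf_hdr_t;

/* Hypothetical stand-in for the real decompressor (LZ4 in ZFS). */
static void
toy_decompress(const char *src, size_t psize, char *dst, size_t lsize)
{
	(void) psize;
	memcpy(dst, src, lsize);  /* identity transform for the sketch */
}

/*
 * Sketch of allocating a new arc_buf_t for a cached header: decompress
 * b_pdata on the first read, bcopy from an existing buf on later reads.
 */
static arc_buf_t *
arc_buf_alloc_sketch(arc_buf_hdr_t *hdr)
{
	arc_buf_t *buf = malloc(sizeof (*buf));

	buf->b_data = malloc(hdr->b_lsize);
	if (hdr->b_buf != NULL) {
		/* Header already has an uncompressed buf: just copy it. */
		memcpy(buf->b_data, hdr->b_buf->b_data, hdr->b_lsize);
	} else if (hdr->b_compressed) {
		/* First consumer of a compressed block: decompress b_pdata. */
		toy_decompress(hdr->b_pdata, hdr->b_psize,
		    buf->b_data, hdr->b_lsize);
	} else {
		/* b_pdata is already uncompressed; copy it directly. */
		memcpy(buf->b_data, hdr->b_pdata, hdr->b_lsize);
	}
	buf->b_next = hdr->b_buf;  /* link into the header's buf list */
	hdr->b_buf = buf;
	return (buf);
}

int
main(void)
{
	arc_buf_hdr_t hdr = { "hello", 6, 6, 0, NULL };
	arc_buf_t *a = arc_buf_alloc_sketch(&hdr);  /* copied from b_pdata */
	arc_buf_t *b = arc_buf_alloc_sketch(&hdr);  /* bcopy'd from a */

	printf("%s %s\n", a->b_data, b->b_data);
	return (0);
}

The point the sketch mirrors is that decompression happens at most once per header; later consumers of the same block pay only a bcopy.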
@@ -96,20 +96,21 @@ typedef enum drr_headertype {
#define DMU_BACKUP_FEATURE_SA_SPILL (1 << 2)
/* flags #3 - #15 are reserved for incompatible closed-source implementations */
#define DMU_BACKUP_FEATURE_EMBED_DATA (1 << 16)
#define DMU_BACKUP_FEATURE_EMBED_DATA_LZ4 (1 << 17)
#define DMU_BACKUP_FEATURE_LZ4 (1 << 17)
formatting
I just happened to look at this in another browser and it seems Chrome just doesn't render this correctly for whatever reason. Nevermind.
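Stepping back to the hunk itself: these DMU_BACKUP_FEATURE_* values are bit flags OR'd together into a send stream's feature-flag word, which is why the rename above can reuse the same bit. A minimal sketch of how such flags are set and tested follows; the featureflags variable and main() harness are illustrative only, not the actual dmu_send code.

#include <stdio.h>
#include <stdint.h>

/* Flag values copied from the hunk above. */
#define DMU_BACKUP_FEATURE_SA_SPILL	(1 << 2)
#define DMU_BACKUP_FEATURE_EMBED_DATA	(1 << 16)
#define DMU_BACKUP_FEATURE_LZ4		(1 << 17)

int
main(void)
{
	/* A hypothetical sender advertising embedded data and LZ4. */
	uint64_t featureflags =
	    DMU_BACKUP_FEATURE_EMBED_DATA | DMU_BACKUP_FEATURE_LZ4;

	if (featureflags & DMU_BACKUP_FEATURE_LZ4)
		printf("stream may carry LZ4-compressed records\n");
	if (!(featureflags & DMU_BACKUP_FEATURE_SA_SPILL))
		printf("no SA spill blocks advertised\n");
	return (0);
}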
Other than the few things I commented on inline, my only other comments are:
Other than those couple of things LGTM.
Sure, I'd be ok using flags for those parameters instead of
Authored by: Dan Kimmel <[email protected]> Ported by: David Quigley <[email protected]>
Thank you everyone for working on this. We're close to getting this merged after @dpquigl's latest refresh; I think the only outstanding concern is the ZIO_FLAG_RAW fix mentioned above. @dankimmel @tcaputi is there an existing commit with this fix we should add to this stack? Are there any other remaining concerns?
@behlendorf It looks like GitHub won't let me attach diffs to the issue for some reason. I sent you the fix that I sent to Dan. Other than that LGTM.
"Enable raw writes to perform dedup with verification" as a title for that commit? |
I'll update the commit message in the merge.
@behlendorf most (if not all) buildbots seem to indicate a green light, nice 👍 What's the consensus on this one? Is it going to get merged into master soon? If that's the case it should be ready for wider usage/testing, right? ("bleeding edge")
Authored by: George Wilson <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>

OpenZFS-issue: https://www.illumos.org/issues/6950
OpenZFS-commit: openzfs/openzfs@7fc10f0
Issue openzfs#5078
Authored by: Dan Kimmel <[email protected]> Reviewed by: Tom Caputi <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Ported by: David Quigley <[email protected]> Issue openzfs#5078
Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Reviewed by: David Quigley <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Issue openzfs#5078
…pa.h Authored by: Dan Kimmel <[email protected]> Reviewed by: Tom Caputi <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Ported by: David Quigley <[email protected]> Issue openzfs#5078
@dpquigl @tcaputi @dankimmel @ahrens @grwilson thanks everybody! @kernelOfTruth it's going to get merged into master now. It would be great if you could put this through its paces with some real-world workloads and stress tests. I'd love to see some more performance data for this change from a wider variety of hardware. Last week I tagged a 0.7.0-rc1 release, which would be ideal to use as a reference point for comparison. @kpande the plan is to tag a 0.7.0-rc2 release with this feature once we've got a little more run time on it and it has proven solid.

Merged as: 524b421 DLPX-44733 combine arc_buf_alloc_impl() with arc_buf_clone()
This pull request encompasses multiple efforts aimed at getting ABD support merged into ZFS on Linux.
For the test suite two new patches are introduced:
The first patch introduces the ability to use real disks to back the devices used by the test suite.
The second patch introduces the performance regression tests which were part of the original compressed arc PR from George Wilson.
For the remainder of the patch set, the patches introduce compressed arc and compressed send/recv, and do some arc refactoring. These patches are stable, have been extensively tested, and form the base on which the remainder of the ABD patchset work will be built.
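Once merged, the compressed send/recv portion can be exercised from the command line via the -c (compressed) flag to zfs send; a quick smoke test might look like the following, where "tank" and "backup" are placeholder pool names:

# Snapshot, then send the blocks as stored on disk (compressed)
# into a second pool; dataset names here are examples only.
zfs snapshot tank/data@snap1
zfs send -c tank/data@snap1 | zfs recv backup/data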