
Compressed ARC, Compressed Send/Recv, ARC refactoring #5078

Closed
wants to merge 5 commits into from

Conversation

dpquigl
Contributor

@dpquigl dpquigl commented Sep 8, 2016

This pull request encompasses multiple efforts in an attempt to get ABD support merged into ZFS on Linux.

For the test suite, two new patches are introduced:

  • The first patch introduces the ability to use real disks to back the devices used by the test suite.
  • The second patch introduces the performance regression tests which were part of the original compressed arc PR from George Wilson.

For the remainder of the patch set, the patches introduce compressed arc, compressed send/recv, and some arc refactoring. These patches are stable, have been extensively tested, and form the base on which the remainder of the ABD patchset work will be built.

@@ -104,12 +104,15 @@ export COMPRESSION_PROP=on
export CHECKSUM_PROP=on

# some common variables used by test scripts :
export FIO_SCRIPTS=$STF_SUITE/tests/perf/fio
export export PERF_SCRIPTS=$STF_SUITE/tests/perf/scripts

Just one export

Authored by: George Wilson <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Ported by: David Quigley <[email protected]>

This review covers the reading and writing of compressed arc headers, sharing
data between the arc_hdr_t and the arc_buf_t, and the implementation of a new
dbuf cache to keep frequently accessed data uncompressed.

I've added a new member to l1 arc hdr called b_pdata. The b_pdata always hangs
off the arc_buf_hdr_t (if an L1 hdr is in use) and points to the physical block
for that DVA. The physical block may or may not be compressed. If compressed
arc is enabled and the block on-disk is compressed, then the b_pdata will match
the block on-disk and remain compressed in memory. If the block on disk is not
compressed, then neither is the b_pdata. Lastly, if compressed arc is
disabled, then b_pdata will always be an uncompressed version of the on-disk
block.

Typically the arc will cache only the arc_buf_hdr_t and will aggressively evict
any arc_buf_t's that are no longer referenced. This means that the arc will
primarily have compressed blocks as the arc_buf_t's are considered overhead and
are always uncompressed. When a consumer reads a block we first look to see if
the arc_buf_hdr_t is cached. If the hdr is cached then we allocate a new
arc_buf_t and decompress the b_pdata contents into the arc_buf_t's b_data. If
the hdr already has an arc_buf_t, then we will allocate an additional arc_buf_t
and bcopy the uncompressed contents from the first arc_buf_t to the new one.

Writing to the compressed arc requires that we first discard the b_pdata since
the physical block is about to be rewritten. The new data contents will be
passed in via an arc_buf_t (uncompressed) and during the I/O pipeline stages we
will copy the physical block contents to a newly allocated b_pdata.

When an l2arc is in use it will also take advantage of the b_pdata. Now the
l2arc will always write the contents of b_pdata to the l2arc. This means that
when compressed arc is enabled, the l2arc blocks are identical to those
stored in the main data pool. This provides a significant advantage since we
can leverage the bp's checksum when reading from the l2arc to determine if the
contents are valid. If the compressed arc is disabled, then we must first
transform the read block to look like the physical block in the main data pool
before comparing the checksum and determining whether it is valid.

OpenZFS Issue: https://www.illumos.org/issues/6950
@behlendorf behlendorf changed the title Compressed ARC, Compressed Send/Recv, ARC refactoring and Test Suite performance testing and real disk usage Compressed ARC, Compressed Send/Recv, ARC refactoring Sep 8, 2016
@behlendorf behlendorf added the Type: Feature Feature request or new feature label Sep 9, 2016
@behlendorf behlendorf added this to the 0.7.0 milestone Sep 9, 2016
@@ -96,20 +96,21 @@ typedef enum drr_headertype {
#define DMU_BACKUP_FEATURE_SA_SPILL (1 << 2)
/* flags #3 - #15 are reserved for incompatible closed-source implementations */
#define DMU_BACKUP_FEATURE_EMBED_DATA (1 << 16)
#define DMU_BACKUP_FEATURE_EMBED_DATA_LZ4 (1 << 17)
#define DMU_BACKUP_FEATURE_LZ4 (1 << 17)
formatting


I just happened to look at this in another browser and it seems Chrome just doesn't render this correctly for whatever reason. Never mind.

@tcaputi
Contributor

tcaputi commented Sep 9, 2016

Other than the few things I commented on inline, my only other comments are:

  • I like that we use 7 bits for compression, it leaves 1 for encryption (not biased here at all)
  • Should we consider compressing all of the compress_ok, large_block_ok, embed_ok stuff into a flags field? I'm going to have to add encrypt_ok here soon anyway...

Other than those couple of things LGTM.

@dankimmel
Contributor

Sure, I'd be ok using flags for those parameters instead of xxx_ok.

@behlendorf
Contributor

Thank you everyone for working on this.

I think we're close to getting this merged after @dpquigl's latest refresh; the only outstanding concern is the ZIO_FLAG_RAW fix mentioned above. @dankimmel @tcaputi is there an existing commit with this fix we should add to this stack?

Are there any other remaining concerns?

@tcaputi
Contributor

tcaputi commented Sep 12, 2016

@behlendorf It looks like GitHub won't let me attach diffs to the issue for some reason. I sent you the fix that I sent to Dan.

Other than that LGTM.

@tcaputi
Contributor

tcaputi commented Sep 13, 2016

"Enable raw writes to perform dedup with verification" as a title for that commit?

@behlendorf
Contributor

I'll update the comment message in the merge.

@kernelOfTruth
Contributor

@behlendorf most (if not all) buildbots seem to indicate green light, nice 👍

What's the consensus on this one? Is it going to get merged into master soon?

If that's the case it should be ready for wider usage/testing, right? ("bleeding edge")

behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
Authored by: George Wilson <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>

OpenZFS-issue: https://www.illumos.org/issues/6950
OpenZFS-commit: openzfs/openzfs@7fc10f0
Issue openzfs#5078
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
Authored by: Dan Kimmel <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>
Issue openzfs#5078
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed by: David Quigley <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Issue openzfs#5078
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
…pa.h

Authored by: Dan Kimmel <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>
Issue openzfs#5078
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
Authored by: Dan Kimmel <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>
Issue openzfs#5078
@behlendorf
Contributor

@dpquigl @tcaputi @dankimmel @ahrens @grwilson thanks everybody!

@kernelOfTruth it's going to get merged into master now.

It would be great if you could put this through its paces with some real world workloads and stress tests. I'd love to see some more performance data for this change from a wider variety of hardware.

Last week I tagged a 0.7.0-rc1 release which would be ideal to use as a reference point for comparison. @kpande the plan is to tag a 0.7.0-rc2 release with this feature once we've got a little more run time on it and it has proven solid.

Merged as:

524b421 DLPX-44733 combine arc_buf_alloc_impl() with arc_buf_clone()
c443487 Remove lint suppression from dmu.h and unnecessary dmu.h include in spa.
c17bcf8 Enable raw writes to perform dedup with verification
2aa3438 DLPX-40252 integrate EP-476 compressed zfs send/receive
d3c2ae1 OpenZFS 6950 - ARC should cache compressed data

@behlendorf behlendorf closed this Sep 13, 2016
@dpquigl dpquigl deleted the openzfs-compressedarc-patchset-0.7.0 branch September 13, 2016 20:34
Labels: Type: Feature
7 participants