
Compressed ARC, Compressed Send/Recv, ARC refactoring #5078

Closed
wants to merge 5 commits into from

Conversation

dpquigl
Contributor

@dpquigl dpquigl commented Sep 8, 2016

This pull request encompasses multiple efforts in an attempt to get ABD support merged into ZFS on Linux.

For the test suite, two new patches are introduced:

  • The first patch introduces the ability to use real disks to back the devices used by the test suite.
  • The second patch introduces the performance regression tests which were part of the original compressed arc PR from George Wilson.

For the remainder of the patch set, the patches introduce compressed arc, compressed send/recv, and some arc refactoring. These patches are stable, have been extensively tested, and form the base on which the remainder of the ABD patchset work will be built.

@@ -104,12 +104,15 @@ export COMPRESSION_PROP=on
export CHECKSUM_PROP=on

# some common variables used by test scripts :
export FIO_SCRIPTS=$STF_SUITE/tests/perf/fio
export export PERF_SCRIPTS=$STF_SUITE/tests/perf/scripts

Just one export

Authored by: George Wilson <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Ported by: David Quigley <[email protected]>

This review covers the reading and writing of compressed arc headers, sharing
data between the arc_hdr_t and the arc_buf_t, and the implementation of a new
dbuf cache to keep frequently accessed data uncompressed.

I've added a new member to l1 arc hdr called b_pdata. The b_pdata always hangs
off the arc_buf_hdr_t (if an L1 hdr is in use) and points to the physical block
for that DVA. The physical block may or may not be compressed. If compressed
arc is enabled and the block on-disk is compressed, then the b_pdata will match
the block on-disk and remain compressed in memory. If the block on disk is not
compressed, then neither is the b_pdata. Lastly, if compressed arc is
disabled, then b_pdata will always be an uncompressed version of the on-disk
block.

Typically the arc will cache only the arc_buf_hdr_t and will aggressively evict
any arc_buf_t's that are no longer referenced. This means that the arc will
primarily have compressed blocks as the arc_buf_t's are considered overhead and
are always uncompressed. When a consumer reads a block we first look to see if
the arc_buf_hdr_t is cached. If the hdr is cached then we allocate a new
arc_buf_t and decompress the b_pdata contents into the arc_buf_t's b_data. If
the hdr already has an arc_buf_t, then we will allocate an additional arc_buf_t
and bcopy the uncompressed contents from the first arc_buf_t to the new one.

Writing to the compressed arc requires that we first discard the b_pdata since
the physical block is about to be rewritten. The new data contents will be
passed in via an arc_buf_t (uncompressed) and during the I/O pipeline stages we
will copy the physical block contents to a newly allocated b_pdata.

When an l2arc is in use it will also take advantage of the b_pdata. Now the
l2arc will always write the contents of b_pdata to the l2arc. This means that
when compressed arc is enabled, the l2arc blocks are identical to those
stored in the main data pool. This provides a significant advantage since we
can leverage the bp's checksum when reading from the l2arc to determine if the
contents are valid. If the compressed arc is disabled, then we must first
transform the read block to look like the physical block in the main data pool
before comparing the checksum and determining whether it is valid.

OpenZFS Issue: https://www.illumos.org/issues/6950
@behlendorf behlendorf changed the title Compressed ARC, Compressed Send/Recv, ARC refactoring and Test Suite performance testing and real disk usage Compressed ARC, Compressed Send/Recv, ARC refactoring Sep 8, 2016
@behlendorf behlendorf added the Type: Feature Feature request or new feature label Sep 9, 2016
@behlendorf behlendorf added this to the 0.7.0 milestone Sep 9, 2016
@@ -96,20 +96,21 @@ typedef enum drr_headertype {
#define DMU_BACKUP_FEATURE_SA_SPILL (1 << 2)
/* flags #3 - #15 are reserved for incompatible closed-source implementations */
#define DMU_BACKUP_FEATURE_EMBED_DATA (1 << 16)
#define DMU_BACKUP_FEATURE_EMBED_DATA_LZ4 (1 << 17)
#define DMU_BACKUP_FEATURE_LZ4 (1 << 17)
formatting


I just happened to look at this in another browser and it seems Chrome just doesn't render this correctly for whatever reason. Never mind.

@tcaputi
Contributor

tcaputi commented Sep 9, 2016

Other than the few things I commented on inline, my only other comments are:

  • I like that we use 7 bits for compression, it leaves 1 for encryption (not biased here at all)
  • Should we consider compressing all of the compress_ok, large_block_ok, embed_ok stuff into a flags field? I'm going to have to add encrypt_ok here soon anyway...

Other than those couple of things LGTM.

@dankimmel
Contributor

Sure, I'd be ok using flags for those parameters instead of xxx_ok.

@behlendorf
Contributor

Thank you everyone for working on this.

I think we're close to getting this merged after @dpquigl's latest refresh; the only outstanding concern is the ZIO_FLAG_RAW fix mentioned above. @dankimmel @tcaputi is there an existing commit with this fix we should add to this stack?

Are there any other remaining concerns?

@tcaputi
Contributor

tcaputi commented Sep 12, 2016

@behlendorf It looks like GitHub won't let me attach diffs to the issue for some reason. I sent you the fix that I sent to Dan.

Other than that LGTM.

@tcaputi
Contributor

tcaputi commented Sep 13, 2016

"Enable raw writes to perform dedup with verification" as a title for that commit?

@behlendorf
Contributor

I'll update the comment message in the merge.

@kernelOfTruth
Contributor

@behlendorf most (if not all) buildbots seem to indicate green light, nice 👍

What's the consensus on this one? Is it going to get merged into master soon?

If that's the case it should be ready for wider usage/testing, right? ("bleeding edge")

behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
Authored by: George Wilson <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>

OpenZFS-issue: https://www.illumos.org/issues/6950
OpenZFS-commit: openzfs/openzfs@7fc10f0
Issue openzfs#5078
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
Authored by: Dan Kimmel <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>
Issue openzfs#5078
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed by: David Quigley <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Issue openzfs#5078
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
…pa.h

Authored by: Dan Kimmel <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>
Issue openzfs#5078
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Sep 13, 2016
Authored by: Dan Kimmel <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>
Issue openzfs#5078
@behlendorf
Contributor

@dpquigl @tcaputi @dankimmel @ahrens @grwilson thanks everybody!

@kernelOfTruth it's going to get merged into master now.

It would be great if you could put this through its paces with some real world workloads and stress tests. I'd love to see some more performance data for this change from a wider variety of hardware.

Last week I tagged a 0.7.0-rc1 release which would be ideal to use as a reference point for comparison. @kpande the plan is to tag a 0.7.0-rc2 release with this feature once we've got a little more run time on it and it has proven solid.

Merged as:

524b421 DLPX-44733 combine arc_buf_alloc_impl() with arc_buf_clone()
c443487 Remove lint suppression from dmu.h and unnecessary dmu.h include in spa.
c17bcf8 Enable raw writes to perform dedup with verification
2aa3438 DLPX-40252 integrate EP-476 compressed zfs send/receive
d3c2ae1 OpenZFS 6950 - ARC should cache compressed data

@behlendorf behlendorf closed this Sep 13, 2016
@dpquigl dpquigl deleted the openzfs-compressedarc-patchset-0.7.0 branch September 13, 2016 20:34
Labels: Type: Feature
7 participants