Add set_pledged_input_size to ZstdCompressor #134938

Closed
@emmatyping

Feature or enhancement

Proposal:

pyzstd's ZstdCompressor class had a method, _set_pledged_input_size, that let users declare how much data they were going to write into a frame so that the size would be recorded in the frame header. We should support this use case in compression.zstd.
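
For context on what "recorded in the frame header" means, here is a minimal sketch that reads the optional Frame_Content_Size field back out of a frame's first bytes. It assumes the header layout described in the Zstandard frame format (RFC 8878); the helper name frame_content_size is hypothetical and is not part of pyzstd or compression.zstd.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528 stored little-endian


def frame_content_size(frame: bytes):
    """Return the Frame_Content_Size stored in a frame header, or None if absent.

    Hypothetical helper for illustration; assumes the RFC 8878 header layout.
    """
    if len(frame) < 5 or frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")
    descriptor = frame[4]
    fcs_flag = descriptor >> 6              # bits 7-6: Frame_Content_Size_Flag
    single_segment = (descriptor >> 5) & 1  # bit 5: Single_Segment_Flag
    dict_id_flag = descriptor & 3           # bits 1-0: Dictionary_ID_Flag

    # Field width in bytes; flag 0 means "absent" unless Single_Segment is set.
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    if fcs_size == 0:
        return None

    # Skip the descriptor, the optional Window_Descriptor, and the Dictionary_ID.
    offset = 5 + (0 if single_segment else 1) + (0, 1, 2, 4)[dict_id_flag]
    field = frame[offset:offset + fcs_size]
    if len(field) < fcs_size:
        raise ValueError("truncated frame header")
    value = int.from_bytes(field, "little")
    # The 2-byte encoding stores the size minus 256.
    return value + 256 if fcs_size == 2 else value
```

A decompressor can use this field to size its output buffer up front, which is the main reason to let callers pledge the size before writing.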

I don't want to add a private API that is unsafe or only for advanced users, so I want to sketch out an implementation that could be used in general and catch incorrect usage:

  1. Update ZstdCompressor's struct to include two unsigned long long members, current_frame_size and pledged_size, both initialized to ZSTD_CONTENTSIZE_UNKNOWN.
  2. Add set_pledged_size. The main difference from the pyzstd implementation is that it will update pledged_size.
  3. Modify ZstdCompressor's compress() and flush() to track how much data is written to the compressor, accumulating the count in current_frame_size. If the mode is FLUSH_FRAME, then after writing, check that current_frame_size == pledged_size and raise a ZstdError if they differ. Afterwards, reset pledged_size and current_frame_size.
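
The bookkeeping in the three steps above could be modeled roughly as follows. This is a pure-Python sketch of the idea, not the C implementation: PledgeTracker, PledgeError (a stand-in for ZstdError), Mode, and record_write are hypothetical names, and the byte counter starts at 0 rather than at the sentinel to keep the arithmetic simple.

```python
from enum import Enum

# Same value zstd uses for ZSTD_CONTENTSIZE_UNKNOWN: (unsigned long long)-1.
CONTENTSIZE_UNKNOWN = 2**64 - 1


class Mode(Enum):
    CONTINUE = 0
    FLUSH_BLOCK = 1
    FLUSH_FRAME = 2


class PledgeError(Exception):
    """Hypothetical stand-in for ZstdError."""


class PledgeTracker:
    """Models only the size accounting; no compression happens here."""

    def __init__(self):
        self.pledged_size = CONTENTSIZE_UNKNOWN
        self.current_frame_size = 0  # assumption: count from 0, not the sentinel

    def set_pledged_size(self, size):
        # None means "no pledge", mirroring the unknown-size sentinel.
        self.pledged_size = CONTENTSIZE_UNKNOWN if size is None else size

    def record_write(self, nbytes, mode):
        """Call after compress()/flush() has consumed nbytes of input."""
        self.current_frame_size += nbytes
        if mode is not Mode.FLUSH_FRAME:
            return
        written, pledged = self.current_frame_size, self.pledged_size
        # Reset for the next frame whether or not the check passes.
        self.current_frame_size = 0
        self.pledged_size = CONTENTSIZE_UNKNOWN
        if pledged != CONTENTSIZE_UNKNOWN and written != pledged:
            raise PledgeError(
                f"pledged {pledged} bytes but frame received {written}"
            )
```

Resetting before raising means a failed frame does not leave a stale pledge behind for the next one; whether the real implementation should behave that way is a design choice this sketch simply assumes.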

I think the one drawback of the above is that it will notify the user when something goes wrong, but if they are streaming compressed data elsewhere, they could still send garbage if they use the API incorrectly. That's inherently not something we can really fix, though.

An open question I have: should we also check current_frame_size <= pledged_size at the end of writing when the mode isn't FLUSH_FRAME? I think probably yes.
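
To make the two checks concrete, here is a small sketch of how they might differ. The function name pledge_violation and its return convention are hypothetical; the point is only that an overrun can be detected on any write, while a shortfall can only be detected once the frame ends.

```python
# Same value zstd uses for ZSTD_CONTENTSIZE_UNKNOWN: (unsigned long long)-1.
CONTENTSIZE_UNKNOWN = 2**64 - 1


def pledge_violation(current_frame_size, pledged_size, frame_ended):
    """Return a message describing the violation, or None if sizes are consistent.

    Hypothetical helper: frame_ended is True only when the mode is FLUSH_FRAME.
    """
    if pledged_size == CONTENTSIZE_UNKNOWN:
        return None  # nothing was pledged, so nothing to check
    if current_frame_size > pledged_size:
        # Detectable mid-frame: more data has arrived than was pledged.
        return f"wrote {current_frame_size} bytes, pledged only {pledged_size}"
    if frame_ended and current_frame_size < pledged_size:
        # Only detectable at the end: the frame closed short of the pledge.
        return f"frame ended at {current_frame_size} bytes, pledged {pledged_size}"
    return None
```

Failing early on an overrun would surface the error at the offending write rather than at the later FLUSH_FRAME call.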

cc @Rogdham, I'd be interested in your thoughts.

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/pep-784-adding-zstandard-to-the-standard-library/87377/143

Labels: 3.14 (bugs and security fixes), extension-modules (C modules in the Modules dir), stdlib (Python modules in the Lib dir), type-feature (a feature request or enhancement)
