Description
Feature or enhancement
Proposal:
pyzstd's `ZstdCompressor` class had a method, `_set_pledged_input_size`, which allowed users to set the amount of data they were going to write into a frame so that it would be written into the frame header. We should support this use case in `compression.zstd`.
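To illustrate what "written into the frame header" means, here is a minimal sketch of reading the optional Frame_Content_Size field back out of a frame, assuming the header layout described in the Zstandard format specification (RFC 8878). The function name `read_frame_content_size` is hypothetical and is not part of pyzstd or `compression.zstd`.

```python
ZSTD_MAGIC = 0xFD2FB528  # stored little-endian as bytes 28 B5 2F FD


def read_frame_content_size(frame: bytes):
    """Return the content size recorded in a Zstandard frame header, or None."""
    if len(frame) < 5 or int.from_bytes(frame[:4], "little") != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")

    descriptor = frame[4]
    fcs_flag = descriptor >> 6             # bits 7-6: Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1  # bit 5: Single_Segment_flag
    dict_id_flag = descriptor & 0b11        # bits 1-0: Dictionary_ID_flag

    pos = 5
    if not single_segment:
        pos += 1                            # Window_Descriptor byte
    pos += (0, 1, 2, 4)[dict_id_flag]       # optional Dictionary_ID field

    if fcs_flag == 0:
        if not single_segment:
            return None                     # no size recorded in the header
        size_len = 1
    else:
        size_len = (0, 2, 4, 8)[fcs_flag]

    field = frame[pos:pos + size_len]
    if len(field) < size_len:
        raise ValueError("truncated frame header")

    value = int.from_bytes(field, "little")
    if size_len == 2:
        value += 256                        # the 2-byte form stores size - 256
    return value
```

A decompressor can use this value to allocate its output buffer up front, which is the main benefit of pledging the size before compressing.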
I don't want to add a private API that is unsafe or only for advanced users, so I want to sketch out an implementation that could be used in general and catch incorrect usage:
- Update `ZstdCompressor`'s struct to include two `unsigned long long` members, `current_frame_size` and `pledged_size`, both initialized to `ZSTD_CONTENTSIZE_UNKNOWN`.
- Add `set_pledged_size`. The main difference from the pyzstd implementation is that it will update `pledged_size`.
- Modify `ZstdCompressor`'s `compress()` and `flush()` to track how much data is being written to the compressor, accumulating the total in `current_frame_size`. If the mode is `FLUSH_FRAME`, then after writing, check that `current_frame_size == pledged_size`, and raise a `ZstdError` if they differ. In either case, reset `pledged_size` and `current_frame_size`.
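The bookkeeping in the list above can be sketched in pure Python. This is only an illustration of the proposed logic, not the C implementation: `PledgedSizeTracker`, `record_write`, and `PledgeError` (a stand-in for `ZstdError`) are hypothetical names, and skipping the check when no size was pledged is an assumption on my part.

```python
CONTENTSIZE_UNKNOWN = 2**64 - 1  # mirrors ZSTD_CONTENTSIZE_UNKNOWN, i.e. (0ULL - 1)

# Mirror the ZstdCompressor mode constants.
CONTINUE, FLUSH_BLOCK, FLUSH_FRAME = range(3)


class PledgeError(Exception):
    """Hypothetical stand-in for ZstdError."""


class PledgedSizeTracker:
    """Models only the two proposed struct members, not real compression."""

    def __init__(self):
        self.current_frame_size = CONTENTSIZE_UNKNOWN
        self.pledged_size = CONTENTSIZE_UNKNOWN

    def set_pledged_size(self, size):
        self.pledged_size = size

    def record_write(self, nbytes, mode):
        """Called by compress()/flush() after nbytes of input were consumed."""
        if self.current_frame_size == CONTENTSIZE_UNKNOWN:
            self.current_frame_size = 0  # first write of a new frame
        self.current_frame_size += nbytes

        if mode != FLUSH_FRAME:
            return

        # The frame is ending: capture the totals, then reset for the next frame.
        pledged, written = self.pledged_size, self.current_frame_size
        self.pledged_size = CONTENTSIZE_UNKNOWN
        self.current_frame_size = CONTENTSIZE_UNKNOWN

        # Assumption: no pledge means there is nothing to verify.
        if pledged != CONTENTSIZE_UNKNOWN and written != pledged:
            raise PledgeError(f"pledged {pledged} bytes but frame received {written}")
```

Resetting both members before raising keeps the tracker usable for the next frame even after a failed check.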
I think the one drawback of the above is that although it will notify the user if something goes wrong, a user who is streaming compressed data elsewhere could still send garbage if they use the API incorrectly. But that's inherently not something we can really fix.
An open question I have is: should we check `current_frame_size <= pledged_size` at the end of writing when the mode isn't `FLUSH_FRAME`? I think probably yes?
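If the answer is yes, the two checks could be combined roughly as follows. This is a sketch under the same assumptions as the proposal: `check_pledge` is a hypothetical helper, and `ValueError` stands in for `ZstdError`.

```python
CONTENTSIZE_UNKNOWN = 2**64 - 1  # mirrors ZSTD_CONTENTSIZE_UNKNOWN


def check_pledge(current_frame_size, pledged_size, end_of_frame):
    """Raise if the bytes written so far cannot match the pledged size."""
    if pledged_size == CONTENTSIZE_UNKNOWN:
        return  # nothing was pledged, so there is nothing to check

    if end_of_frame:
        # FLUSH_FRAME: the totals must match exactly.
        if current_frame_size != pledged_size:
            raise ValueError(
                f"frame ended with {current_frame_size} bytes, "
                f"expected {pledged_size}"
            )
    elif current_frame_size > pledged_size:
        # Other modes: writing too little is still recoverable, too much is not.
        raise ValueError(
            f"wrote {current_frame_size} bytes, more than the pledged {pledged_size}"
        )
```

The early check reports an overrun at the write that causes it, rather than only when the frame is eventually flushed.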
cc @Rogdham, I'd be interested in your thoughts.
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
https://discuss.python.org/t/pep-784-adding-zstandard-to-the-standard-library/87377/143