Add lower-precision integer and floating point data types, and packbits codec #3
I intend to also register all of the other data types listed here:
https://pypi.org/project/ml-dtypes/
8-bit floating point representations, parameterized by number of exponent and mantissa bits, as well as the bias (if any) and representability of infinity, NaN, and signed zero:
float8_e3m4
float8_e4m3
float8_e4m3b11fnuz
float8_e4m3fn
float8_e4m3fnuz
float8_e5m2
float8_e5m2fnuz
float8_e8m0fnu
Microscaling (MX) sub-byte floating point representations:
float4_e2m1fn
float6_e2m3fn
float6_e3m2fn
Narrow integer encodings:
int2
int4
uint2
uint4
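To make the naming scheme above concrete, here is a rough sketch of how an `eXmY` name maps to a value: X exponent bits, Y mantissa bits, and one sign bit. The function `decode_minifloat` is hypothetical and not part of ml_dtypes. The bias values in the usage lines are the commonly cited ones for these formats and should be treated as assumptions. The sketch only handles finite values: NaN and infinity encodings differ per variant (`fn`, `fnuz`), and `float8_e8m0fnu` (no sign or mantissa bits) is not covered.

```python
def decode_minifloat(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode a sign/exponent/mantissa bit pattern into a Python float.

    Hypothetical helper for illustration only. Special values (NaN,
    infinity, the fnuz "negative zero is NaN" rule) are NOT handled.
    """
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    fraction = man / (1 << man_bits)
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of (1 - bias).
        return sign * fraction * 2.0 ** (1 - bias)
    # Normal: implicit leading 1.
    return sign * (1.0 + fraction) * 2.0 ** (exp - bias)


# Assumed biases: 7 for e4m3fn, 15 for e5m2, 1 for e2m1fn.
print(decode_minifloat(0x7E, exp_bits=4, man_bits=3, bias=7))     # 448.0
print(decode_minifloat(0x3C, exp_bits=5, man_bits=2, bias=15))    # 1.0
print(decode_minifloat(0b0111, exp_bits=2, man_bits=1, bias=1))   # 6.0
```

The same function covers the 4-bit and 6-bit MX formats because only the field widths and bias change.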
Potentially I'll add the others to this PR, but I wanted to make sure the bfloat16 README was in order first.
There are a few questions I have:
Currently, for the core data types, we specify the fill value representation under the `fill_value` section, separately from where the data types themselves are defined.
We have to specify how the data type is handled by each codec that supports it. In the case of bfloat16 it is only the `bytes` codec. The `bytes` codec description itself specifies how it handles all of the core data types, but presumably we would instead specify that as part of the extension data type specification.
In the case of bfloat16, the `bytes` encoding is so obvious that it hardly requires any explanation at all. For the other data types listed above that are less than 1 byte, however, we have to say that they will be padded to 1 byte with the high bits ignored. Additionally, I may want to register a `pack_bits` codec in the future that would support `bool` as well as these other data types that are less than 1 byte.
If I am the maintainer of all of the relevant extensions then there is no issue since I can modify them to reference each other as needed, but if I were not, it is less clear how we would deal with this.