Add lower-precision integer and floating point data types, and packbits codec #3

jbms · 2025-03-04T18:08:55Z

I intend to also register all of the other data types listed here:

8-bit floating point representations, parameterized by number of exponent and mantissa bits, as well as the bias (if any) and representability of infinity, NaN, and signed zero.

float8_e3m4
float8_e4m3
float8_e4m3b11fnuz
float8_e4m3fn
float8_e4m3fnuz
float8_e5m2
float8_e5m2fnuz
float8_e8m0fnu
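To make the parameterization concrete, here is a minimal decoding sketch, not taken from the PR. It assumes a sign | exponent | mantissa bit layout with IEEE-style subnormals, and it assumes the "fn" suffix means there are no infinities and NaN is encoded only when all exponent and mantissa bits are set. The function name `decode_float8` and its parameters are hypothetical. The "fnuz" variants and float8_e8m0fnu are not covered.

```python
import math


def decode_float8(byte, exp_bits, man_bits, bias, finite_only=False):
    """Decode one byte laid out as sign | exponent | mantissa (hypothetical helper)."""
    if 1 + exp_bits + man_bits != 8:
        raise ValueError("sign, exponent and mantissa must total 8 bits")
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if finite_only:
        # Assumed "fn" convention: no infinities, a single NaN bit pattern per sign.
        if exp == exp_max and man == man_max:
            return math.nan
    elif exp == exp_max:
        # IEEE-style: all-ones exponent encodes infinity or NaN.
        return sign * math.inf if man == 0 else math.nan

    fraction = man / (1 << man_bits)
    if exp == 0:
        # Subnormal: no implicit leading one.
        return sign * fraction * 2.0 ** (1 - bias)
    return sign * (1.0 + fraction) * 2.0 ** (exp - bias)
```

For example, `decode_float8(0x3C, 5, 2, 15)` treats the byte as float8_e5m2, and `decode_float8(0x7E, 4, 3, 7, finite_only=True)` treats it as float8_e4m3fn.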

Microscaling (MX) sub-byte floating point representations:

float4_e2m1fn
float6_e2m3fn
float6_e3m2fn

Narrow integer encodings:

int2
int4
uint2
uint4

Potentially I'll add the others to this PR, but I wanted to make sure the bfloat16 README was in order first.

There are a few questions I have:

- Should they all be specified as independent documents, or should some be combined into a single document somehow?
- Should a trivial schema just listing the data type name be provided?
- I have a link to the main spec, which unfortunately includes "v3.0" which presumably will become stale at some point.
- Data types interact with other things in the spec in a few ways:
  - Fill values
  - Codecs

Currently, for the core data types, we specify the fill value representation under the fill_value section, separately from where the data types themselves are defined.

We have to specify how the data type is handled by each codec that supports it. In the case of bfloat16 it is only the bytes codec. The bytes codec description itself specifies how it handles all of the core data types, but presumably we would instead specify that as part of the extension data type specification.
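As an illustration of what the bytes codec would do for bfloat16, here is a rough sketch. It assumes bfloat16 is the upper 16 bits of an IEEE float32 and that the codec's endianness setting chooses the byte order. The helper names are hypothetical, and the encoder truncates rather than rounding to nearest, which is a simplification.

```python
import struct


def encode_bfloat16(value, endian="little"):
    """Encode a Python float as 2 bfloat16 bytes by truncating a float32."""
    # Big-endian float32 puts the sign, exponent and top mantissa bits first.
    high = struct.pack(">f", value)[:2]
    return high if endian == "big" else high[::-1]


def decode_bfloat16(data, endian="little"):
    """Decode 2 bfloat16 bytes by zero-filling the low 16 bits of a float32."""
    high = bytes(data) if endian == "big" else bytes(data)[::-1]
    return struct.unpack(">f", high + b"\x00\x00")[0]
```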

In the case of bfloat16, the bytes encoding is so obvious that it hardly requires any explanation at all. For the other data types listed above that are less than 1 byte, however, we have to say that they will be padded to 1 byte with the high bits ignored. Additionally, I may want to register a pack_bits codec in the future that would support bool as well as these other data types that are less than 1 byte.
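A sketch of the padded representation for the 4-bit integer types, assuming the value occupies the low 4 bits in two's complement and that writers zero the padding bits (the zeroing is my assumption, not something stated above). The function names are hypothetical.

```python
def decode_uint4(byte):
    """Read a uint4 stored in one byte, ignoring the high 4 bits."""
    return byte & 0x0F


def decode_int4(byte):
    """Read an int4 stored in one byte, ignoring the high 4 bits."""
    low = byte & 0x0F
    # Sign-extend from bit 3 (two's complement).
    return low - 16 if low & 0x08 else low


def encode_int4(value):
    """Store an int4 in the low 4 bits of a byte, zeroing the padding bits."""
    if not -8 <= value <= 7:
        raise ValueError("int4 value out of range")
    return value & 0x0F
```

With this layout, `decode_int4(0xF7)` and `decode_int4(0x07)` give the same value because the high bits are ignored.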

If I am the maintainer of all of the relevant extensions then there is no issue since I can modify them to reference each other as needed, but if I were not, it is less clear how we would deal with this.

normanrz · 2025-03-04T20:00:20Z

> There are a few questions I have:
>
> Should they all be specified as independent documents, or should some be combined into a single document somehow?

The idea was to have one folder+readme per dtype. We have to see how well that scales over time.

> Should a trivial schema just listing the data type name be provided?

That would be awesome. Strictly speaking, I think an object notation would also be valid:

{"name":"bfloat16"}

even with an empty configuration

{"name":"bfloat16", "configuration": {}}
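A reader could accept all three spellings by normalizing them first. This is only a sketch: `parse_data_type` is a hypothetical helper, and treating a missing "configuration" as an empty object is an assumption.

```python
def parse_data_type(value):
    """Normalize data type metadata to a (name, configuration) pair."""
    if isinstance(value, str):
        # Short form: just the data type name.
        return value, {}
    if isinstance(value, dict) and isinstance(value.get("name"), str):
        # Object form: a missing configuration is treated as empty (assumption).
        return value["name"], dict(value.get("configuration", {}))
    raise ValueError(f"unrecognized data type metadata: {value!r}")
```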

> I have a link to the main spec, which unfortunately includes "v3.0" which presumably will become stale at some point.

@joshmoore What do you think about that?

> Data types interact with other things in the spec in a few ways:
>
> - Fill values
> - Codecs

Dtypes should define the acceptable values for their fill values.
The interaction with codecs needs a bit more spec work. We should probably expect the bytes codec to be extended to cover extension dtypes.

jbms · 2025-03-04T20:16:03Z

For now I can just specify in the data type specification how it interacts with the bytes codec, and then later update it when/if the pack_bits codec is added.

In principle we could have the situation where one person adds the int4 data type, and another person later adds the pack_bits codec but only mentions bool and not int4 --- and then a third person wants to make pack_bits work with int4.


jbms · 2025-03-05T01:27:12Z

I added all of the other data types, and also added the packbits codec.
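For readers unfamiliar with the idea, bit packing of sub-byte values can be sketched as follows. This is not the codec's specification: the bit order (first element in the least significant bits of the first byte) and the zero padding of the final byte are assumptions made for illustration, and the function names are hypothetical.

```python
def pack_bits(values, bits):
    """Pack unsigned `bits`-wide values into bytes, least significant bits first."""
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        acc |= v << (i * bits)
    nbytes = (len(values) * bits + 7) // 8  # the final byte is zero-padded
    return acc.to_bytes(nbytes, "little")


def unpack_bits(data, bits, count):
    """Inverse of pack_bits; `count` is needed because padding is ambiguous."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]
```

Under these assumptions, three uint4 values occupy 2 bytes instead of 3, and eight bool values occupy 1 byte instead of 8.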

jbms force-pushed the ml-dtypes branch from 896d87f to 18a8d53 Compare March 5, 2025 01:12

Add lower-precision integer and floating point data types, and packbits codec (commit b3ed3f1)

jbms force-pushed the ml-dtypes branch from 18a8d53 to b3ed3f1 Compare March 5, 2025 01:26

jbms changed the title ~~bfloat16 data type~~ Add lower-precision integer and floating point data types, and packbits codec Mar 5, 2025
