Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate Varints and / or Simple Packing / Compression For Leaf Components #10

Open
zicklag opened this issue Aug 21, 2024 · 3 comments
Labels
leaf Related to the Leaf Protocol specification

Comments

@zicklag
Copy link
Collaborator

zicklag commented Aug 21, 2024

The only reason so far I'm not sure that Borsh is perfect for Leaf is that it doesn't have any sort of compression for small integers.

If we want to keep the canonical property, then adding any sort of varint encoding or packing would require making it mandatory, or else a new data type. I lean towards making it mandatory if we're going to use it, as an extra data type, just for the sake of encoding/decoding/storage efficiency doesn't match the semantic nature of Leaf schemas very well.

Looking into varint encodings, while they do take less space, they can have a negative impact on performance:

Note that varint encoding saves space on the wire but is actually relatively slow, because it requires a lot of branches to encode/decode. It's not as slow as text encoding, of course, but as binary formats go it's not so great. This is one reason why Cap'n Proto decided to reverse this decision and just put fixed-width ints on the wire. Cap'n Proto also includes an optional "packing" algorithm which compresses away zero-valued bytes, which produces similar message sizes to Protobuf but is generally faster because the algorithm uses less branching.

https://stackoverflow.com/questions/24614553/why-is-varint-an-efficient-data-representation

I also found the Cap'nProto packing algorithm: https://capnproto.org/encoding.html#packing

That could be useful. Looks like it's designed to be extremely efficient to implement in a native language, but not sure if it'd slow things down in JS or something like that.

For now we are more concerned with a working prototype, but this is something we should think about and carefully weigh the pros and cons of before a Leaf 1.0.

One advantage of not packing is that Borsh is very simple to implement right now. Packing adds an extra complication, but it's still quite simple, so I'm not too worried about it.

It seems like the storage savings we could get from packing might be worth it, when we want to work well on resource constrained and low bandwidth devices. It might save a lot of storage/network costs over the long term.

@zicklag
Copy link
Collaborator Author

zicklag commented Oct 16, 2024

Polkadot's SCALE encoding is really similar to Borsh, but it also has compact integers. I think we need to do some serious thinking about whether Borsh or SCALE makes more sense for leaf.

Edit: Credit to @AireshBhat for pointing out SCALE to me.

@anderspitman
Copy link

Another option to check out would be QUIC's int packing. It's very simple.

@zicklag
Copy link
Collaborator Author

zicklag commented Oct 16, 2024

Another option to check out would be QUIC's int packing. It's very simple.

Interesting!

Looking at the QUIC int packing algorithm seems great for speed and storage efficiency, but only allows encoding up to 62 bits, which is a problem if you need a full 64 bits unfortunately.


A little research later...

Ah! The SCALE encoding for compact integers is almost identical to the encoding for packed QUIC integers, except that it allows you to expand beyond 62 bits for very large integers.
I'm pretty happy with that.

The only concerns with compact integers were performance and otherwise confusion with the schema system, but I think we'll be fine for both of those, especially if QUIC is doing something very similar for integer packing, and parity did design SCALE to be efficient for low-end devices, so we're probably fine.

It looks like there might be nicer TypeScript libraries with type inference for SCALE, too, which could be nice if we don't have to make our own.

@zicklag zicklag added the leaf Related to the Leaf Protocol specification label Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
leaf Related to the Leaf Protocol specification
Projects
None yet
Development

No branches or pull requests

2 participants