adds codecs that numcodecs defines #2
base: main
normanrz commented Feb 24, 2025 (edited)
- Blosc
- LZ4
- Zstd
- Zlib
- GZip
- BZ2
- LZMA
- Shuffle
- CRC32
- CRC32C
- Adler32
- Fletcher32
- JenkinsLookup3
- PCodec
- ZFPY
I validated the `schema.json` files against the numcodecs fixtures:

```python
# /// script
# dependencies = [ "jsonschema" ]
# ///
from jsonschema import validate
import json
from pathlib import Path

# Local checkout of the numcodecs fixture directory
numcodecs_fixture_path = (
    Path.home() / "numcodecs" / "fixture"
)

for path in Path("codecs").glob("numcodecs.*/schema.json"):
    # Directory names look like "numcodecs.<name>"
    _, name = path.parent.name.split(".")
    print(name)
    for fixture_path in (numcodecs_fixture_path / name).glob("**/config.json"):
        print(" ", fixture_path)
        config_json = json.loads(fixture_path.read_text())
        # Drop the "id" key and wrap the rest as name/configuration metadata
        config_json.pop("id", None)
        config_json = {"name": f"numcodecs.{name}", "configuration": config_json}
        validate(
            instance=config_json,
            schema=json.loads(path.read_bytes()),
        )
```
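The wrapping step in the script above can be shown in isolation. This is a minimal sketch using only the standard library; the helper name `to_v3_codec_metadata` and the sample zlib config are illustrative assumptions, not part of the PR or of numcodecs.

```python
import json


def to_v3_codec_metadata(name: str, v2_config: dict) -> dict:
    """Wrap a numcodecs-style config as {"name": ..., "configuration": ...}.

    Hypothetical helper: it mirrors the pop-and-wrap step of the
    validation script and is not an API of numcodecs or zarr-python.
    """
    config = dict(v2_config)  # copy so the caller's dict is not mutated
    config.pop("id", None)    # the codec id moves into the "name" field
    return {"name": f"numcodecs.{name}", "configuration": config}


# Illustrative config in the style of a numcodecs fixture (assumed values)
v2_config = {"id": "zlib", "level": 1}
print(json.dumps(to_v3_codec_metadata("zlib", v2_config)))
# → {"name": "numcodecs.zlib", "configuration": {"level": 1}}
```

The resulting dictionary is the instance that gets passed to `validate` for each fixture.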
Is there a reason to duplicate codecs that are already listed elsewhere in this repo, e.g. gzip, zstd, blosc? Also, many of these leave important details of the encoded format unspecified, meaning the actual specification is the numcodecs source code. I'm not sure if it is intended that names can be registered without a proper specification other than a reference to the source code. But even if it is allowed, surely it should be discouraged and these initial ones should include a proper specification.
Well, right now numcodecs uses the
I agree and would welcome contributions. Unfortunately, the numcodecs documentation is also pretty sparse on encoding details. So, for every codec we need to go through the code and write a spec.
I see. I did not realize that zarr-python had added all of the numcodecs codecs for zarr v3. I imagine it was done to make it very easy for someone using zarr-python to migrate to zarr v3, which is understandable. However, from an interoperability perspective this is kind of unfortunate: someone using zarr-python with zarr v3 and a