problem with get_item() #125

Closed
mdsumner opened this issue Jan 24, 2025 · 7 comments · Fixed by #126
Labels
bug Something isn't working

Comments

@mdsumner

mdsumner commented Jan 24, 2025

Describe the bug

I see this error with this remote store; the error is exactly the same with a local copy of the store.

remote <- "https://projects.pawsey.org.au/ideatest/oisst-tail-v2.zarr"
store <- HttpStore$new(remote)
g <- zarr_open_group(store)

## check the item is there
g$contains_item("sst")
# [1] TRUE

## but trying to get it fails
g$get_item("sst")

#Error in initialize(...) : 
#  unused arguments (check = -1, filters = NULL, preset = NULL)

Any ideas? I can't find those unused arguments in the source here.

I see the same issue on Linux and Windows. Thanks for your consideration! The code I used to generate the Zarr is below.

R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] crul_1.5.0    pizzarr_0.1.0

loaded via a namespace (and not attached):
 [1] R6_2.5.1            fastmap_1.2.0       httpcode_0.3.0
 [4] magrittr_2.0.3      glue_1.8.0          cachem_1.1.0
 [7] remotes_2.5.0.9000  urltools_1.7.3      stringr_1.5.1
[10] RApiSerialize_0.1.4 memoise_2.0.1       RcppParallel_5.1.9
[13] lifecycle_1.0.4     cli_3.6.3           vctrs_0.6.5
[16] compiler_4.4.2      stringfish_0.16.0   tools_4.4.2
[19] curl_6.1.0          Rcpp_1.0.14         jsonlite_1.8.9
[22] rlang_1.1.5         triebeard_0.4.1     stringi_1.8.4
[25] qs_0.27.2

R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8   
[3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.utf8    

time zone: Australia/Hobart
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pizzarr_0.1.0

loaded via a namespace (and not attached):
 [1] R6_2.5.1            fastmap_1.2.0       httpcode_0.3.0     
 [4] magrittr_2.0.3      crul_1.5.0          cachem_1.1.0       
 [7] glue_1.8.0          urltools_1.7.3      stringr_1.5.1      
[10] RApiSerialize_0.1.4 parallel_4.4.2      memoise_2.0.1      
[13] RcppParallel_5.1.9  lifecycle_1.0.4     cli_3.6.3          
[16] vctrs_0.6.5         compiler_4.4.2      stringfish_0.16.0  
[19] tools_4.4.2         curl_6.1.0          Rcpp_1.0.14        
[22] jsonlite_1.8.9      rlang_1.1.5         triebeard_0.4.1    
[25] stringi_1.8.4       qs_0.27.2  

Generate zarr from netcdfs "ENDPOINT=https://projects.pawsey.org.au"

f = ["s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250113_preliminary.nc", 
"s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250114_preliminary.nc", 
"s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250115_preliminary.nc", 
"s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250116_preliminary.nc", 
"s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250117_preliminary.nc", 
"s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250118_preliminary.nc", 
"s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250119_preliminary.nc", 
"s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250120_preliminary.nc", 
"s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250121_preliminary.nc", 
"s3://idea-10.7289-v5sq8xb5/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202501/oisst-avhrr-v02r01.20250122_preliminary.nc"
]

import numcodecs
import xarray

# Open the ten daily files as one dataset
ds = xarray.open_mfdataset(f, engine="h5netcdf")

# Use the same LZMA compressor for all four variables
lzma = numcodecs.LZMA()
e = {"sst": {"compressor": lzma}, "ice": {"compressor": lzma}, "anom": {"compressor": lzma}, "err": {"compressor": lzma}}
ds.to_zarr("oisst-tail-v2.zarr", consolidated=True, zarr_format=2, encoding=e)
mdsumner added the bug label on Jan 24, 2025
@keller-mark
Owner

keller-mark commented Jan 24, 2025

In this case, is https://projects.pawsey.org.au/ideatest/oisst-tail-v2.zarr/sst a Zarr group? Does the group metadata file https://projects.pawsey.org.au/ideatest/oisst-tail-v2.zarr/sst/.zgroup exist? Or is it just a directory?

This is potentially related to #72

@mdsumner
Author

There's no .zgroup.

Is this a to_zarr problem? It's a basic NetCDF with 4 arrays of the same shape and no groups (or does "group" mean something different in Zarr than in NetCDF?)

How should I generate a Zarr with Python?
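
For reference on the group question: in Zarr v2 a node is a group when a .zgroup key exists under its path and an array when a .zarray key exists, so a variable written by to_zarr (such as sst) would be expected to have .zarray and no .zgroup. Below is a minimal Python sketch of that check. The classify_node helper and the dict-based store are hypothetical stand-ins for illustration, not part of pizzarr or zarr-python.

```python
import json


def classify_node(store, path):
    """Classify a Zarr v2 node as 'array', 'group', or None.

    `store` is any mapping from key strings to bytes; a plain dict is used
    here as a hypothetical stand-in for an HTTP or directory store.
    """
    prefix = path.strip("/")
    prefix = prefix + "/" if prefix else ""
    if prefix + ".zarray" in store:
        return "array"
    if prefix + ".zgroup" in store:
        return "group"
    return None


# Illustrative store: a root group holding one array named "sst".
store = {
    ".zgroup": json.dumps({"zarr_format": 2}).encode(),
    "sst/.zarray": json.dumps({"zarr_format": 2}).encode(),
}

root_kind = classify_node(store, "")
sst_kind = classify_node(store, "sst")
missing_kind = classify_node(store, "nope")
print(root_kind, sst_kind, missing_kind)
```

Under this reading, the missing sst/.zgroup is expected rather than a sign that the store was written incorrectly.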

@dblodgett-usgs
Collaborator

Just looking at a stack trace, the error is coming out of get_codec here:

result <- do.call(LzmaCodec$new, config)

I haven't learned how that stuff works yet so am not much help debugging. But I don't think it has to do with metadata or groups.

Here's the stack trace:

Error in `initialize()`:
! unused arguments (check = -1, filters = NULL, preset = NULL)
    ▆
 1. └─g$get_item("sst")
 2.   └─ZarrArray$new(...) at pizzarr/R/zarr-group.R:239:9
 3.     └─pizzarr (local) initialize(...)
 4.       └─private$load_metadata() at pizzarr/R/zarr-array.R:825:7
 5.         └─private$load_metadata_nosync() at pizzarr/R/zarr-array.R:140:7
 6.           └─pizzarr:::get_codec(meta$compressor) at pizzarr/R/zarr-array.R:120:9
 7.             ├─base::do.call(LzmaCodec$new, config) at pizzarr/R/numcodecs.R:612:7
 8.             └─R6 (local) `<fn>`(check = -1L, filters = NULL, format = 1L, preset = NULL)

config contains:

Browse[1]> meta$compressor
$check
[1] -1

$filters
NULL

$format
[1] 1

$id
[1] "lzma"

$preset
NULL

But this:

initialize = function(level = 9, format = 1) {

expects only level and format, so check, filters, and preset are rejected as unused arguments.

Presumably the other entries in that list should just be dropped before the do.call?
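
The idea of dropping unrecognised entries can be sketched in Python (the language of the writer side in this thread) rather than R. This is only an illustration of the technique, not pizzarr's code or the fix that was merged: the LzmaCodec class below is a hypothetical stand-in with the same two options, and build_codec is an invented helper.

```python
import inspect


class LzmaCodec:
    """Hypothetical stand-in for a codec whose constructor takes two options."""

    def __init__(self, level=9, format=1):
        self.level = level
        self.format = format


def build_codec(codec_cls, config):
    """Construct codec_cls from config, keeping only the keys it accepts.

    Returns the codec and the sorted names of the entries that were dropped,
    so the caller can warn instead of failing on unused arguments.
    """
    accepted = set(inspect.signature(codec_cls.__init__).parameters) - {"self"}
    kwargs = {k: v for k, v in config.items() if k in accepted}
    # "id" only selects the codec class, so it is not reported as ignored.
    ignored = sorted(set(config) - accepted - {"id"})
    return codec_cls(**kwargs), ignored


# Compressor metadata as shown in the debugger output above.
config = {"check": -1, "filters": None, "format": 1, "id": "lzma", "preset": None}
codec, ignored = build_codec(LzmaCodec, config)
print(codec.level, codec.format, ignored)
```

Whether silently dropping options is safe depends on the codec, which is why the sketch returns the ignored names rather than discarding them without trace.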

@mdsumner
Author

Awesome, thanks. I will certainly be exploring this more deeply; I definitely need to absorb the Zarr spec too.

@dblodgett-usgs
Collaborator

We need to get zarr v3 implemented in pizzarr still.

My colleague Mike Johnson has been making good progress with pizzarr and rnz -- I think you'll find it's pretty ok. Still rough around the edges, but very usable.

@mdsumner
Author

Is there discussion of virtualization, the kerchunk/VirtualiZarr stuff? I don't know the spec well enough to know if that's a requirement, but it's very important.

I'm having convos about the Rust lib zarrs and various discussions where I mention this project and you (I will email you).

@keller-mark
Owner

Is there discussion of virtualization, the kerchunk/VirtualiZarr stuff?

Do you mean you need support for a ReferenceStore? That could be implemented either in pizzarr or on top of it. This JS implementation may be helpful: https://github.com/manzt/zarrita.js/blob/main/packages/storage/src/ref.ts#L42
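
As a rough illustration of what a reference store does, here is a minimal Python sketch. It assumes the kerchunk-style layout in which each key maps either to an inline string (optionally prefixed with "base64:") or to a list of [url] or [url, offset, length]. The ReferenceStore class, the fetch_range callback, and the file name are all hypothetical; this is not pizzarr's or zarrita.js's API.

```python
import base64


class ReferenceStore:
    """Read-only store that resolves keys through a table of references."""

    def __init__(self, refs, fetch_range):
        self.refs = refs                # key -> inline string or [url, offset, length]
        self.fetch_range = fetch_range  # callable(url, offset, length) -> bytes

    def __contains__(self, key):
        return key in self.refs

    def get(self, key):
        ref = self.refs[key]
        if isinstance(ref, str):
            # Inline content, stored directly in the reference table.
            if ref.startswith("base64:"):
                return base64.b64decode(ref[len("base64:"):])
            return ref.encode()
        if len(ref) == 1:
            # Whole-file reference.
            return self.fetch_range(ref[0], 0, None)
        url, offset, length = ref
        return self.fetch_range(url, offset, length)


# In-memory stand-in for byte-range reads against a remote NetCDF file.
files = {"file.nc": b"HEADchunkTAIL"}


def fake_fetch(url, offset, length):
    data = files[url]
    return data[offset:] if length is None else data[offset:offset + length]


refs = {
    ".zgroup": '{"zarr_format": 2}',
    "sst/0.0": ["file.nc", 4, 5],
}
ref_store = ReferenceStore(refs, fake_fetch)
meta = ref_store.get(".zgroup")
chunk = ref_store.get("sst/0.0")
print(meta, chunk)
```

In a real implementation fetch_range would issue an HTTP range request; keeping it as a callback lets such a store sit on top of an existing HTTP client rather than inside the core library.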
