Mesh specification #33
This issue has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-feb-23/48386/9
I'd argue for GeoJSON for ROIs and points and such, and keep meshes in their niche.
@glyg, so this block from ply-zarr is the critical bit for discussion?
Yes, this mirrors the specification for the PLY header; it then seems natural to store the faces in separate arrays according to their number of sides.
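For concreteness, here is a minimal sketch of what such a layout could look like with zarr-python (the array names, chunk sizes, and attribute layout are my own illustration, not part of the ply-zarr draft):

```python
# Hypothetical layout: one vertex array, plus one face array per number of sides,
# with PLY-style element/property declarations kept in group attributes.
import numpy as np
import zarr  # assumes the zarr-python 2.x API (open_group / create_dataset)

vertices = np.array(
    [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]], dtype="float32"
)
triangles = np.array([[0, 1, 4], [1, 2, 4]], dtype="uint32")  # 3-sided faces
quads = np.array([[0, 1, 2, 3]], dtype="uint32")              # 4-sided faces

root = zarr.open_group("mesh.zarr", mode="w")
vertex = root.create_group("vertex")
face = root.create_group("face")
vertex.create_dataset("position", data=vertices, chunks=(1024, 3))
face.create_dataset("triangles", data=triangles, chunks=(1024, 3))
face.create_dataset("quads", data=quads, chunks=(1024, 4))

# PLY-header-like metadata recorded as JSON attributes on the group.
root.attrs["elements"] = {
    "vertex": {"count": len(vertices), "properties": ["x", "y", "z"]},
    "face": {"count": len(triangles) + len(quads), "properties": ["vertex_indices"]},
}
```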
See a more concrete example of mixing meshes, images and labels here. I assume the xarray compatibility also applies here; I'll look into that next.
This issue has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/ngff-status-update-may-2021/52918/1
cc @normanrz
We recently implemented the mesh format from Neuroglancer in webKnossos: https://github.com/google/neuroglancer/blob/master/src/neuroglancer/datasource/precomputed/meshes.md It's been great for our purposes:
I think that format would be a great candidate to be adopted by OME-NGFF.
@normanrz thanks for the input, those features indeed sound great (esp. multi-res!). If I understand correctly, though, only triangular meshes are supported? The other consumer/producer of meshes is the modeling community (i.e. physical biology), who would need more generic meshes, for example with polygonal (>3 sides) 2D cells, polyhedral 3D cells, or even quadratic tetrahedra. Would Draco be able to handle that kind of data? Also, maybe storing generic FEM meshes is out of scope for ome-ngff and triangles are enough.
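To make the "generic mesh" case concrete, here is a small illustration (my own sketch, using meshio; the file name and data are made up) of the mixed cell types a modeling workflow might need, including quadratic tetrahedra:

```python
import numpy as np
import meshio  # commonly used in the FEM / modeling community

points = np.random.rand(12, 3)
cells = [
    ("triangle", np.array([[0, 1, 2], [1, 2, 3]])),           # 3-sided surface cells
    ("quad", np.array([[4, 5, 6, 7]])),                       # 4-sided (polygonal) 2D cells
    ("tetra10", np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])),  # 10-node quadratic tetrahedron
]
mesh = meshio.Mesh(points, cells)
meshio.write("generic_mesh.vtu", mesh)  # VTK unstructured grid keeps all cell types
```

A triangles-only encoding such as Draco cannot represent the quad or tetra10 cells without first tessellating them.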
Yes, I think draco only supports triangular meshes (and point clouds). We could look into allowing other encodings in addition to draco.
That is a good question that we haven't fully figured out yet. We currently store all the data in a single binary file. The file consists of (a) a directory structure (a hash map) to locate the meshfile for a specified id within (b) a long blob of mesh data. In (b), each meshfile has a binary metadata header that describes the available chunks and levels of detail.
I would like to get involved in the discussion. I think it would be great to have a format similar to the Neuroglancer format in OME. 3D data generation is getting more and more popular in the spatial biology field, and segmentations are a big part of it. Besides storing the volumetric (point cloud) data in OME-Zarr, it would be really great to have the same possibility for meshes. I am wondering if there would be the possibility to discuss the format and specifications in a meeting or similar?
Consider yourself involved! 🙂
Modulo https://xkcd.com/927/ of course. This is certainly something that I've heard several times recently as well, but it will certainly take one or more champions for it to happen. Also cc @jbms for how he weighs the changes as well as the pros & cons.
Most of the recent meetings have been around the challenge, which is pushing forward Zarr v3 support (i.e., RFC-2). It's certainly time for a next general community meeting, or alternatively, a smaller group could start socializing the idea in preparation for an RFC.
I'm still here watching this thread, and would be happy to help get a small group discussing what the best options are for this!
I see! I think there are similarities and differences between storing volumetric (point cloud) data and meshes. One main similarity, as introduced by the standard Neuroglancer uses, is multi-resolution support for meshes! This is really crucial for the vast number of meshes we are going to store and load again. I think the main difference is that meshes don't adhere to such a nice grid structure as the point clouds do. So I am wondering how we can store them at their multiple resolutions but still know where they are located in XYZ, so we can load them efficiently when needed. There might be more metadata needed, such as the bounding box, centroid, or other measures, to know whether a mesh is visible at a certain location, so the client can decide whether it should be loaded. I would really like to see first mesh support (maybe based on the Neuroglancer format supporting Draco) in Zarr soon.
What would meshes look like in the Zarr data model? Zarr v3 doesn't have support yet for variable-length types, so at a minimum we would need to add that, and even then I'm not sure how meshes, expressed as variable-length collections of geometrical objects, would be stored in an N-dimensional array. What would the array indices mean? I suspect people would fall back to 1D arrays, with maybe a second array for storing a spatial index? It could work, but it's not a great fit for Zarr IMO. On the other hand, the neuroglancer multiresolution mesh format seems perfectly fine on its own, outside of Zarr. So maybe just refining or generalizing that format as needed would be simpler than forcing it into Zarr.
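Just to illustrate the fallback described above (my own sketch, not a proposal): a flat vertex buffer with a CSR-style offsets array, and a per-mesh bounding-box array standing in for the spatial index.

```python
import numpy as np
import zarr  # zarr-python 2.x style API assumed

# Two meshes' vertices concatenated into one flat (N, 3) array.
flat_vertices = np.random.rand(100, 3).astype("float32")
# CSR-style offsets: mesh i owns rows offsets[i]:offsets[i+1].
offsets = np.array([0, 40, 100], dtype="uint64")
# Crude spatial index: one axis-aligned bounding box (min xyz, max xyz) per mesh.
bboxes = np.random.rand(2, 6).astype("float32")

root = zarr.open_group("meshes_as_arrays.zarr", mode="w")
root.create_dataset("vertices", data=flat_vertices, chunks=(10_000, 3))
root.create_dataset("offsets", data=offsets)
root.create_dataset("bbox", data=bboxes)

# Reading mesh 1 takes two offset lookups plus one slice of the flat array.
lo, hi = int(root["offsets"][1]), int(root["offsets"][2])
mesh1_vertices = root["vertices"][lo:hi]
```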
I agree that the mesh format doesn't need to live in Zarr arrays. We could (mis)use uint8 arrays to store the bytes, but I don't know what value that would bring in comparison to just storing the blob alongside the Zarr arrays in the hierarchy.
So the idea would be to adopt the Neuroglancer format (https://github.com/google/neuroglancer/blob/master/src/datasource/precomputed/meshes.md#multi-resolution-mesh-format) and integrate it into OME-Zarr?
I think that would be a good way forward. There are a few details in terms of metadata and file layout that need to be figured out. It would be great to hear @jbms's feedback on this.
A quick heads up that I heard from Jeremy today on a separate matter: he's been on leave. I very much assume when he's caught back up he'll chime in.
I just want to get this discussion running again. What would be potential next steps?
I think a meeting to sketch out an RFC would be a good next step. There should be an accompanying post on image.sc to announce that meeting.
@normanrz I'm not sure how crystallized the schedule is for the upcoming OME-NGFF workflows hackathon, but maybe carving out some (EST-timezone-friendly) slots would be convenient?
That sounds like a good plan to discuss in that timeframe!
Sorry, was on paternity leave until today. As others have also stated, while meshes can be potentially thought of as collections of arrays of vertex properties and face properties, I think trying to represent them as zarr arrays directly would add a lot of complexity and not provide significant advantages, given how meshes are actually used in practice. There is certainly a lot of room for improvement in the Neuroglancer precomputed multiscale mesh format (and the related annotation format) but I think if the existing format serves a decent number of use cases then it may be wise to standardize it as-is initially, and then once there is greater usage experience work on a revised format. |
No worries! Yes! I think this sounds like a really good plan! I think there is also a great need for more standardized creation and retrieving pipelines for the format. So I like your suggestion of first taking it up as-is and gradually improving it over time. |
This issue has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/late-announcement-technical-ngff-call-on-meshes/106304/1
Made use of the
I'd like to second @d-v-b @jbms @normanrz from previous comments about not using zarr for mesh representation and supporting e.g. Draco as "the standard". What is not immediately clear to me when looking at e.g. DracoPy, though, is the degree to which it supports storing arbitrary numeric attributes on the vertices and faces in an efficient way; this issue may suggest perhaps not. Having a separate file for attributes, e.g. based on Parquet, to efficiently store/retrieve attribute values matched to indices may be useful. But it'd need to play nicely e.g. with the shaders in Neuroglancer to visualize the values in 3D. And is a form of sharding supported for storing multiple multi-resolution meshes? I'll try to join the call tomorrow.
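A hedged sketch of the separate-attribute-file idea (the column and file names are made up; this is only meant to show that Parquet handles the "values matched to indices" part cheaply):

```python
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

n_vertices = 100_000
table = pa.table({
    "vertex_index": np.arange(n_vertices, dtype="uint32"),
    "membrane_potential": np.random.rand(n_vertices).astype("float32"),
    "contact_area": np.random.rand(n_vertices).astype("float32"),
})
pq.write_table(table, "vertex_attributes.parquet")

# A viewer can later pull just the column it wants to colour the mesh by.
subset = pq.read_table(
    "vertex_attributes.parquet", columns=["vertex_index", "membrane_potential"]
)
```

Feeding such values into Neuroglancer's shaders would still need a conversion step, since Neuroglancer does not read Parquet directly.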
Draco does support storing arbitrary additional attributes though neuroglancer does not currently support that. To be clear, draco is just an encoding of an individual mesh fragment to be decoded in its entirety, think jpeg. The neuroglancer precomputed multiscale mesh format adds chunking and multi-resolution on top. |
Yes, I think the Neuroglancer precomputed multi-resolution sharded format is the way to go. It would benefit from better tool support and better documentation. We could think about using the IDs that index the meshes to link additional information to the meshes. I think Neuroglancer has some sort of additional attribute display for segmentations (google/neuroglancer#430).
The additional attributes there are on an object level, i.e. for segmentation objects and their associated meshes. What I meant are attributes per mesh vertex/face. Good to know that Draco can support that. I'm sure there are many use cases one can think of, and having tools to visualize them is crucial. Vedo is a nice Python library for "small"-scale use cases. It may already exist, but it would be useful to have an easy way (e.g. a simple Python function) to go from a set of mesh files to the Neuroglancer precomputed multi-resolution sharded format, perhaps with options for downscaling/simplification levels and specification of some affine transformation. I know igneous can write it, but it's specifically designed for large-scale data use cases. It may also be useful to discuss, in the context of OME-Zarr, how transformations of meshes (i.e. the transformation spec) could be used with or without associated image arrays.
For note taking during the calls: https://hackmd.io/@kisharrington/SJBT58eHke
@jbms TL;DR for you from the meeting notes: One thing that might inform your response: at the end of the meeting notes, @bogovicj pointed out some specific technical questions that would need to be addressed (where/how transforms are stored relative to the current NGFF approach, and axis names/order).
Unfortunately I cannot join the call today due to sickness. Will follow the outcomes via the notes and summaries. Thank you for the discussion and effort to get meshes supported 🙏
We had 2 great meetings. Notes here. Some notes:
Some considerations:
Some goals of the first effort:
Okey doke, I tossed together a first draft of the RFC: https://github.com/kephale/ngff/blob/mesh/rfc/8/index.md I want to start playing around with an implementation before getting too carried away with the text. [Edited: update link for renumbering RFC] |
Thanks for putting that together. I have a few comments:
Aha, good point about the naming. I swapped the json and node_type name to zarr.json, and @jbms, the directory structure is written naively, tbh. I'm happy to adjust back, but I was going to update after testing the implementation. Incidentally, I tried zmesh outputs to NG precomputed, but it doesn't seem to match the spec (afaict it just outputs the fragments): https://github.com/kephale/atrium.kyleharrington.com/blob/main/meshes/zmesh-ng-precomputed/0.0.1.py I'll play with other writers soon.
Yes, zmesh only seems to support the older single-resolution neuroglancer precomputed mesh format and indeed seems to also skip writing the manifests.
Thanks @kephale for writing this up!
In which case, we might switch the metadata file back to
@normanrz, we'll also be able to piggyback on the coordinate transforms support from Collections as well, right?
I suppose there are a lot of options for the metadata. In general people may wish to store non-zarr data within a zarr hierarchy so a general solution to that may be useful. Several possible solutions come to mind:
Alternatively, now that implicit groups are disallowed, we could say that any directory that lacks a zarr.json is assumed to be unknown "external" data. However, if creating a new group gets interrupted it would appear as an "external" node which may be problematic. |
*nods* "folder" / "other" etc. might all specify that this is a directory in the hierarchy that shouldn't be parsed but that still has metadata in it, but (sidenote) that discussion we should really have with the wider Zarr community.
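Purely as a strawman for that wider Zarr discussion (none of these names are in any accepted spec), such a marker could look something like this:

```python
import json
import pathlib

mesh_dir = pathlib.Path("my.zarr/labels/cells/meshes")
mesh_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical zarr.json marking an opaque/"external" node. Note that node_type
# values other than "group" and "array" are NOT currently allowed by Zarr v3.
(mesh_dir / "zarr.json").write_text(json.dumps({
    "zarr_format": 3,
    "node_type": "external",  # hypothetical value
    "attributes": {
        "format": "neuroglancer_precomputed_mesh"  # hypothetical attribute
    },
}, indent=2))
```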
I was wondering if you had thoughts on using sqlite as a container for the mesh manifest and fragment files instead of the precomputed sharded format. There are some examples where data was read from a remotely stored sqlite database 1 2 3. An alternative might be DuckDB.
I agree that the non-standard neuroglancer precomputed sharded format is not ideal. However, for scalability what is needed here is a "cloud-native" key-value store with the following properties:
These were some of the main design criteria for OCDBT, which I think would be a good candidate for use instead of the precomputed sharded format. There are a few problems with sqlite:
I don't have much of any experience with parquet (which I believe is what duckdb uses). Based on my limited understanding, I think in principle parquet could work as a "cloud-native" key-value store, but as I don't think that is the normal/intended use case, I am not sure exactly how practical it would be, e.g. how readily existing tools could be used to write the format correctly to enable efficient random access to individual keys, and how many sequential read requests would be needed to read an individual key.

For comparison, with the current mesh format, we have (independent of the number of objects stored): 1 read request for json metadata specifying sharding parameters, but this also provides necessary mesh format metadata. Thus a total of 5 sequential requests are required to download a mesh.

With OCDBT, we have: 1 read request for manifest.ocdbt. Probably in most cases there would be at most 1 additional b+tree node required after the root b+tree node.
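For anyone wanting to experiment, here is roughly how writing and reading mesh blobs through OCDBT could look with TensorStore's Python bindings (a sketch under the assumption that the KvStore API behaves as documented; the key names and local path are made up, and details may differ between TensorStore versions):

```python
import tensorstore as ts

# Open (or create) an OCDBT key-value store backed by a local directory;
# the same spec shape works with remote base kvstores (e.g. gcs:// or s3://).
kv = ts.KvStore.open({
    "driver": "ocdbt",
    "base": "file:///tmp/mesh_store.ocdbt/",
}).result()

# Store an encoded mesh fragment (e.g. draco bytes) under an object-id key.
kv.write("meshes/12345", b"<draco-encoded fragment bytes>").result()

# Random access later needs the manifest plus a small number of b+tree reads.
read_result = kv.read("meshes/12345").result()
fragment_bytes = read_result.value
```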
For added context: https://icechunk.io/faq/ |
@unidesigner I'm guessing you referenced that FAQ for the general comments about various storage formats, not specifically to suggest icechunk as a possible container format. That said, mesh data is something that someone might potentially wish to store alongside array data inside an icechunk repository. However, because icechunk currently only supports data that "looks like" zarr v3, we would have to munge the keys and create fake zarr array metadata in order to actually store it in icechunk. Additionally, because icechunk has just a single-level index of chunk keys (similar to OCDBT with a height of 0), it would not work well if there are a large number of objects. It seems plausible that the icechunk team may be interested in supporting mesh storage, though, in which case they might extend the format to more readily support storing non-zarr data.
@jbms Yes, it was meant as a general overview of format options, and it includes TensorStore with OCDBT at the end. I did not want to suggest using icechunk for mesh storage. The association of mesh data with array data, as well as with other derived data types (skeletons, object features, etc.), is a very common use case and important to think about. Associating objects across datasets of different types, based on an object ID scoped e.g. to a yet-to-be-defined collection node in the metadata schema, would be convenient (i.e. not necessarily required to be a subfolder within a zarr folder hierarchy). Can you explain again what you mean by icechunk with only a single-level index not working well for a large number of objects? Do you mean that as the number of objects or keys increases, the lookups will take progressively longer without an additional lookup tree, which would be solved by using OCDBT? As an additional remark: if the plan is to come up with a better, scalable mesh format, it would be good to also consider efficient storage of additional features on vertices/faces, for static meshes with static and time-dependent features, and perhaps also for time-dependent meshes (where the number of vertices and the topology may change) and associated features. A use case I once had was coloring synaptic contact areas on neuronal meshes. It would be nice if this could be supported one day in Neuroglancer at scale. And of course the evolution of electric membrane potential across a neuronal mesh in simulations. :)
Regarding icechunk and a single level index not working well for a large number of objects, icechunk currently stores a list of ManifestRef for each array, where a ManifestRef specifies the location of a chunk manifest and the lexicographical min/max chunk indices for the chunks it contains. Because of the min/max bounds this could be seen as a two-level index, because you could evenly partition n keys into sqrt(n) separate manifests by chunk indices order, i.e. similar to ocdbt with height 1. However, icechunk currently does not appear to use the min/max information and instead always fetches all of the manifests when reading. That means if you have a large number of chunks you have to read a large amount of metadata before you can access the data. |
Sorry about the delay, I wanted to share out the RFC update after getting more done on the prototype but probably should have just posted here first. Anywho, I updated the RFC to (hopefully? @normanrz) align with Collections: https://github.com/kephale/ngff/blob/mesh/rfc/8/index.md |
I had this tab open for a few weeks and now have some time to respond. Thanks for putting this together and for incorporating our collection ideas. Just as a caveat, the collections proposal is still super early, so it might change a lot over the next couple of months. I'll keep you updated. |
Awesome, thank you @normanrz. I didn't think this was final and the sharded bit can be cleared up. The more I looked into Collections the more it made sense to build on, but is there something I can do to help nudge the Collections RFC along? Since it is so early as you say, it is hard for me to wrap my head around. My goal here was to "rebase" the mesh RFC onto Collections, get some ideas, but then switch back to poking at an actual proof of concept implementation (presumably regardless of the specific RFC most of the code will be reusable). I have something that is close to working that I'll share out the next chance I have to clean it up! |
Thanks! The work on collections is still happening in the hackmd. Right now, my focus is on getting extensions in Zarr to work. Once that is on its way, I will shift my attention to collections again. |
I think that it would make sense to specify meshes independently of collections, and have their own standalone metadata in some form, just as we can specify arrays and multiscale arrays independent of collections, but then collections can refer to meshes just as they can refer to other things. Having the meshes stored somewhere on disk without any metadata would be unfortunate, I think. However, if you are reluctant to define a standalone metadata format for meshes, we could just use the existing precomputed mesh format exactly, including its |
As discussed in the Feb. 2021 NGFF community call, and following this image.sc thread:
The idea is to follow the PLY specification to store meshes in OME-Zarr. A PLY file is organised into a header, which declares each element type (e.g. vertex, face) together with its properties, followed by the data for each element.
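For reference, a typical PLY header looks like the following (the counts and properties are illustrative); it is this element/property structure that the draft below mirrors in zarr metadata and arrays:

```
ply
format binary_little_endian 1.0
comment example header only
element vertex 8
property float x
property float y
property float z
element face 12
property list uchar uint vertex_indices
end_header
```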
There is a draft implementation here: https://github.com/centuri-engineering/ply-zarr
Some questions: