Validate citation metadata #525

Open · stellaprins opened this issue Mar 11, 2025 · 1 comment

Labels: bug (Something isn't working)

stellaprins commented Mar 11, 2025

The citation metadata does not have the right format for the nadkarni_mri_mouselemur_91um atlas.

    from brainglobe_atlasapi.bg_atlas import BrainGlobeAtlas

    atlas = BrainGlobeAtlas("nadkarni_mri_mouselemur_91um")
    print(atlas)  # raises a ValueError for this atlas

This raises a ValueError because the citation metadata of this particular atlas does not have the expected format (a single string where "str, str" is expected).

The citation metadata for this atlas is "https://doi.org/10.1016/j.dib.2018.10.067", whereas, as I understood from @alessandrofelder, it should follow a strict format, something like "Nadkarni et al., 2019", "https://doi.org/10.1016/j.dib.2018.10.067".

@alessandrofelder and I discussed taking the following steps:

  • add metadata format validation to brainglobe_atlasapi\atlas_generation\validate_atlases.py
  • identify atlases with missing metadata
  • go through the atlas generation scripts in brainglobe_atlasapi\atlas_generation\atlas_scripts and add the missing metadata
  • regenerate the atlases with the correct metadata
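For the second step, a minimal sketch, assuming the metadata of each atlas has already been collected into a plain dict keyed by atlas name (for example from each atlas's `atlas.metadata`). The helper `find_missing_metadata` and the `REQUIRED_KEYS` list are hypothetical, not part of brainglobe_atlasapi; the keys simply mirror the current `METADATA_TEMPLATE`:

```python
# Hypothetical: keys copied from the current METADATA_TEMPLATE.
REQUIRED_KEYS = [
    "name", "citation", "atlas_link", "species", "symmetric",
    "resolution", "orientation", "shape", "version", "additional_references",
]


def find_missing_metadata(all_metadata):
    """Map atlas name -> required keys that are absent, None, or ""."""
    problems = {}
    for atlas_name, metadata in all_metadata.items():
        missing = [
            key
            for key in REQUIRED_KEYS
            # False (symmetric) and [] (additional_references) count as set.
            if metadata.get(key) is None or metadata.get(key) == ""
        ]
        if missing:
            problems[atlas_name] = missing
    return problems


example = {
    "complete_atlas": {key: "x" for key in REQUIRED_KEYS},
    "incomplete_atlas": {"name": "incomplete_atlas", "citation": ""},
}
print(find_missing_metadata(example))
```

Only `incomplete_atlas` is reported; its empty citation is flagged along with every key it lacks.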
stellaprins added the bug (Something isn't working) label Mar 11, 2025
stellaprins self-assigned this Mar 11, 2025
stellaprins changed the title from "Citation metadata" to "Validate citation metadata" Mar 12, 2025

stellaprins commented Mar 12, 2025

Problem with metadata['citation']

According to `METADATA_TEMPLATE`, the correct way to define the citation metadata is currently a single string "{citation}, {doi}" (e.g. "Brainglobe et al. 2025, https://doi.org/BrainGlobe/999").

`_rich_atlas_metadata` does not process the citation metadata correctly when there is no comma separating {citation} and {doi}, and/or when there are commas in places other than between {citation} and {doi}.

As a result, the citation metadata is processed incorrectly for nadkarni_mri_mouselemur_91um, where it is just {doi}, and for allen_mouse_bluebrain_barrels_10um, where {citation} itself contains several commas.
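To illustrate both failure modes, a minimal sketch of a naive parser that splits on ", " and expects exactly two parts. This is an assumption about the kind of logic involved, not the actual code of `_rich_atlas_metadata`; `split_citation_naive` and the multi-comma citation string are hypothetical:

```python
def split_citation_naive(citation):
    """Hypothetical parser that expects exactly "{citation}, {doi}"."""
    # Tuple unpacking raises ValueError unless there is exactly one ", ".
    reference, doi = citation.split(", ")
    return reference, doi


examples = {
    "well formed": "Someone et al 2020, https://doi.org/somedoi",
    "doi only": "https://doi.org/10.1016/j.dib.2018.10.067",
    # Hypothetical citation with commas inside the reference part.
    "extra commas": "Someone, Another et al., 2020, https://doi.org/somedoi",
}

for label, value in examples.items():
    try:
        print(label, "->", split_citation_naive(value))
    except ValueError as err:
        print(label, "-> ValueError:", err)
```

Only the well-formed string parses; the DOI-only string has too few parts and the multi-comma string has too many.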

Metadata template

I want to use `METADATA_TEMPLATE` from brainglobe_atlasapi\descriptors.py to validate the metadata value types.

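    from brainglobe_atlasapi.descriptors import METADATA_TEMPLATE

    # Each template key must exist in atlas.metadata, and its value must have
    # the same type as the template value (atlas as created above).
    for key, value in METADATA_TEMPLATE.items():
        assert key in atlas.metadata, f"Missing key: {key}"
        assert isinstance(atlas.metadata[key], type(value)), (
            f"Key '{key}' should be of type {type(value).__name__}, "
            f"but got {type(atlas.metadata[key]).__name__}."
        )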

However, there are several issues with `METADATA_TEMPLATE`: the `name` value contains characters that are not allowed (see #524), and the `resolution` and `shape` values are tuples, whereas they are lists in the atlases I've checked.

    METADATA_TEMPLATE = {
        "name": "name/author/institute_species_[optionalspecs]",
        "citation": "Someone et al 2020, https://doi.org/somedoi",
        "atlas_link": "http://www.example.com",
        "species": "Gen species",
        "symmetric": False,
        "resolution": (1.0, 1.0, 1.0),
        "orientation": "asr",
        "shape": (100, 50, 100),
        "version": "0.0",
        "additional_references": [],
    }
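The tuple/list mismatch is what you would expect if the metadata is written to and read back from a JSON file (an assumption here): JSON has arrays but no tuples, so Python's `json` module always decodes them as lists. A minimal sketch with a reduced copy of the template shows that the type check above would then flag `resolution` and `shape` for every atlas:

```python
import json

# Reduced copy of the current template, for illustration only.
template = {"resolution": (1.0, 1.0, 1.0), "shape": (100, 50, 100), "version": "0.0"}

# Round-trip through JSON, as a metadata file saved to disk would be.
loaded = json.loads(json.dumps(template))

for key, value in template.items():
    ok = isinstance(loaded[key], type(value))
    print(f"{key}: template {type(value).__name__}, "
          f"loaded {type(loaded[key]).__name__}, ok={ok}")
```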

I suggest changing it to something like this:

    METADATA_TEMPLATE = {
        "name": "source_species_additional-info",
        "citation": {"Someone et al., 2020": "https://doi.org/somedoi"},
        "atlas_link": "http://www.template_brain_atlas_link.com",
        "species": "animal",
        "symmetric": False,
        "resolution": [1.0, 1.0, 1.0],
        "orientation": "asr",
        "shape": [100, 50, 100],
        "version": "0.0",
        # withdrawn: "additional_references": {"ref1": "doi1", "ref2": "doi2"},
        "additional_references": [],
    }

Here citations are dictionaries of {"ref": "doi_url"} pairs, the name value matches the stricter format, and the tuples are replaced by lists to match what I've seen in the actual atlases. Any comments or suggestions are welcome (@alessandrofelder and @adamltyson).
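With the proposed dictionary format, validating the citation reduces to a loop over the {"ref": "doi_url"} pairs. A minimal sketch, assuming every DOI URL must start with "https://doi" (the current convention); `check_citation_dict` is a hypothetical name:

```python
DOI_PREFIX = "https://doi"  # assumption: the currently accepted convention


def check_citation_dict(citation):
    """Return a list of problems with a {"reference": "doi_url"} citation."""
    if not isinstance(citation, dict):
        return [f"citation should be a dict, got {type(citation).__name__}"]
    if not citation:
        return ["citation dict is empty"]
    problems = []
    for reference, doi_url in citation.items():
        if not isinstance(reference, str) or not reference.strip():
            problems.append(f"invalid reference: {reference!r}")
        if not isinstance(doi_url, str) or not doi_url.startswith(DOI_PREFIX):
            problems.append(f"invalid DOI URL for {reference!r}: {doi_url!r}")
    return problems


print(check_citation_dict({"Someone et al., 2020": "https://doi.org/somedoi"}))
# []
print(check_citation_dict("https://doi.org/10.1016/j.dib.2018.10.067"))
# ['citation should be a dict, got str']
```

Returning a list of problems instead of asserting lets the validation script report every faulty atlas in one run.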

Note: I may not fully understand `additional_references`; its entries look more like keywords than references.

Another, possibly simpler, option is to keep the citation as a single string and assume that the DOI always starts with "https://doi" (the currently accepted standard).
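A minimal sketch of that option, assuming the DOI is the last part of the string and starts with "https://doi"; everything before it, minus a trailing comma and whitespace, is treated as the reference text. `split_citation_at_doi` is a hypothetical name:

```python
DOI_PREFIX = "https://doi"


def split_citation_at_doi(citation):
    """Split "{citation}, {doi}" at the DOI instead of at a comma.

    Returns (reference, doi); reference is "" when the string is only a DOI.
    Raises ValueError if no DOI is found.
    """
    start = citation.rfind(DOI_PREFIX)
    if start == -1:
        raise ValueError(f"No DOI starting with {DOI_PREFIX!r} in {citation!r}")
    # Drop the separating ", " (and any stray whitespace) before the DOI.
    reference = citation[:start].rstrip().rstrip(",").rstrip()
    doi = citation[start:].strip()
    return reference, doi


print(split_citation_at_doi("https://doi.org/10.1016/j.dib.2018.10.067"))
print(split_citation_at_doi("Someone, Another et al., 2020, https://doi.org/somedoi"))
```

Splitting at the DOI makes commas inside the reference harmless and handles a DOI-only citation; the trade-off is that a citation whose link does not start with "https://doi" is rejected.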
