
AUTO: update migrate.py schema_version 5.2.3->5.3.0 #1292

Open

github-actions wants to merge 11 commits into main from auto/update-convert-py-to-6.0.0

+203 −161

Contributor

github-actions bot commented Mar 11, 2025 •

edited by ejmolinelli


This is an automated PR to update migrate.py from schema_version 5.2.3->5.3.0

github-actions and others added 8 commits

March 11, 2025 12:44


          Bump version: 5.2.3 → 5.3.0-rc.0

09437b8


          Bump version: 5.3.0-rc.0 → 5.3.0

a4ede18


          Bump version: 5.3.0 → 6.0.0-rc.0

3f18d76


          Bump version: 6.0.0-rc.0 → 6.0.0

77fc7d0


          AUTO: update migrate.py schema_version 5.2.3->6.0.0

e06eedb


          Update .bumpversion.cfg

a60ed32


          Update __init__.py

aed9c7d


          Update setup.py

ef31980

ejmolinelli requested a review from joyceyan

March 11, 2025 14:24

ejmolinelli changed the title ~~AUTO: update migrate.py schema_version 5.2.3->6.0.0~~ AUTO: update migrate.py schema_version 5.2.3->5.3.0

joyceyan approved these changes


Jchaffer787 added 3 commits

March 12, 2025 10:35


          Update migrate.py

dbe1dab

Adding public- and private-specific dataset updates, as well as CSR matrix checking, per single-cell-curation issue 1023.


          Adding non_csr_list

31b9405

Non_csr_list contains the dataset_ids of datasets that have at least one non-CSR matrix.


          Update utils.py

81097a7

Adding a Non_CSR_matrix check to check the sparsity of non-CSR matrices in migrate.py.
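
For readers unfamiliar with the check these commits describe, here is a minimal sketch of what a non-CSR sparsity check could look like. The function names `is_csr` and `sparsity` are hypothetical and are not taken from utils.py; the actual implementation in this PR may differ.

```python
import numpy as np
from scipy import sparse


def is_csr(matrix):
    """Return True only if the matrix is a SciPy sparse matrix in CSR format."""
    return sparse.issparse(matrix) and matrix.format == "csr"


def sparsity(matrix):
    """Return the fraction of zero-valued entries in a 2D dense or sparse matrix."""
    total = matrix.shape[0] * matrix.shape[1]
    if total == 0:
        return 0.0
    if sparse.issparse(matrix):
        # count_nonzero ignores explicitly stored zeros
        nonzero = matrix.count_nonzero()
    else:
        nonzero = np.count_nonzero(matrix)
    return 1.0 - nonzero / total
```

A dataset could then be flagged when any of its matrices fails `is_csr`, with `sparsity` indicating whether converting that matrix to CSR would be worthwhile.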

jahilton requested changes


cellxgene_schema_cli/cellxgene_schema/migrate.py

    
              # fmt: off

              # ONTOLOGY TERMS TO UPDATE ACROSS ALL DATASETS IN CORPUS

              # Initialization is AUTOMATED for newly deprecated terms that have 'Replaced By' terms in their ontology files

Collaborator

jahilton Mar 12, 2025

Let's keep these types of comments so this file can remain as a template for future migrations

Contributor

ejmolinelli Mar 12, 2025

@jahilton I'll see if I can update the generator to keep the comments.

cellxgene_schema_cli/cellxgene_schema/migrate.py

    
              import anndata as ad

              import pandas as pd

Collaborator

jahilton Mar 12, 2025

Can avoid pandas (below)

cellxgene_schema_cli/cellxgene_schema/migrate.py

    
                  },

                  "development_stage": DEV_STAGE_AUTO_MIGRATE_MAP,

Collaborator

jahilton Mar 12, 2025

Keep as empty dict like the other fields for future use

cellxgene_schema_cli/cellxgene_schema/migrate.py

    
              ]

              # Dictionary for CURATOR-DEFINED remapping of deprecated feature IDs, if any, to new feature IDs.

              GENCODE_MAPPER = {}

Collaborator

jahilton Mar 12, 2025

Keep

cellxgene_schema_cli/cellxgene_schema/migrate.py

    
              # Dictionary for CURATOR-DEFINED remapping of deprecated feature IDs, if any, to new feature IDs.

              GENCODE_MAPPER = {}

              df = pd.read_csv('migrate_files/non_csr_list.csv')

Collaborator

jahilton Mar 12, 2025

No need for a csv as it's a single column. So remove the column header, and you can just read it into a list without going through pandas
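
A minimal sketch of that suggestion, assuming the file holds one dataset_id per line with no header row. The helper name `read_id_list` is hypothetical and not taken from the PR.

```python
def read_id_list(path):
    """Read a plain-text file into a list of strings, one entry per line.

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]
```

In migrate.py this might be used as `NON_CSR_LIST = read_id_list("migrate_files/non_csr_list.txt")`, where the `.txt` file name is an assumption, which would also make the `pandas` import unnecessary.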

cellxgene_schema_cli/cellxgene_schema/migrate.py

    
                  #   utils.replace_ontology_term(df, <ontology_name>, {"term_to_replace": "replacement_term", ...})

                  # elif collection_id == "<collection_2_id>":

                  #   <custom transformation logic beyond scope of replace_ontology_term>

                  # ...

Collaborator

jahilton Mar 12, 2025

keep for future

cellxgene_schema_cli/cellxgene_schema/migrate.py

    
                      dataset.var.drop(columns="feature_type", inplace=True)

                  if GENCODE_MAPPER:

Collaborator

jahilton Mar 12, 2025

keep for future

cellxgene_schema_cli/cellxgene_schema/migrate.py

    
                      dataset = utils.remap_deprecated_features(adata=dataset, remapped_features=GENCODE_MAPPER)

                  # AUTOMATED, DO NOT CHANGE -- IF GENCODE UPDATED, DEPRECATED FEATURE FILTERING ALGORITHM WILL GO HERE.

                  if DEPRECATED_FEATURE_IDS:

Collaborator

jahilton Mar 12, 2025

keep for future


Labels

None yet