AUTO: update migrate.py schema_version 5.2.3->5.3.0 #1292
base: main
Adds public- and private-dataset-specific updates, as well as CSR matrix checking, per single-cell-curation issue 1023.
Non_csr_list contains the dataset_ids of datasets that have at least one non-CSR matrix.
Adds a Non_CSR_matrix check in migrate.py that checks the sparsity of non-CSR matrices.
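A minimal sketch of the kind of check described above, assuming each matrix is either a NumPy array or a SciPy sparse matrix. The helper names (`is_csr`, `sparsity`, `ensure_csr_if_sparse`) and the 0.5 threshold are hypothetical, chosen for illustration; they are not taken from migrate.py or from the schema.

```python
import numpy as np
from scipy import sparse

# Hypothetical threshold (assumption, not from the PR): a matrix whose
# fraction of zero entries is at or above this value is stored as CSR.
SPARSITY_THRESHOLD = 0.5


def is_csr(matrix) -> bool:
    """True if `matrix` is a SciPy sparse matrix or array in CSR format."""
    return sparse.issparse(matrix) and matrix.format == "csr"


def sparsity(matrix) -> float:
    """Fraction of entries that are zero, for dense or sparse input."""
    n_rows, n_cols = matrix.shape
    total = n_rows * n_cols
    if total == 0:
        return 0.0
    if sparse.issparse(matrix):
        nonzero = matrix.count_nonzero()
    else:
        nonzero = np.count_nonzero(matrix)
    return 1.0 - nonzero / total


def ensure_csr_if_sparse(matrix, threshold: float = SPARSITY_THRESHOLD):
    """Convert a non-CSR matrix to CSR when its sparsity meets `threshold`."""
    if is_csr(matrix):
        return matrix
    if sparsity(matrix) >= threshold:
        return sparse.csr_matrix(matrix)
    return matrix


# Usage: a mostly-zero dense matrix gets converted; an all-ones one does not.
dense = np.array([[0, 0, 1], [0, 0, 0]])
print(is_csr(ensure_csr_if_sparse(dense)))            # True
print(is_csr(ensure_csr_if_sparse(np.ones((2, 2)))))  # False
```

In a migration loop, something like this could be applied only to datasets whose dataset_id appears in non_csr_list, and to each matrix they carry (for example `X`, `raw.X`, and layers); that wiring is an assumption and is not shown in this PR.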
# fmt: off
# ONTOLOGY TERMS TO UPDATE ACROSS ALL DATASETS IN CORPUS
# Initialization is AUTOMATED for newly deprecated terms that have 'Replaced By' terms in their ontology files
Let's keep these types of comments so this file can remain as a template for future migrations
@jahilton I'll see if I can update the generator to keep the comments.
import anndata as ad
import pandas as pd
Can avoid pandas (below)
},
"development_stage": DEV_STAGE_AUTO_MIGRATE_MAP,
Keep as an empty dict, like the other fields, for future use.
]
# Dictionary for CURATOR-DEFINED remapping of deprecated feature IDs, if any, to new feature IDs.
GENCODE_MAPPER = {}
Keep
df = pd.read_csv('migrate_files/non_csr_list.csv')
No need for a CSV since it's a single column. Remove the column header and read the file straight into a list without going through pandas.
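Following that suggestion, a sketch assuming the file is rewritten as headerless plain text with one dataset_id per line. The helper names `parse_ids` and `read_id_list` are hypothetical.

```python
def parse_ids(lines):
    """Strip surrounding whitespace from each line and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


def read_id_list(path):
    """Read a headerless, one-dataset_id-per-line text file into a list."""
    with open(path, encoding="utf-8") as f:
        return parse_ids(f)


print(parse_ids(["dataset-a\n", "\n", "dataset-b\n"]))  # ['dataset-a', 'dataset-b']
```

Wrapping the result in `set(...)` gives constant-time membership checks when deciding, per dataset, whether the non-CSR handling applies.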
# utils.replace_ontology_term(df, <ontology_name>, {"term_to_replace": "replacement_term", ...})
# elif collection_id == "<collection_2_id>":
# <custom transformation logic beyond scope of replace_ontology_term>
# ...
keep for future
dataset.var.drop(columns="feature_type", inplace=True)
if GENCODE_MAPPER:
keep for future
    dataset = utils.remap_deprecated_features(adata=dataset, remapped_features=GENCODE_MAPPER)
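For illustration only, a sketch of what a curator-defined mapping from deprecated to replacement feature IDs amounts to. This is a hypothetical helper, not `utils.remap_deprecated_features`, whose behavior isn't shown in this PR; the duplicate check is an added assumption, on the premise that feature IDs must stay unique.

```python
def remap_feature_ids(feature_ids, mapper):
    """Hypothetical helper: swap deprecated IDs for their mapped replacements.

    IDs not present in `mapper` pass through unchanged. Raises ValueError if
    the result contains duplicates (assumes feature IDs must stay unique).
    """
    remapped = [mapper.get(fid, fid) for fid in feature_ids]
    if len(set(remapped)) != len(remapped):
        raise ValueError("remapping produced duplicate feature IDs")
    return remapped


# Made-up IDs for illustration.
print(remap_feature_ids(["ENSG01", "ENSG02"], {"ENSG02": "ENSG09"}))
# ['ENSG01', 'ENSG09']
```

An empty mapper leaves every ID untouched, which matches the intent of keeping `GENCODE_MAPPER = {}` as a placeholder for future migrations.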
# AUTOMATED, DO NOT CHANGE -- IF GENCODE UPDATED, DEPRECATED FEATURE FILTERING ALGORITHM WILL GO HERE.
if DEPRECATED_FEATURE_IDS:
keep for future
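The deprecated-feature filtering step is generated automatically and its code is not shown here. As a hedged sketch of the general idea only, assuming a cells x features matrix and a parallel list of feature IDs, the hypothetical helper below drops every column whose ID is in the deprecated set.

```python
import numpy as np
from scipy import sparse


def drop_deprecated_features(X, feature_ids, deprecated_ids):
    """Hypothetical helper: remove columns of X whose feature ID is deprecated.

    X is a cells x features matrix (NumPy array or SciPy sparse); feature_ids
    holds one ID per column, in column order. Returns the filtered matrix and
    the matching filtered IDs.
    """
    feature_ids = np.asarray(feature_ids)
    if len(deprecated_ids) == 0:
        return X, feature_ids
    keep_idx = np.flatnonzero(~np.isin(feature_ids, list(deprecated_ids)))
    if sparse.issparse(X) and X.format not in ("csr", "csc"):
        X = X.tocsr()  # e.g. COO does not support column indexing
    return X[:, keep_idx], feature_ids[keep_idx]


# Made-up IDs for illustration.
X = np.array([[1, 2, 3], [4, 5, 6]])
ids = ["ENSG01", "ENSG02", "ENSG03"]
X_kept, ids_kept = drop_deprecated_features(X, ids, {"ENSG02"})
print(X_kept.tolist(), ids_kept.tolist())  # [[1, 3], [4, 6]] ['ENSG01', 'ENSG03']
```

With an AnnData object the same idea is commonly written as `adata[:, ~adata.var_names.isin(ids)].copy()`; whether the generated code does exactly this is not stated in the PR.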
This is an automated PR to update migrate.py from schema_version 5.2.3->5.3.0