Bug with get_all_available_embeddings #1283

Closed
huguesva opened this issue Sep 20, 2024 · 6 comments · Fixed by #1287
Labels
bug Something isn't working

Comments

@huguesva

Hello, thanks for the great repo!

I have this bug when trying to reproduce https://chanzuckerberg.github.io/cellxgene-census/notebooks/api_demo/census_embedding.html

Describe the bug

I get the following error:

No module named 'tiledb.vector_search'

To Reproduce

from cellxgene_census.experimental import get_all_available_embeddings

Environment

Name: cellxgene-census
Version: 1.16.1

Name: tiledb
Version: 0.32.0

huguesva added the bug label on Sep 20, 2024
@ebezzi
Member

ebezzi commented Sep 20, 2024

Hey @huguesva,

thanks for reporting the issue. This is a newly introduced dependency, so if you run pip install -U "cellxgene-census[experimental]" it will fix the issue.

That said, in a future version we will move the import so that the dependency can be optional.
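
For readers wondering what "moving the import so that the dependency can be optional" can look like, here is a minimal sketch of the general lazy-import pattern. It is not the actual change made in the linked fix; the helper name `require_optional` is hypothetical, and the install hint simply reuses the pip command from the comment above.

```python
import importlib


def require_optional(module_name, extra):
    """Import module_name on demand, or raise an ImportError naming the pip extra.

    Hypothetical helper: calling this inside the function that needs the
    dependency (instead of importing at module top level) keeps the rest of
    the package importable when the optional dependency is missing.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f'install it with: pip install -U "cellxgene-census[{extra}]"'
        ) from err
```

A function that needs the vector search module could then call `require_optional("tiledb.vector_search", "experimental")` in its body, so that only that function fails, with an actionable message, when the extra is not installed.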

@huguesva
Author

Great, thanks a lot!

Quick follow-up: is there a way to select the scGPT embeddings in the cellxgene_census.get_obs function?
get_anndata seems to have this option, but get_obs does not.

@ebezzi
Member

ebezzi commented Sep 23, 2024

get_obs only returns the obs dataframe - are you looking for a way to retrieve the embeddings without loading the full anndata?

@huguesva
Author

Yes, exactly!

I am looking for an API to retrieve embeddings and labels (as numpy arrays or torch tensors, for instance) given all the parameters that identify them: organism, measurement_name, tissue, emb_names ("scgpt", "geneformer", or "scvi"), ...

I am not sure what to use between get_obs, get_var, or get_anndata. get_anndata seems to have more parameters than the other two.

Thanks a lot for your help!

@ivirshup
Collaborator

@huguesva, while get_anndata retrieves the whole anndata object, get_obs and get_var only retrieve the adata.obs and adata.var fields specifically.
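
To make the distinction concrete, here is a hedged sketch of one way to get embeddings and labels as arrays through get_anndata with obs_embeddings, the option mentioned earlier in this thread. The wrapper name `embeddings_and_labels` and its parameters are hypothetical. The sketch assumes that the requested embeddings are stored under `adata.obsm[name]` and that `cell_type` is the label column of interest; the Census version and filter are left to the caller. It still loads an AnnData object, so it does not answer the question of fetching embeddings without one.

```python
def embeddings_and_labels(
    emb_names,
    obs_value_filter,
    organism,
    census_version,
    measurement_name="RNA",
    label_column="cell_type",
):
    """Return ({embedding name: array}, labels) for cells matching obs_value_filter."""
    # Imported lazily so that defining this helper does not require the package.
    import cellxgene_census

    with cellxgene_census.open_soma(census_version=census_version) as census:
        adata = cellxgene_census.get_anndata(
            census,
            organism=organism,
            measurement_name=measurement_name,
            obs_value_filter=obs_value_filter,
            obs_column_names=[label_column],
            obs_embeddings=list(emb_names),
        )

    # Assumption: each requested embedding is exposed as adata.obsm[name],
    # row-aligned with adata.obs.
    labels = adata.obs[label_column].to_numpy()
    embeddings = {name: adata.obsm[name] for name in emb_names}
    return embeddings, labels
```

Because every embedding is keyed by its name in the returned dictionary, there is no ambiguity about which model a given array came from; get_obs, by contrast, returns only the cell metadata and no embedding at all.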

@huguesva
Author

Thanks for the answer.
To be more precise, what confuses me in this tutorial: https://chanzuckerberg.github.io/cellxgene-census/notebooks/api_demo/census_access_maintained_embeddings.html
is that in cell [1] of the notebook, get_anndata is called with obs_embeddings = ["scvi", "geneformer"], whereas later, when get_obs is used (cell 12), no obs_embeddings is specified. So I have the impression that we don't know whether the data comes from "scvi", "geneformer", or "scgpt" when using get_obs.

Thanks for the help.


3 participants