Unable to restore chromosome sequence lengths or other metadata in BSgenome.Mfascicularis.NCBI.6.0 #82

Open
zgb963 opened this issue Feb 7, 2025 · 1 comment


zgb963 commented Feb 7, 2025

Hello,

I was using the BSgenome.Mfascicularis.NCBI.6.0 genome for the Signac multiomics tutorial. It was working fine until I tried to change the chromosome names to match the GTF I have (macFas6-hg38-gencode.v47.basic.sortedgtf).

This code worked before

mf_genome <- BSgenome.Mfascicularis.NCBI.6.0

# see 1st chromosomes
head(seqlengths(mf_genome))

# see how many base pairs are in chromosome 1
mf_genome[["1"]]

# chromosome 1 had 223606306 base pairs.
# if you look at the chromosomes table at the bottom of the page, chromosome 1 is 'CM021939.1' and it has 223606306 base pairs (no longer working???)
# it also lists the GC content % for each chromosome

But then when I tried to change the chromosome names...

# # Store mapping in R as vector
# chr_map <- c(
#   "1" = "CM021939.1", "2" = "CM021940.1", "3" = "CM021941.1",
#   "4" = "CM021942.1", "5" = "CM021943.1", "6" = "CM021944.1",
#   "7" = "CM021945.1", "8" = "CM021946.1", "9" = "CM021947.1",
#   "10" = "CM021948.1", "11" = "CM021949.1", "12" = "CM021950.1",
#   "13" = "CM021951.1", "14" = "CM021952.1", "15" = "CM021953.1",
#   "16" = "CM021954.1", "17" = "CM021955.1", "18" = "CM021956.1",
#   "19" = "CM021957.1", "20" = "CM021958.1", "X" = "CM021959.1"
# )
#
#
# # Get the original chromosome names
# original_chr_names <- seqnames(BSgenome.Mfascicularis.NCBI.6.0)
# original_chr_names
#  # [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "X"
#
#
#
# # Check which names need to be changed (need to remove NA's from here --- rest of the scaffolds that aren't chromosome 1-20 or X)
# new_chr_names <- chr_map[original_chr_names]
# new_chr_names
#
# #            1            2            3            4            5            6            7            8            9           10           11           12           13
# # "CM021939.1" "CM021940.1" "CM021941.1" "CM021942.1" "CM021943.1" "CM021944.1" "CM021945.1" "CM021946.1" "CM021947.1" "CM021948.1" "CM021949.1" "CM021950.1" "CM021951.1"
# #           14           15           16           17           18           19           20            X
# # "CM021952.1" "CM021953.1" "CM021954.1" "CM021955.1" "CM021956.1" "CM021957.1" "CM021958.1" "CM021959.1"
#
#
#
#
# # Keep only non-NA values ( chrom 1-20 & X)
# # new_chr_names <- new_chr_names[!is.na(new_chr_names)]
#
#
# # # Remove NA values to ensure lengths match
# # valid_indices <- !is.na(new_chr_names)
# # original_chr_names <- original_chr_names[valid_indices]
# # new_chr_names <- new_chr_names[valid_indices]
# #
# # # Confirm lengths match
# # length(original_chr_names) == length(new_chr_names)
#
#
# # Assign the new chromosome names
# names(BSgenome.Mfascicularis.NCBI.6.0@seqinfo) <- new_chr_names
#
# # Error in `seqnames<-`(x, value) :
# #   length of supplied 'seqnames' vector must equal the number of sequences
#
#
#
#
# # Verify the update
# seqnames(BSgenome.Mfascicularis.NCBI.6.0)

   1                                        2                                        3                                        4 
                                      NA                                       NA                                       NA                                       NA 
                                       5                                        6                                        7                                        8 
                                      NA                                       NA                                       NA                                       NA 
                                       9                                       10                                       11                                       12 
                                      NA                                       NA                                       NA                                       NA 
                                      13                                       14                                       15                                       16 
                                      NA                                       NA                                       NA                                       NA 
                                      17                                       18                                       19                                       20 
                                      NA                                       NA                                       NA                                       NA 
                                       X                    Super-Scaffold_100058                    Super-Scaffold_100060                    Super-Scaffold_100064 
                                      NA                                       NA                                       NA                                       NA 
                   Super-Scaffold_100065                    Super-Scaffold_100066                    Super-Scaffold_100068                    Super-Scaffold_100069 
                                      NA                                       NA                                       NA                                       NA 
                   Super-Scaffold_100072                    Super-Scaffold_100073                    Super-Scaffold_100078                    Super-Scaffold_100080 
                                      NA                                       NA                                       NA                                       NA 

It removed all the other metadata. I've tried uninstalling and reinstalling BSgenome.Mfascicularis.NCBI.6.0 and BSgenome, but that doesn't recover the seqlengths or the other metadata that was originally there:

remove.packages("BSgenome.Mfascicularis.NCBI.6.0")
remove.packages("BSgenome")

BiocManager::install("BSgenome")
BiocManager::install("BSgenome.Mfascicularis.NCBI.6.0")
 print(names(BSgenome.Mfascicularis.NCBI.6.0@seqinfo))
 [1] "CM021939.1" "CM021940.1" "CM021941.1" "CM021942.1" "CM021943.1" "CM021944.1" "CM021945.1" "CM021946.1" "CM021947.1" "CM021948.1" "CM021949.1" "CM021950.1" "CM021951.1"
[14] "CM021952.1" "CM021953.1" "CM021954.1" "CM021955.1" "CM021956.1" "CM021957.1" "CM021958.1" "CM021959.1"

I also tried the approach linked here. The metadata that was originally in BSgenome.Mfascicularis.NCBI.6.0 is no longer there, and I can't restore it. I successfully changed the chromosome names I needed and created a subsetted BSgenome object called new_genome, but again I no longer have the seqlengths info, etc.

# hack from https://support.bioconductor.org/p/83588/  to keep only some chromosomes within a BSgenome object.
keepBSgenomeSequences <- function(genome, seqnames)
{
  stopifnot(all(seqnames %in% seqnames(genome)))
  genome@user_seqnames <- setNames(seqnames, seqnames)
  genome@seqinfo <- genome@seqinfo[seqnames]
  genome
}
# load macfas6 NCBI BSgenome object
library(BSgenome.Mfascicularis.NCBI.6.0)
# assign to R object called genome
genome <- BSgenome.Mfascicularis.NCBI.6.0
# check if seqlengths info is in original genome
head(seqlengths(genome))
#  1  2  3  4  5  6
# NA NA NA NA NA NA
# see how many base pairs are in chromosome 1
genome[["1"]]
# Error in if (length(ans) != seqlengths(x)[[user_seqname]]) { :
#   missing value where TRUE/FALSE needed
# # Store mapping in R as vector
chr_map <- c(
  "1" = "CM021939.1", "2" = "CM021940.1", "3" = "CM021941.1",
  "4" = "CM021942.1", "5" = "CM021943.1", "6" = "CM021944.1",
  "7" = "CM021945.1", "8" = "CM021946.1", "9" = "CM021947.1",
  "10" = "CM021948.1", "11" = "CM021949.1", "12" = "CM021950.1",
  "13" = "CM021951.1", "14" = "CM021952.1", "15" = "CM021953.1",
  "16" = "CM021954.1", "17" = "CM021955.1", "18" = "CM021956.1",
  "19" = "CM021957.1", "20" = "CM021958.1", "X" = "CM021959.1"
)
# create new R object called new_genome with only chromosomes 1-20 & X
# this is the new BSgenome object
new_genome <- keepBSgenomeSequences(genome, names(chr_map))
seqnames(new_genome) <- chr_map
# print out
genome
# | BSgenome object for Crab-eating macaque
# | - organism: Macaca fascicularis
# | - provider: NCBI
# | - genome: Macaca_fascicularis_6.0
# | - release date: 2020/03/10
# | - 936 sequence(s):
# |     1                                        2                                        3                                        4
# |     5                                        6                                        7                                        8
# |     9                                        10                                       11                                       12
# |     13                                       14                                       15                                       16
# |     17                                       18                                       19                                       20
# |     ...                                      ...                                      ...                                      ...
# |     tig00001423_obj                          tig00001424_obj                          tig00001425_obj                          tig00001428_obj
# |     tig00001430_obj                          tig00001431_obj                          tig00001432_obj                          tig00001433_obj
# |     tig00001434_obj                          tig00001435_obj                          tig00001436_obj                          tig00001437_obj
# |     tig00001438_obj                          tig00001440_obj                          tig00001441_obj                          tig00001442_obj
# |     tig00001443_obj                          tig00001444_obj                          tig00001445_obj                          tig00001446_obj
# |
# | Tips: call 'seqnames()' on the object to get all the sequence names, call 'seqinfo()' to get the full sequence info, use the '$' or '[[' operator to access a given sequence, see
# | '?BSgenome' for more information.
new_genome
# | BSgenome object for Crab-eating macaque
# | - organism: Macaca fascicularis
# | - provider: NCBI
# | - genome: Macaca_fascicularis_6.0
# | - release date: 2020/03/10
# | - 21 sequence(s):
# |     CM021939.1 CM021940.1 CM021941.1 CM021942.1 CM021943.1 CM021944.1 CM021945.1 CM021946.1 CM021947.1 CM021948.1 CM021949.1 CM021950.1 CM021951.1 CM021952.1 CM021953.1 CM021954.1
# |     CM021955.1 CM021956.1 CM021957.1 CM021958.1 CM021959.1
# |
# | Tips: call 'seqnames()' on the object to get all the sequence names, call 'seqinfo()' to get the full sequence info, use the '$' or '[[' operator to access a given sequence, see
# | '?BSgenome' for more information.
chr_map
#            1            2            3            4            5            6            7            8            9           10           11           12           13           14
# "CM021939.1" "CM021940.1" "CM021941.1" "CM021942.1" "CM021943.1" "CM021944.1" "CM021945.1" "CM021946.1" "CM021947.1" "CM021948.1" "CM021949.1" "CM021950.1" "CM021951.1" "CM021952.1"
#           15           16           17           18           19           20            X
# "CM021953.1" "CM021954.1" "CM021955.1" "CM021956.1" "CM021957.1" "CM021958.1" "CM021959.1"

seqinfo(new_genome)
# Seqinfo object with 21 sequences from an unspecified genome; no seqlengths:
#   seqnames   seqlengths isCircular genome
#   CM021939.1       <NA>       <NA>   <NA>
#   CM021940.1       <NA>       <NA>   <NA>
#   CM021941.1       <NA>       <NA>   <NA>
#   CM021942.1       <NA>       <NA>   <NA>
#   CM021943.1       <NA>       <NA>   <NA>
#   ...               ...        ...    ...
#   CM021955.1       <NA>       <NA>   <NA>
#   CM021956.1       <NA>       <NA>   <NA>
#   CM021957.1       <NA>       <NA>   <NA>
#   CM021958.1       <NA>       <NA>   <NA>
#   CM021959.1       <NA>       <NA>   <NA>
# see 1st chromosomes
head(seqlengths(new_genome))
# CM021939.1 CM021940.1 CM021941.1 CM021942.1 CM021943.1 CM021944.1
#         NA         NA         NA         NA         NA         NA
# see how many base pairs are in chromosome 1
new_genome[["CM021939.1"]]
# Error in if (length(ans) != seqlengths(x)[[user_seqname]]) { :
#   missing value where TRUE/FALSE needed

I tried manually adding the sequence lengths to the seqlengths column of the new_genome BSgenome object, using chromosome info I retrieved from NCBI for Macaca_fascicularis_6.0 here, but I still wasn't able to add that to the BSgenome object.

# Read the TSV file (assuming tab-separated and columns: seqname, length, gc_content or gc_count)
seq_info <- read.table("~/macaque_multiomics/macfas6_ncbi_sequence_report_chromosomes.tsv", header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Check column names to ensure correct assignment
head(seq_info)


<img width="1351" alt="Image" src="https://github.com/user-attachments/assets/77ed8107-3905-46b0-a5f6-e832c5777fdb" />


# Print column names
print(colnames(seq_info))

 [1] "Assembly.Accession"      "Assembly.unit.accession" "Chromosome.name"         "GC.Count"                "GC.Percent"              "GenBank.seq.accession"  
 [7] "Molecule.type"           "Ordering"                "RefSeq.seq.accession"    "Role"                    "Seq.length"              "UCSC.style.name"        
[13] "Unlocalized.Count"       "Sequence.name" 


# Extract chromosome names from BSgenome object
bs_chroms <- seqnames(new_genome)

# Subset seq_info to only include relevant rows
seq_info_subset <- seq_info[seq_info$GenBank.seq.accession %in% bs_chroms, ]

# Check structure
str(seq_info_subset)
head(seq_info_subset[, c("GenBank.seq.accession", "Seq.length")])

# tried assigning sequence lengths to new_genome
# Convert column types 
seq_info_subset$GenBank.seq.accession <- as.character(seq_info_subset$GenBank.seq.accession)
seq_info_subset$Seq.length <- as.numeric(seq_info_subset$Seq.length)

# Create a named vector for sequence lengths
seq_lengths <- setNames(seq_info_subset$Seq.length, seq_info_subset$GenBank.seq.accession)

# Assign sequence lengths to BSgenome object
seqlengths(new_genome) <- seq_lengths

# Error in .check_new2old_and_new_seqinfo(new2old, value, seqinfo(x), context) : 
#   seqlengths() and isCircular() of the supplied 'seqinfo' must be identical to seqlengths() and isCircular() of the current 'seqinfo' when replacing the 'seqinfo' of
#   a BSgenome object


I kept getting the same error when trying many different ways to add the sequence lengths to the BS genome object. Is it true that once a BSgenome object is created, you cannot directly modify the seqinfo slot, including the seqlengths? I don't know why the original BS genome object BSgenome.Mfascicularis.NCBI.6.0 doesn't have any metadata besides chromosome names even after unloading & reloading object. I've also tried uninstalling and reinstalling BSgenome.Mfascicularis.NCBI.6.0 and the BSgenome R packages but no luck.

Session info

devtools::session_info()

─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.0 (2024-04-24)
 os       Ubuntu 24.04.1 LTS
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       /UTC
 date     2025-02-07
 rstudio  2024.09.0+375 Cranberry Hibiscus (server)
 pandoc   3.2 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/x86_64/ (via rmarkdown)

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 ! package                         * version    date (UTC) lib source
   abind                             1.4-8      2024-09-12 [2] CRAN (R 4.4.0)
   AnnotationDbi                   * 1.66.0     2024-05-01 [1] Bioconductor 3.19 (R 4.4.0)
   Biobase                         * 2.64.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   BiocFileCache                     2.12.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   BiocGenerics                    * 0.50.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   BiocIO                          * 1.14.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   BiocManager                       1.30.25    2024-08-28 [2] CRAN (R 4.4.0)
   BiocParallel                      1.38.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   biomaRt                           2.60.1     2024-06-26 [1] Bioconductor 3.19 (R 4.4.0)
   Biostrings                      * 2.72.1     2024-06-02 [1] Bioconductor 3.19 (R 4.4.0)
   bit                               4.5.0      2024-09-20 [2] CRAN (R 4.4.0)
   bit64                             4.5.2      2024-09-22 [2] CRAN (R 4.4.0)
   bitops                            1.0-9      2024-10-03 [2] CRAN (R 4.4.0)
   blob                              1.2.4      2023-03-17 [2] CRAN (R 4.4.0)
   BSgenome                        * 1.72.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 R BSgenome.Mfascicularis.NCBI.5.0 * 1.4.2      <NA>       [1] <NA>
   BSgenome.Mfascicularis.NCBI.6.0 * 1.5.0      2025-02-07 [1] Bioconductor
   cachem                            1.1.0      2024-05-16 [2] CRAN (R 4.4.0)
   cli                               3.6.3      2024-06-21 [2] CRAN (R 4.4.0)
   cluster                           2.1.6      2023-12-01 [5] CRAN (R 4.4.0)
   codetools                         0.2-20     2024-03-31 [5] CRAN (R 4.4.0)
   colorspace                        2.1-1      2024-07-26 [2] CRAN (R 4.4.0)
   cowplot                           1.1.3      2024-01-22 [2] CRAN (R 4.4.0)
   crayon                            1.5.3      2024-06-20 [2] CRAN (R 4.4.0)
   curl                              5.2.3      2024-09-20 [2] CRAN (R 4.4.0)
   data.table                        1.16.0     2024-08-27 [2] CRAN (R 4.4.0)
   DBI                               1.2.3      2024-06-02 [2] CRAN (R 4.4.0)
   dbplyr                            2.5.0      2024-03-19 [2] CRAN (R 4.4.0)
   DelayedArray                      0.30.1     2024-05-07 [1] Bioconductor 3.19 (R 4.4.0)
   deldir                            2.0-4      2024-02-28 [2] CRAN (R 4.4.0)
   devtools                        * 2.4.5      2022-10-11 [2] CRAN (R 4.4.0)
   digest                            0.6.37     2024-08-19 [2] CRAN (R 4.4.0)
   doParallel                      * 1.0.17     2022-02-07 [1] CRAN (R 4.4.0)
   dotCall64                         1.2        2024-10-04 [2] CRAN (R 4.4.0)
   DoubletFinder                   * 2.0.4      2024-12-05 [1] Github (chris-mcginnis-ucsf/DoubletFinder@03e9f37)
   dplyr                           * 1.1.4      2023-11-17 [2] CRAN (R 4.4.0)
   DropletQC                       * 0.0.0.9000 2024-12-01 [1] Github (powellgenomicslab/DropletQC@5d7dadc)
   ellipsis                          0.3.2      2021-04-29 [2] CRAN (R 4.4.0)
   evaluate                          1.0.0      2024-09-17 [2] CRAN (R 4.4.0)
   fansi                             1.0.6      2023-12-08 [2] CRAN (R 4.4.0)
   farver                            2.1.2      2024-05-13 [2] CRAN (R 4.4.0)
   fastDummies                       1.7.4      2024-08-16 [2] CRAN (R 4.4.0)
   fastmap                           1.2.0      2024-05-15 [2] CRAN (R 4.4.0)
 V fastmatch                         1.1-4      2024-12-23 [1] CRAN (R 4.4.0) (on disk 1.1.6)
   fields                          * 16.3       2024-09-30 [1] CRAN (R 4.4.0)
   filelock                          1.0.3      2023-12-11 [1] CRAN (R 4.4.0)
   fitdistrplus                      1.2-1      2024-07-12 [2] CRAN (R 4.4.0)
   flexmix                         * 2.3-19     2023-03-16 [2] CRAN (R 4.4.0)
   forcats                         * 1.0.0      2023-01-29 [2] CRAN (R 4.4.0)
   foreach                         * 1.5.2      2022-02-02 [2] CRAN (R 4.4.0)
   fs                                1.6.4      2024-04-25 [2] CRAN (R 4.4.0)
   future                          * 1.34.0     2024-07-29 [2] CRAN (R 4.4.0)
   future.apply                      1.11.2     2024-03-28 [2] CRAN (R 4.4.0)
   generics                          0.1.3      2022-07-05 [2] CRAN (R 4.4.0)
   GenomeInfoDb                    * 1.40.1     2024-05-24 [1] Bioconductor 3.19 (R 4.4.0)
   GenomeInfoDbData                  1.2.12     2024-10-01 [1] Bioconductor
   GenomicAlignments                 1.40.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   GenomicFeatures                 * 1.56.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   GenomicRanges                   * 1.56.2     2024-10-09 [1] Bioconductor 3.19 (R 4.4.0)
   ggplot2                         * 3.5.1      2024-04-23 [2] CRAN (R 4.4.0)
   ggrepel                           0.9.6      2024-09-07 [2] CRAN (R 4.4.0)
   ggridges                          0.5.6      2024-01-23 [2] CRAN (R 4.4.0)
   globals                           0.16.3     2024-03-08 [2] CRAN (R 4.4.0)
   glue                              1.8.0      2024-09-30 [2] CRAN (R 4.4.0)
   goftest                           1.2-3      2021-10-07 [2] CRAN (R 4.4.0)
   gridExtra                         2.3        2017-09-09 [2] CRAN (R 4.4.0)
   gtable                            0.3.5      2024-04-22 [2] CRAN (R 4.4.0)
   hdf5r                           * 1.3.11     2024-07-07 [2] CRAN (R 4.4.0)
   hms                               1.1.3      2023-03-21 [2] CRAN (R 4.4.0)
   htmltools                         0.5.8.1    2024-04-04 [2] CRAN (R 4.4.0)
   htmlwidgets                       1.6.4      2023-12-06 [2] CRAN (R 4.4.0)
   httpuv                            1.6.15     2024-03-26 [2] CRAN (R 4.4.0)
   httr                              1.4.7      2023-08-15 [2] CRAN (R 4.4.0)
   httr2                             1.0.5      2024-09-26 [2] CRAN (R 4.4.0)
   ica                               1.0-3      2022-07-08 [2] CRAN (R 4.4.0)
   igraph                            2.0.3      2024-03-13 [2] CRAN (R 4.4.0)
   IRanges                         * 2.38.1     2024-07-03 [1] Bioconductor 3.19 (R 4.4.0)
   irlba                             2.3.5.1    2022-10-03 [2] CRAN (R 4.4.0)
   iterators                       * 1.0.14     2022-02-05 [2] CRAN (R 4.4.0)
   jsonlite                          1.8.9      2024-09-20 [2] CRAN (R 4.4.0)
   KEGGREST                          1.44.1     2024-06-19 [1] Bioconductor 3.19 (R 4.4.0)
   KernSmooth                      * 2.23-24    2024-05-17 [5] CRAN (R 4.4.0)
   knitr                             1.48       2024-07-07 [2] CRAN (R 4.4.0)
   later                             1.3.2      2023-12-06 [2] CRAN (R 4.4.0)
   lattice                         * 0.22-5     2023-10-24 [5] CRAN (R 4.3.3)
   lazyeval                          0.2.2      2019-03-15 [2] CRAN (R 4.4.0)
   leiden                            0.4.3.1    2023-11-17 [2] CRAN (R 4.4.0)
   lifecycle                         1.0.4      2023-11-07 [2] CRAN (R 4.4.0)
   listenv                           0.9.1      2024-01-29 [2] CRAN (R 4.4.0)
   lmtest                            0.9-40     2022-03-21 [2] CRAN (R 4.4.0)
   lubridate                       * 1.9.3      2023-09-27 [2] CRAN (R 4.4.0)
   magrittr                          2.0.3      2022-03-30 [2] CRAN (R 4.4.0)
   maps                              3.4.2.1    2024-11-10 [1] CRAN (R 4.4.0)
   MASS                              7.3-61     2024-06-13 [2] CRAN (R 4.4.0)
   Matrix                            1.7-0      2024-04-26 [5] CRAN (R 4.4.0)
   MatrixGenerics                    1.16.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   matrixStats                       1.4.1      2024-09-08 [2] CRAN (R 4.4.0)
   mclust                            6.1.1      2024-04-29 [1] CRAN (R 4.4.0)
   memoise                           2.0.1      2021-11-26 [2] CRAN (R 4.4.0)
   mime                              0.12       2021-09-28 [2] CRAN (R 4.4.0)
   miniUI                            0.1.1.1    2018-05-18 [2] CRAN (R 4.4.0)
   miQC                            * 1.12.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   modeltools                        0.2-23     2020-03-05 [2] CRAN (R 4.4.0)
   munsell                           0.5.1      2024-04-01 [2] CRAN (R 4.4.0)
   nlme                              3.1-166    2024-08-14 [2] CRAN (R 4.4.0)
   nnet                              7.3-19     2023-05-03 [5] CRAN (R 4.3.3)
   parallelly                        1.38.0     2024-07-27 [2] CRAN (R 4.4.0)
   patchwork                       * 1.3.0      2024-09-16 [2] CRAN (R 4.4.0)
   pbapply                           1.7-2      2023-06-27 [2] CRAN (R 4.4.0)
   pillar                            1.9.0      2023-03-22 [2] CRAN (R 4.4.0)
   pkgbuild                          1.4.4      2024-03-17 [2] CRAN (R 4.4.0)
   pkgconfig                         2.0.3      2019-09-22 [2] CRAN (R 4.4.0)
   pkgload                           1.4.0      2024-06-28 [2] CRAN (R 4.4.0)
   plotly                            4.10.4     2024-01-13 [2] CRAN (R 4.4.0)
   plyr                              1.8.9      2023-10-02 [2] CRAN (R 4.4.0)
   png                               0.1-8      2022-11-29 [2] CRAN (R 4.4.0)
   polyclip                          1.10-7     2024-07-23 [2] CRAN (R 4.4.0)
   prettyunits                       1.2.0      2023-09-24 [2] CRAN (R 4.4.0)
   profvis                           0.4.0      2024-09-20 [2] CRAN (R 4.4.0)
   progress                          1.2.3      2023-12-06 [2] CRAN (R 4.4.0)
   progressr                         0.14.0     2023-08-10 [2] CRAN (R 4.4.0)
   promises                          1.3.0      2024-04-05 [2] CRAN (R 4.4.0)
   purrr                           * 1.0.2      2023-08-10 [2] CRAN (R 4.4.0)
   R.methodsS3                     * 1.8.2      2022-06-13 [1] CRAN (R 4.4.0)
   R.oo                            * 1.27.0     2024-11-01 [1] CRAN (R 4.4.0)
   R.utils                         * 2.12.3     2023-11-18 [1] CRAN (R 4.4.0)
   R6                                2.5.1      2021-08-19 [2] CRAN (R 4.4.0)
   RANN                              2.6.2      2024-08-25 [2] CRAN (R 4.4.0)
   rappdirs                          0.3.3      2021-01-31 [2] CRAN (R 4.4.0)
   RColorBrewer                      1.1-3      2022-04-03 [2] CRAN (R 4.4.0)
   Rcpp                              1.0.13     2024-07-17 [2] CRAN (R 4.4.0)
   RcppAnnoy                         0.0.22     2024-01-23 [2] CRAN (R 4.4.0)
   RcppHNSW                          0.6.0      2024-02-04 [2] CRAN (R 4.4.0)
   RcppRoll                          0.3.1      2024-07-07 [1] CRAN (R 4.4.0)
   RCurl                             1.98-1.16  2024-07-11 [1] CRAN (R 4.4.0)
   readr                           * 2.1.5      2024-01-10 [2] CRAN (R 4.4.0)
   remotes                         * 2.5.0      2024-03-17 [2] CRAN (R 4.4.0)
   reshape2                          1.4.4      2020-04-09 [2] CRAN (R 4.4.0)
   restfulr                          0.0.15     2022-06-16 [1] CRAN (R 4.4.0)
   reticulate                        1.39.0     2024-09-05 [2] CRAN (R 4.4.0)
   rjson                             0.2.23     2024-09-16 [1] CRAN (R 4.4.0)
   rlang                             1.1.4      2024-06-04 [2] CRAN (R 4.4.0)
   rmarkdown                         2.28       2024-08-17 [2] CRAN (R 4.4.0)
   ROCR                            * 1.0-11     2020-05-02 [2] CRAN (R 4.4.0)
   Rsamtools                         2.20.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   RSpectra                          0.16-2     2024-07-18 [2] CRAN (R 4.4.0)
   RSQLite                           2.3.8      2024-11-17 [1] CRAN (R 4.4.0)
   rstudioapi                        0.16.0     2024-03-24 [2] CRAN (R 4.4.0)
   rsvd                              1.0.5      2021-04-16 [1] CRAN (R 4.4.0)
   rtracklayer                     * 1.64.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   Rtsne                             0.17       2023-12-07 [2] CRAN (R 4.4.0)
   S4Arrays                          1.4.1      2024-05-20 [1] Bioconductor 3.19 (R 4.4.0)
   S4Vectors                       * 0.42.1     2024-07-03 [1] Bioconductor 3.19 (R 4.4.0)
   scales                            1.3.0      2023-11-28 [2] CRAN (R 4.4.0)
   scattermore                       1.2        2023-06-12 [2] CRAN (R 4.4.0)
   sctransform                       0.4.1      2023-10-19 [2] CRAN (R 4.4.0)
   sessioninfo                       1.2.2      2021-12-06 [2] CRAN (R 4.4.0)
   Seurat                          * 5.1.0      2024-05-10 [2] CRAN (R 4.4.0)
   SeuratDisk                      * 0.0.0.9021 2024-10-07 [1] Github (mojaveazure/seurat-disk@877d4e1)
   SeuratObject                    * 5.0.2      2024-05-08 [2] CRAN (R 4.4.0)
   SeuratWrappers                    0.4.0      2024-12-05 [1] Github (satijalab/seurat-wrappers@a1eb0d8)
   shiny                             1.9.1      2024-08-01 [2] CRAN (R 4.4.0)
   Signac                          * 1.14.9002  2025-01-17 [1] Github (stuart-lab/signac@39df167)
   SingleCellExperiment              1.26.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   SoupX                           * 1.6.2      2022-11-01 [2] CRAN (R 4.4.0)
   sp                              * 2.1-4      2024-04-30 [2] CRAN (R 4.4.0)
   spam                            * 2.11-0     2024-10-03 [2] CRAN (R 4.4.0)
   SparseArray                       1.4.8      2024-05-24 [1] Bioconductor 3.19 (R 4.4.0)
   spatstat.data                     3.1-2      2024-06-21 [2] CRAN (R 4.4.0)
   spatstat.explore                  3.3-2      2024-08-21 [2] CRAN (R 4.4.0)
   spatstat.geom                     3.3-3      2024-09-18 [2] CRAN (R 4.4.0)
   spatstat.random                   3.3-2      2024-09-18 [2] CRAN (R 4.4.0)
   spatstat.sparse                   3.1-0      2024-06-21 [2] CRAN (R 4.4.0)
   spatstat.univar                   3.0-1      2024-09-05 [2] CRAN (R 4.4.0)
   spatstat.utils                    3.1-0      2024-08-17 [2] CRAN (R 4.4.0)
   stringi                           1.8.4      2024-05-06 [2] CRAN (R 4.4.0)
   stringr                         * 1.5.1      2023-11-14 [2] CRAN (R 4.4.0)
   SummarizedExperiment              1.34.0     2024-05-01 [1] Bioconductor 3.19 (R 4.4.0)
   survival                          3.7-0      2024-06-05 [2] CRAN (R 4.4.0)
   tensor                            1.5        2012-05-05 [2] CRAN (R 4.4.0)
   tibble                          * 3.2.1      2023-03-20 [2] CRAN (R 4.4.0)
   tidyr                           * 1.3.1      2024-01-24 [2] CRAN (R 4.4.0)
   tidyselect                        1.2.1      2024-03-11 [2] CRAN (R 4.4.0)
   tidyverse                       * 2.0.0      2023-02-22 [2] CRAN (R 4.4.0)
   timechange                        0.3.0      2024-01-18 [2] CRAN (R 4.4.0)
   txdbmaker                       * 1.0.1      2024-06-23 [1] Bioconductor 3.19 (R 4.4.0)
   tzdb                              0.4.0      2023-05-12 [2] CRAN (R 4.4.0)
   UCSC.utils                        1.0.0      2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   urlchecker                        1.0.1      2021-11-30 [2] CRAN (R 4.4.0)
   usethis                         * 3.0.0      2024-07-29 [2] CRAN (R 4.4.0)
   utf8                              1.2.4      2023-10-22 [2] CRAN (R 4.4.0)
   uwot                              0.2.2      2024-04-21 [2] CRAN (R 4.4.0)
   vctrs                             0.6.5      2023-12-01 [2] CRAN (R 4.4.0)
   viridisLite                     * 0.4.2      2023-05-02 [2] CRAN (R 4.4.0)
   withr                             3.0.1      2024-07-31 [2] CRAN (R 4.4.0)
   xfun                              0.48       2024-10-03 [2] CRAN (R 4.4.0)
 V XML                               3.99-0.17  2025-01-01 [1] CRAN (R 4.4.0) (on disk 3.99.0.18)
   xml2                              1.3.6      2023-12-04 [2] CRAN (R 4.4.0)
   xtable                            1.8-4      2019-04-21 [2] CRAN (R 4.4.0)
   XVector                         * 0.44.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   yaml                              2.3.10     2024-07-26 [2] CRAN (R 4.4.0)
   zlibbioc                          1.50.0     2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
   zoo                               1.8-12     2023-04-13 [2] CRAN (R 4.4.0)

 [1] ~/R/x86_64-pc-linux-gnu-library/4.4
 [2] /library
 [3] /usr/local/lib/R/site-library
 [4] /usr/lib/R/site-library
 [5] /usr/lib/R/library

 V ── Loaded and on-disk version mismatch.
 R ── Package was removed from disk.




@zgb963 zgb963 changed the title Unable to restore chromosome sequence lengths in BSgenome.Mfascicularis.NCBI.6.0 Unable to restore chromosome sequence lengths or other metadata in BSgenome.Mfascicularis.NCBI.6.0 Feb 14, 2025
Contributor

hpages commented Feb 27, 2025

Hi,

All the changes one can make to the seqinfo of a BSgenome package happen in memory and NEVER actually touch the package installation folder on disk. This is true for all package installation folders: they are always treated as read-only folders. Imagine the chaos on a system where packages are shared across users if users could actually alter the content of the installed packages! So no need to reinstall anything. Just quit R and restart a fresh session.

Also when you are about to quit R, make sure to NEVER answer "yes" when asked if you want to "Save workspace image". Otherwise, when you start the next session, R will automatically reload all the objects that you had in your previous session (R will tell you that by displaying [Previously saved workspace restored] on startup), including your broken BSgenome objects. So you won't be in a fresh session. Note that the workspace is saved in a file called .RData so make sure to delete that file if you inadvertently saved your workspace.
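
In case it helps, here is a minimal base-R sketch of the cleanup step described above. The helper name `remove_saved_workspace()` is hypothetical (it is not part of R or Bioconductor); it only checks a directory for a `.RData` file and deletes it if one is found.

```r
# Hypothetical helper (not part of R or any package): delete a saved
# workspace image from a directory so the next session starts clean.
remove_saved_workspace <- function(dir = getwd()) {
  rdata <- file.path(dir, ".RData")
  if (!file.exists(rdata)) {
    # Nothing was saved in this directory, so there is nothing to clean up.
    return(FALSE)
  }
  # file.remove() returns TRUE when the file was deleted.
  file.remove(rdata)
}
```

Calling `remove_saved_workspace()` in the directory where R is normally started should be enough. Starting R from a terminal with `R --no-save --no-restore` is another way to avoid saving or reloading a workspace image.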

Finally, replacing the chromosome names with the GenBank accessions can simply be done with:

library(BSgenome.Mfascicularis.NCBI.6.0)
bsg <- BSgenome.Mfascicularis.NCBI.6.0
chrominfo <- getChromInfoFromNCBI("Macaca_fascicularis_6.0")
stopifnot(identical(seqlevels(bsg), chrominfo$SequenceName))
seqlevels(bsg) <- chrominfo$GenBankAccn
seqinfo(bsg)
# Seqinfo object with 936 sequences from Macaca_fascicularis_6.0 genome:
#   seqnames          seqlengths isCircular                  genome
#   CM021939.1         223606306      FALSE Macaca_fascicularis_6.0
#   CM021940.1         194592313      FALSE Macaca_fascicularis_6.0
#   CM021941.1         186444865      FALSE Macaca_fascicularis_6.0
#   CM021942.1         171057148      FALSE Macaca_fascicularis_6.0
#   CM021943.1         186553353      FALSE Macaca_fascicularis_6.0
#   ...                      ...        ...                     ...
#   JAANEP010001492.1      51793      FALSE Macaca_fascicularis_6.0
#   JAANEP010001493.1      52876      FALSE Macaca_fascicularis_6.0
#   JAANEP010001494.1      10272      FALSE Macaca_fascicularis_6.0
#   JAANEP010001495.1       3359      FALSE Macaca_fascicularis_6.0
#   JAANEP010001496.1       2300      FALSE Macaca_fascicularis_6.0
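
The `stopifnot()` guard above matters because the renaming is positional: the new names are only correct if the table rows line up with the current sequence names. As a rough illustration in base R, assuming a small mock table in place of the real `chrominfo` (only the first three rows, with accessions taken from the output above), a lookup by name can be made to fail loudly instead of silently returning `NA` for sequences that have no entry. The helper `translate_seqnames()` is hypothetical.

```r
# Mock stand-in for the first rows of the chrominfo table; the real table
# comes from getChromInfoFromNCBI() and has one row per sequence.
chrominfo <- data.frame(
  SequenceName = c("1", "2", "3"),
  GenBankAccn = c("CM021939.1", "CM021940.1", "CM021941.1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate sequence names through the table and stop
# with an error if any name has no mapping, rather than producing NA.
translate_seqnames <- function(old_names, table) {
  idx <- match(old_names, table$SequenceName)
  if (anyNA(idx)) {
    stop("no mapping for: ", paste(old_names[is.na(idx)], collapse = ", "))
  }
  table$GenBankAccn[idx]
}

# The result follows the order of the input names, not the table order.
translate_seqnames(c("3", "1"), chrominfo)
```

With the mock table, a name such as "Super-Scaffold_100058" raises an error, which is the kind of early failure that a hand-written map covering only chromosomes 1-20 and X would otherwise hide.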

Hope this helps,
H.
