You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I was using BSgenome.Mfascicularis.NCBI.6.0 genome for the Signac multiomics tutorial. It was working fine until the I was trying to change the name of the chromosomes to match the GTF I have (macFas6-hg38-gencode.v47.basic.sortedgtf).
This code worked before
mf_genome <- BSgenome.Mfascicularis.NCBI.6.0
# see 1st chromosomes
head(seqlengths(mf_genome))
# see how many base pairs are in chromosome 1
mf_genome[["1"]]
# chromosome 1 had 223606306 base pairs.
# if you look at the chromsosomes table on bottom of page, chromsome 1 is 'CM021939.1' and it has 223606306 base pairs (no longer working???)
# also lists the gc content % for each chromsome
But then when I tried to change the chromosome names...
# # Store mapping in R as vector
# chr_map <- c(
# "1" = "CM021939.1", "2" = "CM021940.1", "3" = "CM021941.1",
# "4" = "CM021942.1", "5" = "CM021943.1", "6" = "CM021944.1",
# "7" = "CM021945.1", "8" = "CM021946.1", "9" = "CM021947.1",
# "10" = "CM021948.1", "11" = "CM021949.1", "12" = "CM021950.1",
# "13" = "CM021951.1", "14" = "CM021952.1", "15" = "CM021953.1",
# "16" = "CM021954.1", "17" = "CM021955.1", "18" = "CM021956.1",
# "19" = "CM021957.1", "20" = "CM021958.1", "X" = "CM021959.1"
# )
#
#
# # Get the original chromosome names
# original_chr_names <- seqnames(BSgenome.Mfascicularis.NCBI.6.0)
# original_chr_names
# # [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "X"
#
#
#
# # Check which names need to be changed (need to remove NA's from here --- rest of the scaffolds that aren't chromosome 1-20 or X)
# new_chr_names <- chr_map[original_chr_names]
# new_chr_names
#
# # 1 2 3 4 5 6 7 8 9 10 11 12 13
# # "CM021939.1" "CM021940.1" "CM021941.1" "CM021942.1" "CM021943.1" "CM021944.1" "CM021945.1" "CM021946.1" "CM021947.1" "CM021948.1" "CM021949.1" "CM021950.1" "CM021951.1"
# # 14 15 16 17 18 19 20 X
# # "CM021952.1" "CM021953.1" "CM021954.1" "CM021955.1" "CM021956.1" "CM021957.1" "CM021958.1" "CM021959.1"
#
#
#
#
# # Keep only non-NA values ( chrom 1-20 & X)
# # new_chr_names <- new_chr_names[!is.na(new_chr_names)]
#
#
# # # Remove NA values to ensure lengths match
# # valid_indices <- !is.na(new_chr_names)
# # original_chr_names <- original_chr_names[valid_indices]
# # new_chr_names <- new_chr_names[valid_indices]
# #
# # # Confirm lengths match
# # length(original_chr_names) == length(new_chr_names)
#
#
# # Assign the new chromosome names
# names(BSgenome.Mfascicularis.NCBI.6.0@seqinfo) <- new_chr_names
#
# # Error in `seqnames<-`(x, value) :
# # length of supplied 'seqnames' vector must equal the number of sequences
#
#
#
#
# # Verify the update
# seqnames(BSgenome.Mfascicularis.NCBI.6.0)
1 2 3 4
NA NA NA NA
5 6 7 8
NA NA NA NA
9 10 11 12
NA NA NA NA
13 14 15 16
NA NA NA NA
17 18 19 20
NA NA NA NA
X Super-Scaffold_100058 Super-Scaffold_100060 Super-Scaffold_100064
NA NA NA NA
Super-Scaffold_100065 Super-Scaffold_100066 Super-Scaffold_100068 Super-Scaffold_100069
NA NA NA NA
Super-Scaffold_100072 Super-Scaffold_100073 Super-Scaffold_100078 Super-Scaffold_100080
NA NA NA NA
It removed all the other metadata. I've tried uninstalling and reinstalling BSgenome.Mfascicularis.NCBI.6.0 & BSgenome but it doesn't recover the seqlengths or the other metadata that was originally there
I tried this too linked here. The metadata that was originally in BSgenome.Mfascicularis.NCBI.6.0 is no longer there and I can't restore it? I succesfully changed the chromosome names I needed and created a subsetted BS genome object called new_genome but again I no longer have seqlength info, etc
# hack from https://support.bioconductor.org/p/83588/ to keep only some chromosomes within a BSgenome object.
keepBSgenomeSequences <- function(genome, seqnames)
{
stopifnot(all(seqnames %in% seqnames(genome)))
genome@user_seqnames <- setNames(seqnames, seqnames)
genome@seqinfo <- genome@seqinfo[seqnames]
genome
}
# load macfas6 NCBI BSgenome object
library(BSgenome.Mfascicularis.NCBI.6.0)
# assign to R object called genome
genome <- BSgenome.Mfascicularis.NCBI.6.0
# check if seqlengths info is in original genome
head(seqlengths(genome))
# 1 2 3 4 5 6
# NA NA NA NA NA NA
# see how many base pairs are in chromosome 1
genome[["1"]]
# Error in if (length(ans) != seqlengths(x)[[user_seqname]]) { :
# missing value where TRUE/FALSE needed
# # Store mapping in R as vector
chr_map <- c(
"1" = "CM021939.1", "2" = "CM021940.1", "3" = "CM021941.1",
"4" = "CM021942.1", "5" = "CM021943.1", "6" = "CM021944.1",
"7" = "CM021945.1", "8" = "CM021946.1", "9" = "CM021947.1",
"10" = "CM021948.1", "11" = "CM021949.1", "12" = "CM021950.1",
"13" = "CM021951.1", "14" = "CM021952.1", "15" = "CM021953.1",
"16" = "CM021954.1", "17" = "CM021955.1", "18" = "CM021956.1",
"19" = "CM021957.1", "20" = "CM021958.1", "X" = "CM021959.1"
)
# create new R object called gew_genome with only chromosomes 1-20 & X
# this is the new BSgenome object
new_genome <- keepBSgenomeSequences(genome, names(chr_map))
seqnames(new_genome) <- chr_map
# print out
genome
# | BSgenome object for Crab-eating macaque
# | - organism: Macaca fascicularis
# | - provider: NCBI
# | - genome: Macaca_fascicularis_6.0
# | - release date: 2020/03/10
# | - 936 sequence(s):
# | 1 2 3 4
# | 5 6 7 8
# | 9 10 11 12
# | 13 14 15 16
# | 17 18 19 20
# | ... ... ... ...
# | tig00001423_obj tig00001424_obj tig00001425_obj tig00001428_obj
# | tig00001430_obj tig00001431_obj tig00001432_obj tig00001433_obj
# | tig00001434_obj tig00001435_obj tig00001436_obj tig00001437_obj
# | tig00001438_obj tig00001440_obj tig00001441_obj tig00001442_obj
# | tig00001443_obj tig00001444_obj tig00001445_obj tig00001446_obj
# |
# | Tips: call 'seqnames()' on the object to get all the sequence names, call 'seqinfo()' to get the full sequence info, use the '$' or '[[' operator to access a given sequence, see
# | '?BSgenome' for more information.
new_genome
# | BSgenome object for Crab-eating macaque
# | - organism: Macaca fascicularis
# | - provider: NCBI
# | - genome: Macaca_fascicularis_6.0
# | - release date: 2020/03/10
# | - 21 sequence(s):
# | CM021939.1 CM021940.1 CM021941.1 CM021942.1 CM021943.1 CM021944.1 CM021945.1 CM021946.1 CM021947.1 CM021948.1 CM021949.1 CM021950.1 CM021951.1 CM021952.1 CM021953.1 CM021954.1
# | CM021955.1 CM021956.1 CM021957.1 CM021958.1 CM021959.1
# |
# | Tips: call 'seqnames()' on the object to get all the sequence names, call 'seqinfo()' to get the full sequence info, use the '$' or '[[' operator to access a given sequence, see
# | '?BSgenome' for more information.
chr_map
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14
# "CM021939.1" "CM021940.1" "CM021941.1" "CM021942.1" "CM021943.1" "CM021944.1" "CM021945.1" "CM021946.1" "CM021947.1" "CM021948.1" "CM021949.1" "CM021950.1" "CM021951.1" "CM021952.1"
# 15 16 17 18 19 20 X
# "CM021953.1" "CM021954.1" "CM021955.1" "CM021956.1" "CM021957.1" "CM021958.1" "CM021959.1"
seqinfo(new_genome)
# Seqinfo object with 21 sequences from an unspecified genome; no seqlengths:
# seqnames seqlengths isCircular genome
# CM021939.1 <NA> <NA> <NA>
# CM021940.1 <NA> <NA> <NA>
# CM021941.1 <NA> <NA> <NA>
# CM021942.1 <NA> <NA> <NA>
# CM021943.1 <NA> <NA> <NA>
# ... ... ... ...
# CM021955.1 <NA> <NA> <NA>
# CM021956.1 <NA> <NA> <NA>
# CM021957.1 <NA> <NA> <NA>
# CM021958.1 <NA> <NA> <NA>
# CM021959.1 <NA> <NA> <NA>
# see 1st chromosomes
head(seqlengths(new_genome))
# CM021939.1 CM021940.1 CM021941.1 CM021942.1 CM021943.1 CM021944.1
# NA NA NA NA NA NA
# see how many base pairs are in chromosome 1
new_genome[["CM021939.1"]]
# Error in if (length(ans) != seqlengths(x)[[user_seqname]]) { :
# missing value where TRUE/FALSE needed
I tried manually adding the sequence length info in the seqlengths column of the new_genome BS genome object using chromosomes info I retrieved from NCBI for Macaca_fascicularis_6.0 here, but I still wan't able to add tha to the BS genome object.
# Read the TSV file (assuming tab-separated and columns: seqname, length, gc_content or gc_count)
seq_info <- read.table("~/macaque_multiomics/macfas6_ncbi_sequence_report_chromosomes.tsv", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# Check column names to ensure correct assignment
head(seq_info)
<img width="1351" alt="Image" src="https://github.com/user-attachments/assets/77ed8107-3905-46b0-a5f6-e832c5777fdb" />
# Print column names
print(colnames(seq_info))
[1] "Assembly.Accession" "Assembly.unit.accession" "Chromosome.name" "GC.Count" "GC.Percent" "GenBank.seq.accession"
[7] "Molecule.type" "Ordering" "RefSeq.seq.accession" "Role" "Seq.length" "UCSC.style.name"
[13] "Unlocalized.Count" "Sequence.name"
# Extract chromosome names from BSgenome object
bs_chroms <- seqnames(new_genome)
# Subset seq_info to only include relevant rows
seq_info_subset <- seq_info[seq_info$GenBank.seq.accession %in% bs_chroms, ]
# Check structure
str(seq_info_subset)
head(seq_info_subset[, c("GenBank.seq.accession", "Seq.length")])
#tried assigning sequence lenghts to new_genome
# Convert column types
seq_info_subset$GenBank.seq.accession <- as.character(seq_info_subset$GenBank.seq.accession)
seq_info_subset$Seq.length <- as.numeric(seq_info_subset$Seq.length)
# Create a named vector for sequence lengths
seq_lengths <- setNames(seq_info_subset$Seq.length, seq_info_subset$GenBank.seq.accession)
# Assign sequence lengths to BSgenome object
seqlengths(new_genome) <- seq_lengths
# Error in .check_new2old_and_new_seqinfo(new2old, value, seqinfo(x), context) :
# seqlengths() and isCircular() of the supplied 'seqinfo' must be identical to seqlengths() and isCircular() of the current 'seqinfo' when replacing the 'seqinfo' of
# a BSgenome object
I kept getting the same error when trying many different ways to add the sequence lengths to the BS genome object. Is it true that once a BSgenome object is created, you cannot directly modify the seqinfo slot, including the seqlengths? I don't know why the original BS genome object BSgenome.Mfascicularis.NCBI.6.0 doesn't have any metadata besides chromosome names even after unloading & reloading object. I've also tried uninstalling and reinstalling BSgenome.Mfascicularis.NCBI.6.0 and the BSgenome R packages but no luck.
The text was updated successfully, but these errors were encountered:
zgb963
changed the title
Unable to restore chromosome sequence lengths in BSgenome.Mfascicularis.NCBI.6.0
Unable to restore chromosome sequence lengths or other metadata in BSgenome.Mfascicularis.NCBI.6.0
Feb 14, 2025
All the changes one can make to the seqinfo of a BSgenome package happen in memory and NEVER actually touch the package installation folder on disk. This is true for all package installation folders: they are always treated as read-only folders. Imagine the chaos on a system where packages are shared across users if users could actually alter the content of the installed packages! So no need to reinstall anything. Just quit R and restart a fresh session.
Also when you are about to quit R, make sure to NEVER answer "yes" when asked if you want to "Save workspace image". Otherwise, when you start the next session, R will automatically reload all the objects that you had in your previous session (R will tell you that by displaying [Previously saved workspace restored] on startup), including your broken BSgenome objects. So you won't be in a fresh session. Note that the workspace is saved in a file called .RData so make sure to delete that file if you inadvertently saved your workspace.
Finally, replacing the chromosome names with the GenBank accessions can simply be done with:
Hello,
I was using BSgenome.Mfascicularis.NCBI.6.0 genome for the Signac multiomics tutorial. It was working fine until the I was trying to change the name of the chromosomes to match the GTF I have (macFas6-hg38-gencode.v47.basic.sortedgtf).
This code worked before
But then when I tried to change the chromosome names...
It removed all the other metadata. I've tried uninstalling and reinstalling BSgenome.Mfascicularis.NCBI.6.0 & BSgenome but it doesn't recover the seqlengths or the other metadata that was originally there
I tried this too linked here. The metadata that was originally in BSgenome.Mfascicularis.NCBI.6.0 is no longer there and I can't restore it? I succesfully changed the chromosome names I needed and created a subsetted BS genome object called
new_genome
but again I no longer have seqlength info, etcI tried manually adding the sequence length info in the seqlengths column of the
new_genome
BS genome object using chromosomes info I retrieved from NCBI for Macaca_fascicularis_6.0 here, but I still wan't able to add tha to the BS genome object.I kept getting the same error when trying many different ways to add the sequence lengths to the BS genome object. Is it true that once a BSgenome object is created, you cannot directly modify the seqinfo slot, including the seqlengths? I don't know why the original BS genome object BSgenome.Mfascicularis.NCBI.6.0 doesn't have any metadata besides chromosome names even after unloading & reloading object. I've also tried uninstalling and reinstalling BSgenome.Mfascicularis.NCBI.6.0 and the BSgenome R packages but no luck.
Session info
The text was updated successfully, but these errors were encountered: