Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent getSeq and isCircular behavior #17

Open
Roleren opened this issue Mar 26, 2021 · 2 comments
Open

inconsistent getSeq and isCircular behavior #17

Roleren opened this issue Mar 26, 2021 · 2 comments
Assignees

Comments

@Roleren
Copy link

Roleren commented Mar 26, 2021

Hey,

Referencing discussion on bioconductor support page

So to conclude what is missing are 2 things:

  1. Make sure isCircular works properly for DNAStringSet, as now the isCircular flag is only updated in the metadata and not shown correctly when using isCircular(DNAStringSet) getter.
  2. Make sure getSeq for BSGenome and DNAStringSet both support circular wrapping ranges (that go from end of sequence wrapping to the beginning or reverse)

If you want some help on this I can make a PR, but there is a lot of package dependencies her, with Biostrings, BSgenome, rtracklayer and GenomeInfoDb.

@LiNk-NY
Copy link
Contributor

LiNk-NY commented Mar 27, 2021

Hi Hervé @hpages, did you want to look at this? Thank you!
Update: Thanks for looking at this Hervé. From your response, it looks like a complicated interaction between several packages. We should layout how we can proceed with making these methods work in harmony.

@LiNk-NY LiNk-NY assigned LiNk-NY and hpages and unassigned LiNk-NY Mar 27, 2021
@Roleren
Copy link
Author

Roleren commented Apr 6, 2021

So first easy thing we could fix is to make the seqinfo getter to actually use the seqinfo table of the DNAStringSet object, you agree ?

This is easily done by updating the seqinfo(DNAStringSet), to not remake the object, but instead use:
metadata(x), if it is defined:

 library(GenomicRanges)
 library(BSgenome.Hsapiens.UCSC.hg19) # Use 38 if you have that instead
 gr <- GRanges("chrMT:16567-16578:+")
 isCircular(gr) <- TRUE
 seqlengths(gr) <- 16569
 # Load sequence
 fa.loaded <- DNAStringSet(BSgenome.Hsapiens.UCSC.hg19$chrMT)

metadata(fa.loaded)
  list() # Output is now empty list

So logic will be, check if metadata() does not return empty list, if it does recreate seqinfo object as before, if it is defined as a valid Seqinfo object, return that directly.

You want me to make a PR for this ? Where should I add tests if needed ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants