Question about plot interpretation; large putative inversion #198

Open
jphruska opened this issue May 31, 2024 · 3 comments

Comments

@jphruska

Hello --

Thank you for developing and maintaining a powerful resource for the genomics community.

I've genotyped SVs for a species of bird and noticed an aberrantly large inversion that I suspected was a false positive. To check, I produced a samplot image for individuals of the three genotypes (0/0, 0/1, 1/1). The genotype of each individual is indicated.

I have a few questions regarding interpretation. First, for inversions, is there a diagnostic for discriminating between the three genotypes (such as is done with differences in read coverage for deletions)? For example, would the number of discordant paired-end reads spanning the inversion be suggestive of different genotypic states?

Second, it appears the main signal of an inversion here is a single pair of discordant reads. Am I interpreting this correctly? If so, that single pair seems to map consistently to the same locations on the reference genome, regardless of the called genotype. There also appears to be a second pair of discordant reads for COL_52524, but it doesn't seem to support an inversion.

Any suggestions on how to best interpret these results would be greatly appreciated.

Thanks
Jack
[Attached image: 4_14676614_57494906]

@jbelyeu
Collaborator

jbelyeu commented Jun 28, 2024

The signal you're seeing here, blue discordant pairs indicating an inversion, comes from a lot of pairs with about the same placement. In samplot there's no great way to differentiate these, but you can get an idea that there are several just because the blue is pretty dark. There are also faint dotted lines indicating chimeric alignments (in addition to the discordant pairs). So it's not a single discordant pair, but it's not easy to tell how many there are aside from "several".

This is related to the question of genotype: genotyping inversions isn't easy and samplot doesn't really try to do it. If you extract the split alignments and pairs that span this breakpoint, you could come up with a count that might be useful for estimating genotype, but it's not as simple as the rules of thumb that work for copy number variation.
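As a rough illustration of the counting idea above (not something samplot does itself), here is a minimal Python sketch that tallies inversion-supporting evidence from SAM-format text lines. The function name `count_inversion_support` and the `window` tolerance are hypothetical choices made for this example. It treats a pair as supporting when both mates map to the same strand with one mate near each breakpoint, and a read as split when its `SA` tag points to an opposite-strand alignment near the other breakpoint.

```python
from collections import namedtuple

Support = namedtuple("Support", ["discordant_pairs", "split_reads"])


def count_inversion_support(sam_lines, chrom, start, end, window=500):
    """Count read names supporting an inversion with breakpoints start/end.

    sam_lines: iterable of SAM text records (header lines are skipped).
    window: assumed tolerance in bp around each breakpoint (hypothetical default).
    """
    def spans(pos_a, pos_b):
        # True if one position is near each breakpoint, in either order.
        near = lambda p, bp: abs(p - bp) <= window
        return (near(pos_a, start) and near(pos_b, end)) or \
               (near(pos_a, end) and near(pos_b, start))

    pair_names, split_names = set(), set()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        name, flag, rname, pos = f[0], int(f[1]), f[2], int(f[3])
        rnext, pnext = f[6], int(f[7])
        # Skip unmapped (0x4), secondary (0x100) and duplicate (0x400) records.
        if flag & (0x4 | 0x100 | 0x400) or rname != chrom:
            continue
        read_rev = bool(flag & 0x10)

        # Discordant pair: paired (0x1), mate mapped (0x8 unset), same
        # chromosome, and both mates on the same strand (0x10 vs 0x20).
        is_primary_pair = flag & 0x1 and not flag & 0x8 and not flag & 0x800
        if is_primary_pair and rnext in ("=", chrom):
            mate_rev = bool(flag & 0x20)
            if read_rev == mate_rev and spans(pos, pnext):
                pair_names.add(name)

        # Split read: SA:Z:rname,pos,strand,CIGAR,mapQ,NM;...
        for tag in f[11:]:
            if not tag.startswith("SA:Z:"):
                continue
            for aln in tag[5:].rstrip(";").split(";"):
                sa_chrom, sa_pos, sa_strand = aln.split(",")[:3]
                sa_rev = sa_strand == "-"
                if sa_chrom == chrom and sa_rev != read_rev and spans(pos, int(sa_pos)):
                    split_names.add(name)

    # Names are de-duplicated, so each pair or split read counts once.
    return Support(len(pair_names), len(split_names))
```

The input lines could come from something like `samtools view` on the region around both breakpoints. The resulting counts are only approximate (the tolerance window is arbitrary), and turning them into a genotype would still need a comparison against reference-supporting reads at the same breakpoints.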

@jphruska
Author

jphruska commented Aug 1, 2024

That makes sense, thanks. Good to know the signal appears to be strong, and perhaps indicative of a real inversion. Curiously, I ran a PCA of the SNPs located within it and didn't recover the expected signal -- individuals segregated by geography, not by zygosity. Will be interesting to dig into this further. Thanks again for your help.

@warthmann

Hello, yes, I am having the same challenge: multiple reads supporting an event aren't easily distinguished. I usually inspect and count them in IGV, but I was wondering whether you have considered stacking them in some way so that they are not displayed on top of each other. When they are plotted on top of each other, not only is their number impossible to tell, but split reads can also be hidden.
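To make the stacking suggestion concrete, here is a minimal sketch of one way it could work. This is not samplot's code, and `assign_rows` is a hypothetical helper: each read pair is treated as a `(start, end)` interval and greedily placed in the lowest row where it does not overlap anything already drawn, similar in spirit to how genome browsers pile up reads.

```python
def assign_rows(intervals):
    """Return a row index for each (start, end) interval so that
    overlapping intervals land on different rows."""
    row_ends = []                 # rightmost end coordinate used in each row
    rows = [0] * len(intervals)
    # Process intervals left to right, remembering their original index.
    for idx, (start, end) in sorted(enumerate(intervals), key=lambda x: x[1]):
        for r, last_end in enumerate(row_ends):
            if start > last_end:  # fits after the last interval in this row
                row_ends[r] = end
                rows[idx] = r
                break
        else:                     # overlaps every existing row: open a new one
            row_ends.append(end)
            rows[idx] = len(row_ends) - 1
    return rows


# Three near-identical pairs plus one separate pair.
pairs = [(100, 500), (100, 500), (110, 490), (600, 700)]
print(assign_rows(pairs))  # [0, 1, 2, 0]
```

With a layout like this, the number of occupied rows at a locus gives a direct visual count of the supporting pairs, and a split read would get its own row instead of being hidden underneath a discordant pair.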
