Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscalling of SNPs and GQ value of 0 #9086

Open
Sgornard opened this issue Feb 3, 2025 · 9 comments
Open

Miscalling of SNPs and GQ value of 0 #9086

Sgornard opened this issue Feb 3, 2025 · 9 comments

Comments

@Sgornard
Copy link

Sgornard commented Feb 3, 2025

Problem report

Affected tool(s) or class(es)

HaplotypeCaller

Affected version : 4.1

Description

We used gatk HaplotypeCaller to find SNPs in our samples, following this command line:
gatk HaplotypeCaller --java-options "-Xmx20g -XX:ParallelGCThreads=10" -R ref_genome.fa -I sample.bam -ERC GVCF -O output_sample.g.vcf.gz

Afterward, we also used GenomicsDBImport, GenotypeGVCFs and VariantFiltration.

However, we noticed quickly some genotypes had a GQ tag equal to 0 despite a very good DP. Looking at the PL, it seemed like gatk could not decide between a homozygous and a heterozygous site, but still called a homozygous site and displayed all reads mapping to the reference allele. Looking at the read mapping in IGV, we noticed that those sites were in fact heterozygous, and therefore miscalled.

Here is an example (we could provide more if needed) :
Site :

<style> </style>
                  ZCh209E ZCh210C ZCh212C ZChGA78441 ZChGA78478 ZChGA78479 ZChGA78481 ZChGA78483 ZChGA78529
Chr2 136076952 . C G 29209.6 PASS AC=13;AF=0.952;AN=18;BaseQRankSum=0;DP=881;ExcessHet=0.2205;FS=0;InbreedingCoeff=0.4627;MLEAC=167;MLEAF=1;MQ=59.06;MQRankSum=0;QD=30.27;ReadPosRankSum=1.91;SOR=0.714 GT:AD:DP:GQ:PGT:PID:PL:PS 1/1:6,30:36:46:.:.:1036,46,0:. 0/0:35,0:35:0:.:.:0,0,209:. 0/1:12,15:27:99:.:.:545,0,408:. 0/1:17,11:28:99:.:.:360,0,438:. 1/1:3,23:26:53:.:.:846,53,0:. 1/1:0,32:32:96:.:.:1325,96,0:. 1/1:5,21:26:14:.:.:710,14,0:. 0/1:9,16:25:99:.:.:593,0,301:. 1/1:0,23:23:69:.:.:949,69,0:.

Here, genotype of the individual ZCh210C is said to be homozygous (GT 0/0) and have all reads mapping the reference allele (AD 35:0). The GQ is 0, and PL is 0 for both homozygous and heterozygous genotypes.
The reads in IGV for this individual map as follows:

Image
Which clearly shows reads mapping on the alternative allele.

Why is gatk having this behavior?

Thank you for your anwsers,
Best,

Samuel

@gokalpcelik
Copy link
Contributor

Have you tried with a recent version such as 4.6.1.0? 4.1 is quite old and it has been a long time since then.

@Sgornard
Copy link
Author

Sgornard commented Feb 4, 2025

Hello,
Thanks for the suggestion, we will try that right away and come back with the results!

@Sgornard
Copy link
Author

Hello,

We tried with the new version 4.6.1.0 using haplotyepcaller and GenomicsDBImport and got the same results. It still failed to call some of the heterozygous genotype at high read depth (15x), like we have showed in the previous figure. This time, it assigned the site as ./..

Thank you for your help,

Samuel

@gokalpcelik
Copy link
Contributor

Can you post the GVCF entry of that particular site for that particular sample alone? It is possible that the site could have been pruned during reassembly or could have fallen below Standard Call Confidence threshold which is 30 for GenotypeGVCFs therefore emitted as NO_CALL.

@Sgornard
Copy link
Author

Yes. Please note that this is another site than the one I posted first, because we have plenty of miscallings and only ran the new version for some of our species.

GVCF entry :
Chr9 5463115 . G <NON_REF> . . END=5463115 GT:DP:GQ:MIN_DP:PL 0/0:15:0:15:0,0,157

Thanks!

@gokalpcelik
Copy link
Contributor

So GQ is 15 which falls way below the Standard Calling Confidence Threshold of 30.
Can you set --standard-min-confidence-threshold-for-calling 10 for GenotypeGVCFs and see if this site gets called?

@Sgornard
Copy link
Author

Isn't DP and MIN_DP 15? It seems that the GQ is actually 0, which is very weird when cheking on the reads. I provide below the IGV view of the reads mapping to this site and individual (ZMa1310) :

Image

@gokalpcelik
Copy link
Contributor

gokalpcelik commented Feb 13, 2025

Sorry yes my bad. GQ is 0. This is most likely a assembly failure in this region.
Why HaplotypeCaller does not call an expected variant?
Can you checkout suggestions from this article?
If none of them works for you then you may need to resort to the last settings which is --pileup-detection. This parameter enables HaplotypeCaller to inspect pileups along with reassembled kmers to find better results but just that parameter alone may not solve the whole issue.
You may need to add

--pileup-detection-enable-indel-pileup-calling true
--use-pdhmm true

parameters as well to get a better pileup detection from HaplotypeCaller. This method has a downside of not being compatible with GVCF output so it is up to you to decide how to proceed.

I hope this helps.

Regards.

@Sgornard
Copy link
Author

Thank you for digging through, we appreciate a lot! We will try those suggestions right away.

Thanks again for the help!

Best regards,

Samuel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants