Different ValidPairs rate between chromap and bowtie2 in HiC data #147
Comments
Is this a human genome? Do you see very few alignments in the out.sam file as well?
It's a plant genome. Here are the flagstat results for the BAM files: chromap.out.bam: bowtie2.out.bam (after merge, rmdup): The alignment seems OK, and I'm not terribly bothered by the difference in the number of mapped reads.
Then the discrepancy between Chromap and Bowtie2 probably comes from the map2frag stage. If this tool uses the TLEN field in the BAM file to determine the insert size, there could be an issue. We recently fixed a TLEN bug for Hi-C data: #139. If TLEN is the issue in your data, could you please pull the version on GitHub and give it a try? If this fixes your problem, we will release a new version.
I pulled the li_dev5 branch and got this error:
My cmd:
You can use the main branch. It should contain the fix for TLEN. I will check the issue you found with li_dev5.
I pulled the main branch and re-ran my data (mapping with the new chromap, sort, map2frag), but the Dumped_pairs_rate has not changed. However, I checked TLEN in the BAM (or SAM) from chromap, and that seems to be where the error comes from. BTW, the error I got when installing the li_dev5 branch was caused by my OS version, so you can ignore it.
You can see that the alignments are indeed very far from each other, so the absolute value of TLEN is large. The sign is based on the forward or reverse strand of the mate pairs. I think in Hi-C the fragment size can be large, so the command "python mapped_2hic_fragments.py -v -a -s 0 -l 700", which restricts the insert size to between 0 and 700, might not be desirable?
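To check how much of a file would be affected by a 0 to 700 window, something like the sketch below could be used. It reads TLEN (column 9 of a SAM record) and reports the fraction of records whose absolute TLEN falls outside the window. Whether mapped_2hic_fragments.py actually reads TLEN is not confirmed in this thread, so this only tests the hypothesis; the function names and the example records are made up for illustration and are not taken from the data in this issue.

```python
def tlen_of(sam_line):
    """Return TLEN (column 9) of one SAM alignment line as an int."""
    fields = sam_line.rstrip("\n").split("\t")
    return int(fields[8])


def fraction_outside_window(sam_lines, min_size=0, max_size=700):
    """Fraction of alignment records whose |TLEN| is outside [min_size, max_size]."""
    total = 0
    outside = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        total += 1
        size = abs(tlen_of(line))
        if size < min_size or size > max_size:
            outside += 1
    return outside / total if total else 0.0


# Synthetic records (hypothetical, for illustration only).
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t1000\t60\t150M\t=\t1300\t450\t*\t*",
    "r1\t147\tchr1\t1300\t60\t150M\t=\t1000\t-450\t*\t*",
    "r2\t97\tchr1\t1000\t60\t150M\t=\t2501000\t2500150\t*\t*",
    "r2\t145\tchr1\t2501000\t60\t150M\t=\t1000\t-2500150\t*\t*",
]
print(fraction_outside_window(example))  # prints 0.5
```

On a real file, the lines could be fed in from `samtools view out.bam`. A large fraction here would be consistent with the window being too narrow for Hi-C pairs.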
Thank you for your advice! I changed my command as: Anyway, I'll try another pipeline from chromap to a .hic file. I think there's no substitute for chromap's speed advantage!
Thank you! I think for Hi-C data there is no need to restrict the insert size, because many alignments are expected to be far from each other or on different chromosomes. I'm curious how Bowtie2 sets the TLEN values in your data.
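One way to compare the two files is to tabulate the mate fields directly. The sketch below classifies each SAM record by RNEXT (column 7) and TLEN (column 9), relying only on SAM conventions: RNEXT is "=" when the mate is on the same reference and "*" when mate information is unavailable. Since the Bowtie2 runs in this issue align R1 and R2 separately with -U, the mate fields in the merged BAM depend on how the merge step fills them in, which is an assumption to verify rather than something shown in this thread. The category names and example records are hypothetical.

```python
from collections import Counter


def classify_record(sam_line):
    """Classify one SAM alignment line by its mate fields (RNEXT, TLEN)."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, rnext, tlen = fields[2], fields[6], int(fields[8])
    if rnext == "*":
        return "no_mate_info"
    if rnext != "=" and rnext != rname:
        return "inter_chromosomal"
    return "intra_tlen_zero" if tlen == 0 else "intra_tlen_set"


def summarize(sam_lines):
    """Count records per category, skipping header lines."""
    return Counter(
        classify_record(line) for line in sam_lines if not line.startswith("@")
    )


# Synthetic records (hypothetical, for illustration only).
records = [
    "@SQ\tSN:chr1\tLN:5000000",
    "a\t97\tchr1\t1000\t60\t150M\t=\t90000\t89150\t*\t*",
    "b\t97\tchr1\t1000\t60\t150M\tchr2\t500\t0\t*\t*",
    "c\t0\tchr1\t1000\t60\t150M\t*\t0\t0\t*\t*",
    "d\t97\tchr1\t1000\t60\t150M\t=\t1200\t0\t*\t*",
]
for category, count in sorted(summarize(records).items()):
    print(category, count)
```

Running this on both out.bam files (via `samtools view`) would show whether the two aligners populate TLEN differently.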
Problem
The ValidPairs rate differs between chromap and bowtie2 for the same Hi-C data when using the HiC-Pro pipeline, which converts the BAM into *.ValidPairs format and reports statistics.
With chromap, the result is: Dumped_pairs_rate%: 85.6677191749
With bowtie2, the result is: Dumped_pairs_rate%: 0.269802518711 (same data)
CMD
chromap(0.2.5-r473)
chromap --preset hic -x genome.index -r genome.fa -1 R1 -2 R2 --SAM -o out.sam --remove-pcr-duplicates -t 8 --summary out.summary
samtools view -bh out.sam | samtools sort -@ 8 > out.bam
bowtie2(2.2.3)
bowtie2 --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder --rg-id --phred33-quals -p 5 --un out.unmap.fq --rg -x genome -U R1 | samtools view -F 4 -bS - > R1.bam
The same cmd is used for R2.
Then the BAM files are merged and sorted with samtools to get out.bam.
map2frag(HiC-Pro tools)
python mapped_2hic_fragments.py -v -a -s 0 -l 700 -f DpnII_resfrag_genome.bed -r out.bam -o outdir
At this step, I get a different ValidPairs rate.
Data Info
HiC Data: PE 150, 200X depth clean
I've used chromap on many genomes, and this is the first time this has happened. Chromap is very helpful.
Thanks!