HiC-Pro does not perform deduplication even when RMDUP=1 is set in the config file unless the 'merge-persample' option is used, but this is not run by default and the option to do so is not documented on the GitHub page #307

Closed
gbonora opened this issue Jan 27, 2020 · 2 comments

gbonora commented Jan 27, 2020

Hi,

I recently realized that HiC-Pro (v2.11.1) does not deduplicate read pairs even when RMDUP=1 is set in the config file unless the 'merge-persample' option is also used. However, the 'merge-persample' analysis step is not run by default, and the option to run it is not documented in the 'How to use it ?' section on the GitHub page, although it is described in HiC-Pro's help (see attached slide).

I think it would be helpful to describe the 'merge-persample' analysis step option under the 'How to use it ?' section on your GitHub page and to make it clear that this analysis step is necessary for deduplication.

Thanks.

gb_20200123.pdf

nservant (Owner) commented

Hi,
Indeed, duplicate removal is performed at the merge-persample step of the pipeline.
But you're right, there is a mistake in the help page.
I will change that for the next version.

nservant added the bug label Jan 30, 2020

hwick commented Feb 26, 2020

I thought I was having this same issue because the Valid_interaction_pairs number in the .Rstat file appears to include duplicate reads in the total. However, the .mergestat file reports both valid_interaction and valid_interaction_rmdup totals, which reflect the counts before and after duplicates are removed. If you run wc -l on the .allValidPairs file, the line count should match the valid_interaction_rmdup total if duplicates were removed. I just ran with -s merge_persample to double check, and the resulting .allValidPairs file has the same number of reads as both the rmdup total and the original .allValidPairs file.
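
For anyone who wants to script that check, here is a minimal sketch in Python. It assumes the .mergestat file holds one `name<whitespace>count` pair per line and that the .allValidPairs file has one valid pair per line; both are assumptions about the file layout rather than anything confirmed in this thread. The function names and the file paths in the usage note are hypothetical.

```python
from pathlib import Path


def read_mergestat(path):
    """Parse a .mergestat file into {name: count}.

    Assumes each line looks like 'name<whitespace>count'; other lines are skipped.
    """
    stats = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            stats[fields[0]] = int(fields[1])
    return stats


def count_lines(path):
    """Count lines in a file (matches `wc -l` when the file ends with a newline)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def duplicates_removed(mergestat_path, all_valid_pairs_path):
    """Return True if the .allValidPairs line count equals valid_interaction_rmdup."""
    stats = read_mergestat(mergestat_path)
    return count_lines(all_valid_pairs_path) == stats["valid_interaction_rmdup"]
```

Calling something like `duplicates_removed("sample.mergestat", "sample.allValidPairs")` (hypothetical paths) returning True would suggest that the .allValidPairs file is already deduplicated. If it returns False and the line count instead equals `valid_interaction`, duplicates were probably not removed.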

joreynajr added a commit to joreynajr/HiCnv that referenced this issue Apr 21, 2021
According to Servant there is no problem with downstream steps like
removing duplicates. The following links talk about this more:
- nservant/HiC-Pro#307
- nservant/HiC-Pro#142
To actually split the reads into chunks Servant has provided this
script: https://github.com/nservant/HiC-Pro/blob/master/doc/UTILS.md