How are coverage plots normalized? #992

Closed
danielcgingerich opened this issue Feb 23, 2022 · 5 comments
Labels
documentation Documentation help

Comments

@danielcgingerich

danielcgingerich commented Feb 23, 2022

I am having trouble finding out how coverage tracks are normalized. First, what exactly are you normalizing: Tn5 insertions, whole fragments, or the 9 bp binding site of Tn5? Are genomic regions divided into bins, with fragments/Tn5 insertions counted in each bin?

Second, how are these counts normalized? Are they divided by the mean of a predefined window size in the surrounding region? Is it as simple as CPM?

I ask because I would like to create my own

Best

Dan

@danielcgingerich danielcgingerich added the documentation Documentation help label Feb 23, 2022
@timoast
Collaborator

timoast commented Mar 2, 2022

> First, what exactly are you normalizing

We are normalizing Tn5 insertion sites.

> Are genomic regions divided into bins and fragments/Tn5 insertions counted in each bin?

No, the data are base-resolution, with smoothing applied before plotting (controlled by the window parameter).

> Second, how are these counts normalized?

Coverage plots are normalized using a scaling factor for each group of cells (track), which is the number of cells in the group multiplied by the average sequencing depth for that group of cells. The total counts at each base position for the track are divided by this scaling factor, and then all tracks are multiplied by a common factor (the median scaling factor across all tracks) to bring the values back up to a reasonable range. This normalizes for differences in sequencing depth and cell number across groups.
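The normalization described above can be sketched in a few lines. This is a minimal illustration written from the description in this comment, not the code used by CoveragePlot(); the function name `normalize_tracks`, its arguments, and the toy numbers are all hypothetical.

```python
from statistics import mean, median


def normalize_tracks(counts_by_group, depth_by_group):
    """Normalize per-base insertion counts for each group (track).

    counts_by_group: group name -> list of total insertion counts per base
    depth_by_group:  group name -> list of per-cell sequencing depths
    Both inputs are assumed formats chosen for this sketch.
    """
    # Scaling factor per track: number of cells * average sequencing depth.
    scale = {g: len(d) * mean(d) for g, d in depth_by_group.items()}
    # Common factor: median scaling factor across all tracks.
    common = median(scale.values())
    # Divide by the track's own factor, then multiply by the common factor.
    return {
        g: [c * common / scale[g] for c in counts]
        for g, counts in counts_by_group.items()
    }


# Toy data: group A has 4x the total depth of group B and 4x the raw counts.
counts = {"A": [0, 4, 8, 4], "B": [0, 1, 2, 1]}
depth = {"A": [1000, 3000], "B": [500, 500]}
normalized = normalize_tracks(counts, depth)
```

In this toy example the two tracks end up with the same normalized profile, because the raw difference between them is fully explained by sequencing depth.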

I will update the documentation for CoveragePlot() to explain this better.

@danielcgingerich
Author

@timoast that is very clear, thanks!

@danielcgingerich
Author

Hey @timoast, thinking about this again, I have another question.

What type of smoothing is applied? Is it a moving average or a kernel density estimate?

I saw a post on Dave Tang's blog about how he confused density plot with coverage plot, but I do not really think there would be much difference.

For instance, let's say I am comparing the coverage of 2 regions.

  1. 10 base pairs long, 1 Tn5 cutsite per base pair
  2. 10 base pairs long, 1 Tn5 cutsite every other base pair

Whether a kernel density estimate or a moving average is used, both would show an approximately 2:1 ratio of coverage between the two regions.

@timoast
Collaborator

timoast commented Jun 14, 2022

It's just a rolling window sum, not KDE
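A rolling window sum over base-resolution counts can be sketched as follows, using the two example regions from the question above. This is only an illustration: the function name `rolling_sum` is hypothetical, and the window alignment and the handling of positions near the ends (the window is simply truncated) are assumptions, not a description of what CoveragePlot() does at region boundaries.

```python
def rolling_sum(counts, window):
    """Sum the counts in a window of `window` bases around each position.

    The window is roughly centered on each position and truncated at the
    ends of the region (an assumption made for this sketch).
    """
    n = len(counts)
    left = (window - 1) // 2
    smoothed = []
    for i in range(n):
        start = i - left
        end = start + window
        smoothed.append(sum(counts[max(0, start):min(n, end)]))
    return smoothed


# Region 1: one Tn5 cut site per base pair.
region1 = [1] * 10
# Region 2: one Tn5 cut site every other base pair.
region2 = [1, 0] * 5

smooth1 = rolling_sum(region1, window=4)
smooth2 = rolling_sum(region2, window=4)
```

With a window of 4, positions away from the ends sum to 4 in the first region and 2 in the second, which matches the 2:1 ratio expected in the question.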

@danielcgingerich
Author

cool cool. thanks for the quick response!
