How are coverage plots normalized? #992

danielcgingerich · 2022-02-23T16:35:03Z

I am having trouble finding how to normalize coverage tracks. First, what exactly are you normalizing - Tn5 insertions, or whole fragments? Or the 9bp binding site of Tn5? Are genomic regions divided into bins and fragments/Tn5 insertions counted in each bin?

Second, how are these counts normalized? Are they divided by the mean of a predefined window size in the surrounding region? Is it as simple as CPM?

I ask because I would like to create my own coverage tracks.

Best

Dan

timoast · 2022-03-02T16:00:34Z

> First, what exactly are you normalizing

We are normalizing Tn5 insertion sites.

> Are genomic regions divided into bins and fragments/Tn5 insertions counted in each bin?

No, the data is at base resolution, with smoothing applied before plotting (controlled by the window parameter).

> Second, how are these counts normalized?

Coverage plots are normalized using a scaling factor for each group of cells (track), which is the number of cells in the group multiplied by the average sequencing depth for that group of cells. The total counts at each base position for the track are divided by this scaling factor, and then all tracks are multiplied by a common factor (the median scaling factor across all tracks) to bring the values back up to a reasonable range. This normalizes for differences in sequencing depth and cell number across groups.
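To make the arithmetic concrete, here is a minimal Python sketch of the scaling described above. It is not Signac's code (Signac is an R package); the function name `normalize_tracks`, its arguments, and the toy numbers are all hypothetical, and how "average sequencing depth" is measured per cell is assumed to be supplied by the caller.

```python
from statistics import median


def normalize_tracks(counts, n_cells, mean_depth):
    """Illustrative only: scale per-base insertion counts for each group.

    counts:     group -> list of summed Tn5 insertion counts per base position
    n_cells:    group -> number of cells in the group
    mean_depth: group -> average sequencing depth of cells in the group
                (the exact depth measure is an assumption left to the caller)
    """
    # Per-track scaling factor: number of cells times average depth.
    scale = {g: n_cells[g] * mean_depth[g] for g in counts}
    # Common factor: median scaling factor across all tracks.
    common = median(scale.values())
    # Divide by the track's own factor, then multiply by the common factor.
    return {g: [c / scale[g] * common for c in counts[g]] for g in counts}


# Toy data: group A has twice as many cells and twice the raw counts of group B.
tracks = normalize_tracks(
    counts={"A": [2, 4], "B": [1, 2]},
    n_cells={"A": 100, "B": 50},
    mean_depth={"A": 1000.0, "B": 1000.0},
)
```

In this toy case the two normalized tracks coincide (up to floating-point rounding), because group A's extra counts are fully explained by its larger cell number.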

I will update the documentation for CoveragePlot() to explain this better.

danielcgingerich · 2022-03-03T20:52:02Z

@timoast that is very clear, thanks!

danielcgingerich · 2022-06-14T15:05:02Z

Hey @timoast , thinking about this again, I have another question.

What type of smoothing is applied? Is it a moving average or a kernel density estimate?

I saw a post on Dave Tang's blog about how he confused a density plot with a coverage plot, but I do not really think there would be much difference.

For instance, let's say I am comparing the coverage of two regions:

1. 10 base pairs long, 1 Tn5 cut site per base pair
2. 10 base pairs long, 1 Tn5 cut site every other base pair

Whether kernel density estimation or a moving average is used, both would show an approximately 2:1 ratio of coverage between the two regions.

timoast · 2022-06-14T15:09:53Z

It's just a rolling window sum, not KDE
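As an illustration of a rolling window sum on the two example regions from the question, here is a small Python sketch. It is not Signac's implementation: the function name `rolling_sum`, the centered alignment, and the truncated windows at the edges are all assumptions made for the example.

```python
def rolling_sum(counts, window):
    """Illustrative centered rolling sum over per-base counts.

    window is assumed to be odd; positions near the ends use only the
    part of the window that fits (an assumption, not Signac's behavior).
    """
    half = window // 2
    n = len(counts)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(sum(counts[lo:hi]))
    return out


dense = [1] * 10       # one cut site per base pair
sparse = [1, 0] * 5    # one cut site every other base pair

dense_smooth = rolling_sum(dense, 5)
sparse_smooth = rolling_sum(sparse, 5)
```

With a 5 bp window, interior positions of the dense region sum to 5, while the sparse region alternates between 3 and 2, so the smoothed tracks keep roughly the 2:1 ratio described in the question.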

danielcgingerich · 2022-06-14T19:49:11Z

cool cool. thanks for the quick response!

danielcgingerich added the documentation label Feb 23, 2022

danielcgingerich closed this as completed Mar 3, 2022

danielcgingerich reopened this Jun 14, 2022

danielcgingerich closed this as completed Jun 14, 2022

DLGisch mentioned this issue Oct 16, 2022

CoveragePlot #1248

Closed
