How are coverage plots normalized? #992
Comments
We are normalizing Tn5 insertion sites
No, the data is base-resolution with smoothing applied before plotting (controlled by the
Coverage plots are normalized using a scaling factor for each group of cells (track), which is the number of cells in the group multiplied by the average sequencing depth for that group of cells. The total counts at each base position for the track are divided by this scaling factor, and then all tracks are multiplied by a common factor (the median scaling factor across all tracks) to bring the values back up to a reasonable range. This normalizes for differences in sequencing depth and cell number across groups. I will update the documentation for
@timoast that is very clear, thanks!
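For anyone who wants to reproduce the normalization described above outside of the package, here is a minimal Python sketch. It is not the package's own code: the function name `normalize_tracks` and its arguments are hypothetical, and it assumes you already have per-base counts, the cell number, and the average sequencing depth for each group.

```python
from statistics import median


def normalize_tracks(counts, n_cells, mean_depth):
    """Normalize per-base counts for several tracks (hypothetical helper).

    counts:     dict mapping group name -> list of total counts per base
    n_cells:    dict mapping group name -> number of cells in the group
    mean_depth: dict mapping group name -> average sequencing depth per cell
    """
    # Per-track scaling factor: number of cells times average depth.
    scale = {g: n_cells[g] * mean_depth[g] for g in counts}
    # Common factor: median scaling factor across all tracks,
    # used to bring values back up to a readable range.
    common = median(scale.values())
    return {
        g: [c * common / scale[g] for c in counts[g]]
        for g in counts
    }


if __name__ == "__main__":
    counts = {"A": [200, 400], "B": [100, 200]}
    n_cells = {"A": 100, "B": 50}
    mean_depth = {"A": 1000, "B": 1000}
    print(normalize_tracks(counts, n_cells, mean_depth))
```

In this toy input, group A has twice as many cells as group B and twice the raw counts, so the two tracks should come out the same after normalization.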
Hey @timoast, thinking about this again, I have another question. What type of smoothing is applied? Is it a moving average or a kernel density estimate? I saw a post on Dave Tang's blog about how he confused a density plot with a coverage plot, but I don't really think there would be much difference. For instance, let's say I am comparing the coverage of 2 regions.
Whether a kernel density or a smoothed mean is used, both would show approximately a 2:1 ratio of coverage between the two regions.
It's just a rolling window sum, not KDE
cool cool. thanks for the quick response!
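To illustrate what a rolling window sum does to base-resolution counts, here is a small Python sketch. It is only an illustration, not the package's code: the name `rolling_sum` is hypothetical, and the window alignment (trailing, with partial windows at the start) is an assumption, since the thread does not say how the window is aligned.

```python
def rolling_sum(values, window):
    """Trailing rolling sum over a list of per-base counts.

    Assumption: each output position sums the current value and the
    (window - 1) values before it; early positions use a partial window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    total = 0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            # Drop the value that just fell out of the window.
            total -= values[i - window]
        out.append(total)
    return out


if __name__ == "__main__":
    region_a = [2, 4, 6, 8]
    region_b = [1, 2, 3, 4]
    print(rolling_sum(region_a, 2))
    print(rolling_sum(region_b, 2))
```

Because the sum is linear, a region with twice the counts at every base keeps that 2:1 ratio after smoothing, which matches the point made above about comparing two regions.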
I am having trouble finding how coverage tracks are normalized. First, what exactly are you normalizing: Tn5 insertions, or whole fragments? Or the 9 bp binding site of Tn5? Are genomic regions divided into bins and fragments/Tn5 insertions counted in each bin?
Second, how are these counts normalized? Are they divided by the mean of a predefined window size in the surrounding region? Is it as simple as CPM?
I ask because I would like to create my own
Best
Dan