Issue with FindMarkers() methodology #795
Comments
Hi @danielcgingerich, thanks for raising this issue. Having now thought about this more, I think you're right. In particular, the situation you've highlighted where there can be a conflict between FC and p-val is a good example of why we need to use consistent values for both. I think the change from normalized to raw counts was motivated by wanting to avoid any assumptions about how the values in the "data" slot were processed. In Seurat, there was an assumption when computing fold-changes that the values had been log-normalized, which caused issues when that assumption was not correct (if running on scATAC-seq data for example). However, the main issue with that was the choice of mean function for computing fold-change, rather than the data slot used. Since we've added a |
@timoast Should normalized counts also be used for logFC calculation with scRNA data in Seurat?
By default they are
@timoast Another question: For Seurat, assuming Edit: Edit2: |
I think this is a question for the Seurat repo, please raise there |
The FindMarkers() function has two parts: 1) fold change calculation and 2) p-value calculation. Fold change is calculated using raw counts, while the p-value is calculated using normalized data. Why is this? I do not believe raw counts are comparable to normalized data in this scenario, and here is why:
Switching between raw and normalized counts changes the direction of the fold change for many peaks. Example: peak A might have a positive logFC with raw counts but a negative logFC with normalized data. This means that the input data for the p-value and the input data for the fold change contradict each other.
I calculated fold change on both the raw counts and the normalized counts and checked how often the sign of a peak's fold change differs between the two.
In this case, out of 102808 peaks, 36399 (about 35%) change direction depending on whether raw or normalized data is used as input. For these peaks, the input for the p-value and the input for the fold change calculation directly contradict each other.
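To illustrate how such a sign flip can arise, here is a minimal toy sketch in Python. It is not the FindMarkers() code: the helper names, the pseudocount of 1, and the simple counts-per-10,000 depth scaling are all assumptions made for illustration, and the numbers are made up. It only shows that when one group is sequenced more deeply, the raw and depth-normalized means can point in opposite directions.

```python
import math

def log2_fold_change(group_1, group_2, pseudocount=1.0):
    """log2 ratio of the two group means (hypothetical helper, pseudocount assumed)."""
    mean_1 = sum(group_1) / len(group_1)
    mean_2 = sum(group_2) / len(group_2)
    return math.log2((mean_1 + pseudocount) / (mean_2 + pseudocount))

def depth_normalize(counts, depths, scale=10_000):
    """Scale each cell's count by its total depth (assumed stand-in for normalization)."""
    return [count / depth * scale for count, depth in zip(counts, depths)]

# Made-up counts for a single peak in two groups of three cells each.
# Group A cells are sequenced four times deeper than group B cells.
raw_a, depth_a = [4, 5, 6], [20_000, 20_000, 20_000]
raw_b, depth_b = [3, 3, 3], [5_000, 5_000, 5_000]

fc_raw = log2_fold_change(raw_a, raw_b)
fc_norm = log2_fold_change(depth_normalize(raw_a, depth_a),
                           depth_normalize(raw_b, depth_b))

# True when the two fold changes have opposite signs.
direction_changed = (fc_raw > 0) != (fc_norm > 0)

print(f"raw log2FC:        {fc_raw:+.3f}")
print(f"normalized log2FC: {fc_norm:+.3f}")
print(f"direction changed: {direction_changed}")
```

With these toy numbers the raw fold change is positive while the normalized one is negative, which is the kind of conflict described above. Applying the same sign comparison to every peak gives the count of peaks that change direction.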