Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Estimating dN/dS Ratios for Intraspecific Mitochondrial Genes #1804

Open
Tuc-Nguyen opened this issue Jan 31, 2025 · 6 comments
Open

Estimating dN/dS Ratios for Intraspecific Mitochondrial Genes #1804

Tuc-Nguyen opened this issue Jan 31, 2025 · 6 comments

Comments

@Tuc-Nguyen
Copy link

Tuc-Nguyen commented Jan 31, 2025

Dear Dr. Pond,

I am currently using HyPhy to investigate selection on mitochondrial genes within a species, primarily to verify pervasive negative selection in mitochondrial genomes. My dataset includes 300+ aligned sequences with the substitution rate per codon lower than the ideal range given the intraspecific nature of these sequences. While I had no problem with successfully running these analyses, I do have a couple of questions regarding the interpretation of the data:

  1. Most global dN/dS estimated across my BUSTED, FEL, and MEME analyses are consistently under 0.1, suggesting widespread purifying selection, which is consistent with expectations for mtDNA. However, given that dN/dS ratios were initially developed for divergent populations, I’m concerned about the robustness of these metrics for intraspecific data. Could you provide guidance on the reliability of these estimates in the context of nonrecombinant, intraspecific sequences?

  2. Despite the lower substitution rates, I’ve identified potential diversifying selection on some codons through both FEL and MEME analyses. How should I interpret these findings?

Any suggestions for alternative methods or approaches within the HyPhy framework that could better suit the characteristics of my data would be immensely valuable.

Thank you for your time and insights!

@spond
Copy link
Member

spond commented Feb 2, 2025

Dear @Tuc-Nguyen,

  1. Our "standard" approach for intra-species analyses is to use --branches Internal for all the analyses you've mentioned (see https://pubmed.ncbi.nlm.nih.gov/26814962/)
Image
  1. Can you provide an example? Will help to interpret the specific case. More generally, it is quite often that you can have evidence of selection on individual sites with MEME/FEL, even though gene-wide analyses don't find anything. There are two main reasons for that

    • FEL/MEME are more flexible/sensite, because they have site-level ω estimates. For example, it's difficult for a gene-level model to discover a few sites with ω > 1 in a long alignment where most sites are strongly conserved.
    • FEL/MEME are also more prone to false positives, because (especially for longer alignments), they do a lot more tests. For example, if all sites are neutral (a very unnatural hypothetical, but still), you would expect FEL/MEME to find 10% of "selected" sites when you set p = 0.1

Best,
Sergei

@Tuc-Nguyen
Copy link
Author

Hi @spond,

Thank you for your prompt response. I ran the analyses again with --branches Internal for my analyses and it eliminated all of the sites that were found to be under pervasive (FEL) or episodic (MEME) diversifying selection, except for one.
Here is the result of that gene in:

### 1. FEL
`### Obtaining the global omega estimate based on relative GTR branch lengths and nucleotide substitution biases

  • Log(L) = -2858.99, AIC-c = 6633.90 (456 estimated parameters)
  • non-synonymous/synonymous rate ratio for background = 0.1502
  • non-synonymous/synonymous rate ratio for test = 0.0314

Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model

  • Log(L) = -2851.33
  • non-synonymous/synonymous rate ratio for background = 0.1805
  • non-synonymous/synonymous rate ratio for test = 0.0247

For partition 1 these sites are significant at p <=0.1

Codon Partition alpha beta LRT Selection detected?
19 1 3.008 0.000 6.830 Neg. p = 0.0090
22 1 12.607 0.000 10.831 Neg. p = 0.0010
28 1 3.864 0.000 4.541 Neg. p = 0.0331
31 1 5.066 0.000 8.101 Neg. p = 0.0044
34 1 3.085 0.000 2.857 Neg. p = 0.0910
38 1 5.555 0.000 9.859 Neg. p = 0.0017
39 1 1.533 0.000 6.128 Neg. p = 0.0133
41 1 2.183 0.000 3.883 Neg. p = 0.0488
49 1 3.143 0.000 2.882 Neg. p = 0.0896
50 1 1.975 0.000 3.624 Neg. p = 0.0570
53 1 12.607 0.000 11.101 Neg. p = 0.0009
56 1 3.122 0.000 3.248 Neg. p = 0.0715
60 1 3.046 0.000 3.201 Neg. p = 0.0736
65 1 0.658 0.000 3.070 Neg. p = 0.0798
66 1 3.270 0.000 4.192 Neg. p = 0.0406
69 1 1.959 0.000 3.428 Neg. p = 0.0641
77 1 4.726 0.000 11.131 Neg. p = 0.0008
78 1 3.139 0.000 5.907 Neg. p = 0.0151
92 1 1.917 0.000 8.993 Neg. p = 0.0027
93 1 15.336 0.000 10.066 Neg. p = 0.0015
98 1 3.544 0.000 3.527 Neg. p = 0.0604
103 1 1.570 0.000 2.953 Neg. p = 0.0857
116 1 3.809 0.000 3.174 Neg. p = 0.0748
118 1 2.010 0.000 3.581 Neg. p = 0.0584
121 1 2.988 0.000 6.368 Neg. p = 0.0116
129 1 9.520 0.000 9.697 Neg. p = 0.0018
131 1 2.674 0.000 3.161 Neg. p = 0.0754
132 1 3.820 0.000 3.624 Neg. p = 0.0570
135 1 2.672 0.000 6.285 Neg. p = 0.0122
157 1 3.446 0.000 3.473 Neg. p = 0.0624
158 1 7.862 0.000 8.927 Neg. p = 0.0028
171 1 24.557 0.000 31.690 Neg. p = 0.0000
181 1 12.696 0.000 10.754 Neg. p = 0.0010
185 1 2.046 0.000 3.682 Neg. p = 0.0550
188 1 1.134 0.000 2.788 Neg. p = 0.0950
190 1 5.696 0.000 15.935 Neg. p = 0.0001
192 1 8.026 0.000 6.770 Neg. p = 0.0093
202 1 4.084 0.000 5.229 Neg. p = 0.0222
203 1 3.456 0.000 5.514 Neg. p = 0.0189
210 1 1.238 0.000 4.400 Neg. p = 0.0359
221 1 8.622 0.000 9.592 Neg. p = 0.0020
231 1 1.959 0.000 3.355 Neg. p = 0.0670
234 1 4.188 0.000 7.362 Neg. p = 0.0067
244 1 3.572 0.000 8.450 Neg. p = 0.0036
252 1 3.884 0.000 3.649 Neg. p = 0.0561
256 1 4.897 0.000 6.759 Neg. p = 0.0093
258 1 7.936 0.000 9.606 Neg. p = 0.0019
259 1 1.972 0.000 4.113 Neg. p = 0.0425
264 1 3.678 0.000 3.651 Neg. p = 0.0560
273 1 3.294 0.000 3.316 Neg. p = 0.0686
276 1 2.946 0.000 7.232 Neg. p = 0.0072
283 1 3.793 0.000 3.423 Neg. p = 0.0643

** Found 0 sites under pervasive positive diversifying and 52 sites under negative selection at p <= 0.1**`

### 2. MEME
`### Obtaining the global omega estimate based on relative GTR branch lengths and nucleotide substitution biases

  • Log(L) = -2858.99, AIC-c = 6633.90 (456 estimated parameters)
  • non-synonymous/synonymous rate ratio for background = 0.1502
  • non-synonymous/synonymous rate ratio for test = 0.0314

Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model

  • Log(L) = -2851.33
  • non-synonymous/synonymous rate ratio for background = 0.1805
  • non-synonymous/synonymous rate ratio for test = 0.0247

For partition 1 these sites are significant at p <=0.1

Codon Partition alpha beta+ p+ LRT Episodic selection detected? # branches Most common codon substitutions at this site
94 1 0.000 394.804 0.070 8.438 Yes, p = 0.0065 4 [12]TTT>ATT

** Found 1 sites under episodic diversifying positive selection at p <= 0.1**`

How would you interpret this?

@spond
Copy link
Member

spond commented Feb 3, 2025

Dear @spond,

Which version of HyPhy are you using (hyphy --version)? The MEME output looks to be from a rather outdated version.

Site 94 looks interesting, because acording to MEME it has 12(!) non-synonymous substitutions; this would definitely be remarkable.

Best,
Sergei

@Tuc-Nguyen
Copy link
Author

Tuc-Nguyen commented Feb 3, 2025

hi @spond it is currently hyphy/intel/2.5.24 on our cluster from my understanding.

I reckon it would not affect the findings, would it?

@spond
Copy link
Member

spond commented Feb 3, 2025

Dear @Tuc-Nguyen,

This version is >4 years out of date. I would strongly recommend updating to the newest version, because it will provide significantly improved performance, computationally AND statistically for all of the methods you've been using.

Best,
Sergei

Image

@Tuc-Nguyen
Copy link
Author

Tuc-Nguyen commented Feb 3, 2025

@spond Thank you! I have requested an update from the IT.

Meanwhile I would like to pick your brain on one case (using the results from the old version) -- just in case I encounter it again with the updates. Here is the only one gene where there is some significance in BUSTED:

`### Obtaining the global omega estimate based on relative GTR branch lengths and nucleotide substitution biases

  • Log(L) = -1724.58, AIC-c = 4307.44 (427 estimated parameters)
  • non-synonymous/synonymous rate ratio for background = 0.0564
  • non-synonymous/synonymous rate ratio for test = 0.0156

Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model

  • Log(L) = -1722.08, AIC-c = 4302.43 (427 estimated parameters)
  • non-synonymous/synonymous rate ratio for background = 0.0597
  • non-synonymous/synonymous rate ratio for test = 0.0135

Performing the full (dN/dS > 1 allowed) branch-site model fit

  • Log(L) = -1686.44, AIC-c = 4257.42 (440 estimated parameters)
  • For test branches, the following rate distribution for branch-site combinations was inferred
Selection mode dN/dS Proportion, % Notes
Negative selection 0.000 40.546
Negative selection 0.000 59.397 Collapsed rate class
Diversifying selection 524.262 0.057
  • For background branches, the following rate distribution for branch-site combinations was inferred
Selection mode dN/dS Proportion, % Notes
Negative selection 0.000 0.263
Negative selection 0.000 99.504 Collapsed rate class
Diversifying selection 277.051 0.234
  • The following rate distribution for site-to-site synonymous rate variation was inferred
Rate Proportion, % Notes
0.000 16.288
0.671 77.065
7.264 6.647

Performing the constrained (dN/dS > 1 not allowed) model fit

  • Log(L) = -1689.32, AIC-c = 4261.16 (439 estimated parameters)
  • For test branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred
Selection mode dN/dS Proportion, % Notes
Negative selection 0.000 46.898
Negative selection 0.000 51.902 Collapsed rate class
Neutral evolution 1.000 1.201
  • The following rate distribution for site-to-site synonymous rate variation was inferred
Rate Proportion, % Notes
0.000 15.787
0.657 77.094
6.930 7.119

Branch-site unrestricted statistical test of episodic diversification [BUSTED]

Likelihood ratio test for episodic diversifying positive selection, p = 0.0281.`

I have two questions:

  1. If I were to publish the dN/dS ratio, which results should I use? The global omega estimate based on relative GTR branch lengths and nucleotide substitution biases, or the improved branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model? And I used --branches Internal for this analysis. I assume that would be the non-synonymous/synonymous rate ratio for test?
  2. Even though it's likely that there is some episodic diversifying positive selection on the whole gene, FEL and MEME yielded no codon that is under positive selection. In such cases, what would the appropriate conclusion be? Or what follow up analyses should I do?

Again, thank you so much for your time and assistance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants