Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERRROR in heat_tree_matrix - missing value where TRUE/FALSE needed #240

Closed
Gian77 opened this issue Jul 7, 2018 · 16 comments
Closed

ERRROR in heat_tree_matrix - missing value where TRUE/FALSE needed #240

Gian77 opened this issue Jul 7, 2018 · 16 comments

Comments

@Gian77
Copy link

Gian77 commented Jul 7, 2018

Hello,
I am new with this package. I have tibble with a variable that contain 3 levels (treatments). I am trying to plot differential heat trees that shows differential abundances between the different treatments. However, I encountered the following error:

> obj$data$diff_table # A tibble: 5,094 x 7
taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff wilcox_p_value
1 aab Picea Abies 0.0915 2165 2439. 0.0487 2 aac Picea Abies 0.268 3500. 3794. 0.0206
3 aad Picea Abies -0.112 -329 -594. 0.339 4 aae Picea Abies -0.00217 -9.5 -87.4 0.867
5 aaf Picea Abies -0.367 -424. -365. 0.358 6 aag Picea Abies -0.774 -346. -337. 0.173
7 aah Picea Abies 0 0 21.4 0.0853 8 aai Picea Abies 0 0 6.28 0.0862
9 aaj Picea Abies -0.205 -9 -5.69 0.697 10 aak Picea Abies -0.157 -54.5 -26.3 0.493
# ... with 5,084 more rows >
> heat_tree_matrix(obj, + dataset = "diff_table",
+ node_size = n_obs, + node_label = taxon_names,
+ node_color = log2_median_ratio, + node_color_range = diverging_palette(),
+ node_color_trans = "linear", + node_color_interval = c(-3, 3),
+ edge_color_interval = c(-3, 3), + node_size_axis_label = "Number of OTUs",
+ node_color_axis_label = "Log2 ratio median proportions") Error in if (zero_range(as.numeric(limits))) { :
missing value where TRUE/FALSE neededIn addition: There were 50 or more warnings (use warnings() to see the first 50)
`>

Thank you for helping on this,

Gian

@zachary-foster
Copy link
Contributor

Hello @Gian77! Can you please send me the data to reproduce this error? You can send part of the raw data and the code to reproduce the error or just use the save function to save a copy of obj and send that. Thanks!

@Gian77
Copy link
Author

Gian77 commented Jul 11, 2018

Hello @zachary-foster,

I had my data subsetted and I still have the same error. Please find the save(obj, file = "obj_filt.RData") of the object attached below.

Thanks a lot for troubleshooting this,

Gian

obj_filt.RData.zip

@zachary-foster
Copy link
Contributor

Hi @Gian77, no problem, I think I fixed it. It was caused by the treatment columns being factors instead of characters. It should work with both factors and characters now. Try reinstalling the dev version and let me know if it still happens. Thanks!

devtools::install_github("ropensci/taxa")
devtools::install_github("grunwaldlab/metacoder")

@akiledal
Copy link

Hi @zachary-foster, using the latest dev version I'm also seeing this or a similar issue when attempting to make a heat tree matrix from the attached obj .

obj$data$diff_table <- compare_groups(obj, dataset = "tax_abund",
cols = met_samples$sample_id,
groups = met_samples$year_mit, combinations = list(c("2013_mitigated","2014_mitigated"),c("2013_unmitigated","2014_unmitigated")))

heat_tree_matrix(obj,
dataset = "diff_table",
node_size = n_obs,
node_label = taxon_names,
node_color = log2_median_ratio,
node_color_range = diverging_palette(),
node_color_trans = "linear",
node_color_interval = c(-3, 3),
edge_color_interval = c(-3, 3),
node_size_axis_label = "Number of OTUs",
node_color_axis_label = "Log2 ratio median proportions",
node_label_size_range = c(.015,.015),
node_label_max = 200,
initial_layout = "reingold-tilford", layout = "davidson-harel",
overlap_avoidance = 1.2)

obj.zip

@Gian77
Copy link
Author

Gian77 commented Aug 1, 2018

Hello @zachary-foster,

I downloaded the dev version you have corrected and it worked. Thanks a lot for fixing it and for the great package.

The only thing are a series of warnings that I think are not compromising the results of the analysis, what do you think?

> heat_tree_matrix(obj,
+ dataset = "diff_table",
+ node_size = n_obs,
+ node_label = taxon_names,
+ node_color = log2_median_ratio,
+ node_color_range = diverging_palette(),
+ node_color_trans = "linear",
+ node_color_interval = c(-3, 3),
+ edge_color_interval = c(-3, 3),
+ node_size_axis_label = "Number of OTUs",
+ node_color_axis_label = "Log2 ratio median proportions")
Warning messages:
1: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], :
'min' arg is deprecated. Use 'lower' instead.
2: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], :
'max' arg is deprecated. Use 'upper' instead.
3: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], :
'min' arg is deprecated. Use 'lower' instead.
4: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], :
'max' arg is deprecated. Use 'upper' instead.
5: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], :
'min' arg is deprecated. Use 'lower' instead.
6: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], :
'max' arg is deprecated. Use 'upper' instead.
7: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], :
'min' arg is deprecated. Use 'lower' instead.
8: In GA::ga(type = "real-valued", fitness = function(x) optimality_stat(x[1], :
'max' arg is deprecated. Use 'upper' instead.

Gian

@zachary-foster
Copy link
Contributor

Hi Gian,

Thanks! Im glad to hear its working. You can ignore those warnings. They will not cause any problems and they should be fixed in the most recent development version.

@Mirin1357
Copy link

Mirin1357 commented Jul 7, 2019

Hi @zachary-foster,
first all compliments for this beautiful package for what I agree is the most appropriate presentation of a large scale microbiome data.

My issue is the following: I wanted to use DESeq2 instead of "compare_groups" for my 16S data, since I already use microbiomeSeq for the rest. I have 5 SampleType groups I want to compare in between (note: normal compare_means and heat_tree_matrix work).
I tried to do with your suggested "upgrade" in https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/metacoder-discussions/xqzsD592QGg/6b7Bw13QAAAJ, but unfortunately the calc_diff_abund_deseq2 did not seem to work. Therefore, I performed DESeq2 on overall file, extracted individual comparison results, created a joint df, added to obj$data and tried to do heat_tree_matrix which failed.

taxmap object:

> obj$data
$tax_data
# A tibble: 1,321 x 82
   taxon_id `DNA17025-036` `DNA17025-040` `DNA17025-041`
   <chr>               <dbl>            <dbl>            <dbl>
 1 ac                      0                0                0
 2 ac                      0                0                0
 3 ac                      0                0                0
 4 at                      0               10                0
 5 em                      0                0                0
 6 em                     55                4                0
 7 cc                      0               13                0
 8 at                      4               27                0
 9 at                      1                2                0
10 at                      0                0                0
# ... with 1,311 more rows, and 78 more variables:
$class_data
# A tibble: 7,926 x 5
   taxon_id input_index tax_rank tax_name regex_match
   <chr>          <int> <chr>    <chr>    <chr>      
 1 ab                 1 k        Bacteria k__Bacteria
 2 ac                 1 p        SR1      p__SR1     
 3 ac                 1 c        NA       c__NA      
 4 ac                 1 o        NA       o__NA      
 5 ac                 1 f        NA       f__NA      
 6 ac                 1 g        NA       g__NA      
 7 ab                 2 k        Bacteria k__Bacteria
 8 ac                 2 p        SR1      p__SR1     
 9 ac                 2 c        NA       c__NA      
10 ac                 2 o        NA       o__NA      
# ... with 7,916 more rows

$tax_abund
# A tibble: 354 x 80
   taxon_id `DNA17025-036` `DNA17025-040` `DNA17025-041`
   <chr>               <dbl>            <dbl>            <dbl>
 1 ab                  38865            29891            55151
 2 ac                      0                0                0
 3 ad                    128               89              202
 4 ae                      0                0                0
 5 af                   5779             2458             2309
 6 ag                  11668             4580             8488
 7 ah                      1                8                0
 8 ai                      0                0                0
 9 aj                      0                0                0
10 ak                   4911             2770             7678
# ... with 344 more rows, and 76 more variables:
$testM
# A tibble: 3,540 x 9
   taxon_id baseMean log2FoldChange lfcSE   stat   pvalue
   <chr>       <dbl>          <dbl> <dbl>  <dbl>    <dbl>
 1 ab       14845.           -0.398 0.312 -1.28  2.02e- 1
 2 ac           1.49         -4.99  2.57  -1.94  5.22e- 2
 3 ad          19.1          -3.87  1.16  -3.33  8.74e- 4
 4 ae          63.7          10.8   0.717 15.0   4.13e-51
 5 af        3740.            2.33  0.392  5.95  2.73e- 9
 6 ag        1833.           -0.926 0.517 -1.79  7.34e- 2
 7 ah           1.99         -5.41  1.56  -3.46  5.43e- 4
 8 ai           8.83          7.31  1.17   6.25  4.09e-10
 9 aj          52.9          10.6   0.705 15.0   7.71e-51
10 ak         886.           -0.381 0.518 -0.735 4.62e- 1
# ... with 3,530 more rows, and 3 more variables: padj <dbl>,
#   treatment_1 <chr>, treatment_2 <chr>

testM represents concatenated output of DESeq2 as following:

### take 
obj            # taxmap object with summed taxons from nonrarefied counts
sd <- obj$data$sample_data     # sample data to make it easier

# Make sure metadata is in same order as columns
all(sd$SampleID == colnames(obj$data$tax_abund)[-1])

# Convert data to DESeq2 format and do differential abundance analysis
deseq_tax_data <- DESeqDataSetFromMatrix(countData = obj$data$tax_abund [-1],
                                         colData = sd,
                                         design = ~ SampleType) %>%
  DESeq(test = "Wald",
        fitType = "parametric",
        sfType = "poscounts")

#------------------------------------------------------------------------#
# Get data for comparison of TWO treatments
# 1st combination
tax_SvB <- cbind(taxon_id = obj$data$tax_abund$taxon_id, # taxon id needed for plotting
                                 results(deseq_tax_data, contrast = c('SampleType',"BAL",'SALIVA')))

tax_SvB <- as_tibble(tax_SvB)
tax_SvB$treatment_1 <- "BAL"
tax_SvB$treatment_2 <- "SALIVA"

# -> done for all combinations

# create new list for obj$data
testM <- do.call ("rbind",list(tax_SvDP,tax_SvB, tax_SvPT, tax_SvT, tax_BALvDP, tax_BALvPT, tax_BALvT, tax_DPvPT, tax_DPvT,tax_PTvT))
testM$treatment_1 <- as.factor (testM$treatment_1)  # tried also as letting them as.character, did not work
testM$treatment_2 <- as.factor (testM$treatment_2)

#assign to obj$data
obj$data$testM <- testM

## keep only significant and clean from NA, if present
obj$data$testM$log2FoldChange[obj$data$testM$padj > 0.05 | obj$data$testM$padj == "NA" ] <- 0

#plot tree
heat_tree_matrix(obj,
                 data = "testM", 
                 node_size = n_obs, 
                 node_label = taxon_names,
                 node_color = log2FoldChange, 
                 node_color_range = diverging_palette(), 
                 node_color_trans = "linear", 
                 node_color_interval = c(-12, 11), 
                 edge_color_interval = c(-12, 11), 
                 node_size_axis_label = "Number of OTUs",
                 node_color_axis_label = "Log2 ratio median proportions",
                 seed = 1,
                 layout = "davidson-harel", 
                 initial_layout = "reingold-tilford") 

Error in if (zero_range(as.numeric(limits))) { : 
  missing value where TRUE/FALSE needed
In addition: There were 14 warnings (use warnings() to see them)
Messages d'avis :
1: NAs found in the "node_color" option. These may cause plotting errors or other strange behavior. NAs can be created if there is not a value for each taxon. The following 354 of 354 taxa have NAs for the "node_color" option:
  ab, ac, ad, ae, af, ag ... ry, rz, sa, sb, sc

2: NAs found in the "edge_color" option. These may cause plotting errors or other strange behavior. NAs can be created if there is not a value for each taxon. The following 354 of 354 taxa have NAs for the "edge_color" option:
  ab, ac, ad, ae, af, ag ... ry, rz, sa, sb, sc

3: In min(x_points) : no non-missing arguments to min; returning Inf
4: In max(x_points) : no non-missing arguments to max; returning -Inf
5: In min(y_points) : no non-missing arguments to min; returning Inf
6: In max(y_points) : no non-missing arguments to max; returning -Inf
7: In min(x_points) : no non-missing arguments to min; returning Inf
8: In max(x_points) : no non-missing arguments to max; returning -Inf
9: In min(y_points) : no non-missing arguments to min; returning Inf
10: In max(y_points) : no non-missing arguments to max; returning -Inf
11: In min(x_points) : no non-missing arguments to min; returning Inf
12: In max(x_points) : no non-missing arguments to max; returning -Inf
13: In min(y_points) : no non-missing arguments to min; returning Inf
14: In max(y_points) : no non-missing arguments to max; returning -Inf

Of the record, the same obj plots a heat_tree (of course not grouped).

I continued to read other issues and I saw #195, changed the name "testM" to "diff_table", still no change.
I apologize in advance if my error is a clumsy one, but I just do not see it and I do not want to give up on these great visualisation.

Thank you very much!

@Mirin1357
Copy link

Mirin1357 commented Jul 7, 2019

Being on the matter, I have an additional question considering the filtering of the obj before passing it to heat_tree, that does not work if either node_size or node_color is not set to n_obs.

My "obj" has 5 samples within.
I want to see N of samples per taxon (tax_occ) and mean abundance % per group:

obj$data$mean_perc <-calc_group_mean(obj, 
                                     data="tax_abund", 
                                     cols=sd$SampleID,
                                     groups = sd$SampleType)
obj$data$tax_occ <- calc_n_samples(obj, "tax_abund", col = sd$SampleID, groups = sd$SampleType)

I want to visualise taxons up until order, only for BAL with corresponding number of non-zero samples and average abundance

heat_tree(obj,
         node_size = data$tax_occ$BAL,   
          node_color = data$mean_perc$BAL, 
          
          node_label = taxon_names,
          node_size_axis_label = "Samples with reads",
          node_color_axis_label = "Average percentage",
          
          layout = "davidson-harel", # The primary layout algorithm
          initial_layout = "reingold-tilford")

this gives a normal tree, but if I want to add filter until order, I have the following error message

obj %>%
taxa::filter_taxa (taxon_ranks == "o", supertaxa = TRUE) %>%
 heat_tree (node_size = data$tax_occ$BAL,   
          node_color = data$mean_perc$BAL,           
          node_label = taxon_names,
          node_size_axis_label = "Samples with reads",
          node_color_axis_label = "Average percentage",          
          layout = "davidson-harel", # The primary layout algorithm
          initial_layout = "reingold-tilford")

Error in check_element_length(c("node_size", "edge_size", "node_label_size",  : 
  Length of argument'node_size' must be a factor of the length of 'taxon_id'
In addition: Warning message:
There is no "taxon_id" column in the data set "4", so there are no taxon IDs.

This disappears if I change node_size to n_obs.
Is there a way to pass this by and apply the same in heat_tree_matrix instead of n_obs (where I would also like to put tax_occ).

Thank you very much in advance!

@zachary-foster
Copy link
Contributor

Hello @Mirin1357, thanks for the kind words.

DESeq issue

You seem to be doing it the right way. Its hard for me to say what exactly the problem is without being able to reproduce the error. Perhaps you don't have every combination of treatments or log2FoldChange is NA for all the taxa of one combination?

I noticed you used obj$data$testM$padj == "NA". You probably want is.na(obj$data$testM$padj):

x = c('a', NA, 'NA')
x == 'NA'
#> [1] FALSE    NA  TRUE
is.na(x)
#> [1] FALSE  TRUE FALSE

Created on 2019-07-09 by the reprex package (v0.3.0)

If you can email me the result of save(obj, file = 'obj.RData') right before the call the heat_tree_matrix, I can probably figure out what the problem is. My email is zacharyfoster1989 at gmail.com

Filter obs issue

You probably want obj$data$tax_occ$BAL instead of data$tax_occ$BAL. Does that fix it?

Is there a way to pass this by and apply the same in heat_tree_matrix instead of n_obs (where I would also like to put tax_occ).

You should be able to use the same code for heat_tree_matrix. Any per-taxon value should work for either.

@zachary-foster zachary-foster reopened this Jul 9, 2019
@Mirin1357
Copy link

Hi @zachary-foster,

thank you very much for your reactivity!

DESeq

Considering NA, I have accidentally omitted the lines where I check for NA before merging with obj, so here they are and their output.

for being more logic, I replaced testM with deseq.tbl

#concatenate DESeq pair-wise comparisons
deseq.tbl <- do.call ("rbind",list(tax_SvB,tax_SvDP, tax_SvPT, tax_SvT, tax_BALvDP, tax_BALvPT, tax_BALvT, tax_DPvPT, tax_DPvT,tax_PTvT)) 

## check for unknown or empty
apply(deseq.tbl, 2, function(x) any(is.na(x) | x == "na" | x == "NA"))
      taxon_id       baseMean log2FoldChange          lfcSE           stat         pvalue           padj    treatment_1    treatment_2 
          TRUE          FALSE          FALSE          FALSE          FALSE          FALSE           TRUE          FALSE          FALSE 

#a) remove taxa == "na" and check
deseq.tbl =  subset (deseq.tbl, taxon_id != "na")
apply(deseq.tbl, 2, function(x) any(is.na(x) | x == "na" | x == "NA"))
      taxon_id       baseMean log2FoldChange          lfcSE           stat         pvalue           padj    treatment_1    treatment_2 
         FALSE          FALSE          FALSE          FALSE          FALSE          FALSE           TRUE          FALSE          FALSE 

# b) remove padj = NA and check
deseq.tbl$padj [is.na(deseq.tbl$padj)] <- 1
apply(deseq.tbl, 2, function(x) any(is.na(x) | x == "na" | x == "NA"))
      taxon_id       baseMean log2FoldChange          lfcSE           stat         pvalue           padj    treatment_1    treatment_2 
         FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE          FALSE 

I tried only a), only b) or both before adding to obj but did not work for the plot.
Considering the obj, it actually also has "na" in taxon_id

apply(obj$data$tax_abund, 2, function(x) any(is.na(x) | x == "na" | x == "NA"))
taxon_id DNA17025-036 DNA17025-040 DNA17025-041 DNA17025-042 
        TRUE        FALSE        FALSE        FALSE        FALSE       ...

I tried also to do no modification on the deseq.tbl, add it to the obj as it is and pass the log2fold "filter" before plotting, did not work either.

obj$data$diff_table$log2FoldChange[obj$data$diff_table$padj > 0.05 | obj$data$diff_table$padj == "NA" | is.na(obj$data$diff_table$padj)] <- 0

I will send you the requested right away, and thank you for your help!

Filter obs

Thank you for the suggestion, I have tried but I still have the same error.

a) no filter -> works

heat_tree(obj,
          node_size = obj$data$tax_occ$BAL,   
          node_color = obj$data$mean_perc$BAL,           
          node_label = taxon_names,
          node_size_axis_label = "Samples with reads",
          node_color_axis_label = "Average percentage",        
          layout = "davidson-harel", 
          initial_layout = "reingold-tilford")

b) filter -> does not work

obj %>%
taxa::filter_taxa(taxon_ranks == "o", supertaxa = TRUE)%>%
heat_tree(node_size = obj$data$tax_occ$BAL,   
               node_color = obj$data$mean_perc$BAL,           
               node_label = taxon_names,
               node_size_axis_label = "Samples with reads",
               node_color_axis_label = "Average percentage",        
               layout = "davidson-harel", 
               initial_layout = "reingold-tilford")
Error in check_element_length(c("node_size", "edge_size", "node_label_size",  : 
  Length of argument'node_size' must be a factor of the length of 'taxon_id'
In addition: Warning message:
There is no "taxon_id" column in the data set "4", so there are no taxon IDs

c) filter + n_obs for node_size & node_color -> works

obj %>%
taxa::filter_taxa(taxon_ranks == "o", supertaxa = TRUE)%>%
heat_tree(node_size = n_obs,   
               node_color = n_obs,           
               node_label = taxon_names,
               node_size_axis_label = "Samples with reads",
               node_color_axis_label = "Average percentage",        
               layout = "davidson-harel", 
               initial_layout = "reingold-tilford")

d) filter + 1 n_obs & obj$data$tax_occ$BAL (but obj$data$mean_perc$BAL defined in the obj -> not working

obj %>%
taxa::filter_taxa(taxon_ranks == "o", supertaxa = TRUE)%>%
heat_tree(node_size = n_obs,   
               node_color = obj$data$tax_occ$BAL,           
               node_label = taxon_names,
               node_size_axis_label = "Samples with reads",
               node_color_axis_label = "Average percentage",        
               layout = "davidson-harel", 
               initial_layout = "reingold-tilford")
Error in check_element_length(c("node_size", "edge_size", "node_label_size",  : 
  Length of argument'node_color' must be a factor of the length of 'taxon_id'
In addition: Warning message:
There is no "taxon_id" column in the data set "4", so there are no taxon IDs

e) filter + 1 n_obs & obj$data$tax_occ$BAL (other not defined in the obj) -> works

obj %>%
taxa::filter_taxa(taxon_ranks == "o", supertaxa = TRUE)%>%
heat_tree(node_size = n_obs,   
               node_color = BAL,           
               node_label = taxon_names,
               node_size_axis_label = "Samples with reads",
               node_color_axis_label = "Average percentage",        
               layout = "davidson-harel", 
               initial_layout = "reingold-tilford")

However, I have managed to surpass the problem by creating individual obj for each sample from the beginning. Also, I renamed one "BAL" call to other word, so no need to use the longer expression for precision.

@zachary-foster
Copy link
Contributor

Thanks for the data, I will look at it today.

In regards to the filtering problem: I think it is because obj$data$tax_occ$BAL is the value for the unfiltered obj but the the object being passed to heat_tree has been filtered so they have different numbers of taxa. Try:

filtered_obj <- taxa::filter_taxa(taxon_ranks == "o", supertaxa = TRUE)
heat_tree(filtered_obj,
          node_size = filtered_obj$data$tax_occ$BAL,   
          node_color = filtered_obj$data$mean_perc$BAL,           
          node_label = taxon_names,
          node_size_axis_label = "Samples with reads",
          node_color_axis_label = "Average percentage",        
          layout = "davidson-harel", 
          initial_layout = "reingold-tilford")

If that does not work, make sure that tax_occ and mean_perc have one row per taxon.

@zachary-foster
Copy link
Contributor

Hi @Mirin1357,

I found out what the problem was. The 'LUNG.T vs LUNG.DP' comparison has twice the number of values per taxon and there is no data for the 'LUNG.PT vs. LUNG.T' comparison.

> table(paste(obj$data$diff_table$treatment_1, obj$data$diff_table$treatment_2))

     BAL SALIVA     LUNG.DP BAL  LUNG.DP SALIVA     LUNG.PT BAL LUNG.PT LUNG.DP  LUNG.PT SALIVA      LUNG.T BAL 
            354             354             354             354             354             354             354 
 LUNG.T LUNG.DP   LUNG.T SALIVA 
            708             354

I have added a check to the heat_tree_matrix function so that now you will get the error:

Error: All pairs being compared should have one value per taxon. The following do not:
  LUNG.DP vs. LUNG.T (708)

So that should be easier to figure out in the future.

You can install the development version with that error message if you want:

install.packages("devtools")
devtools::install_github("grunwaldlab/metacoder")

zachary-foster added a commit that referenced this issue Jul 10, 2019
@Mirin1357
Copy link

Hello @zachary-foster,

DESeq

I rechecked the code for comparisons, and indeed, I have found one misspelled sample name, resulting in lack of 1 comparison and twice of the other. Then I used your suggestion to verify, and all was neet and the code ran without any further issues!!!!
Thank you so much! And yes, I think that this message will be found as helpful, at least to know where to look at when the things do not work.

Filtering

I tried your suggestion, but unfortunately, it did not work.

passing taxon_ranks (filtered_obj) and taxon_ranks (obj) show good filtering, but proceeding to heat_tree gives following error:

Error in check_element_length(c("node_size", "edge_size", "node_label_size",  : 
  Length of argument'node_color' must be a factor of the length of 'taxon_id'

however, both object appear exactly the same with print function (I do not know if this is normal), and the number of taxons is the same between tax_occ, mean_perc and tax_abund

> print (obj)
<Taxmap>
  354 taxa: ab. Bacteria, ac. SR1 ... sb. Roseburia, sc. Lachnobacterium
  354 edges: NA->ab, ab->ac, ab->ad, ab->ae ... jb->rz, iy->sa, iy->sb, iy->sc
  5 data sets:
    tax_data:
      # A tibble: 1,321 x 82
        taxon_id `DNA17025-036` `DNA17025-040` `DNA17025-041` `DNA17025-042`
        <chr>             <dbl>          <dbl>          <dbl>          <dbl>
      1 ac                    0              0              0              0
      2 ac                    0              0              0             35
      3 ac                    0              0              0              0
      # ... with 1,318 more rows, and 77 more variables: `DNA17025-043` <dbl>,
      #   `DNA17025-044` <dbl>, `DNA17025-046` <dbl>, `DNA17025-048` <dbl>,
      #   `DNA17025-049` <dbl>, `DNA17025-052` <dbl>, `DNA17025-055` <dbl>,
      #   `DNA17025-057` <dbl>, `DNA17025-060` <dbl>, `DNA17025-062` <dbl>, ...
    class_data:
      # A tibble: 7,926 x 5
        taxon_id input_index tax_rank tax_name regex_match
        <chr>          <int> <chr>    <chr>    <chr>      
      1 ab                 1 k        Bacteria k__Bacteria
      2 ac                 1 p        SR1      p__SR1     
      3 ac                 1 c        NA       c__NA      
      # ... with 7,923 more rows
    tax_abund:
      # A tibble: 354 x 80
        taxon_id `DNA17025-036` `DNA17025-040` `DNA17025-041` `DNA17025-042`
        <chr>             <dbl>          <dbl>          <dbl>          <dbl>
      1 ab                38865          29891          55151          39120
      2 ac                    0              0              0             35
      3 ad                  128             89            202            160
      # ... with 351 more rows, and 75 more variables: `DNA17025-043` <dbl>,
      #   `DNA17025-044` <dbl>, `DNA17025-046` <dbl>, `DNA17025-048` <dbl>,
      #   `DNA17025-049` <dbl>, `DNA17025-052` <dbl>, `DNA17025-055` <dbl>,
      #   `DNA17025-057` <dbl>, `DNA17025-060` <dbl>, `DNA17025-062` <dbl>, ...
    tax_occ:
      # A tibble: 354 x 6
        taxon_id SALIVA   BAL LUNG.DP LUNG.PT LUNG.T
        <chr>     <int> <int>   <int>   <int>  <int>
      1 ab           17    15      17      14     16
      2 ac            9     0       0       0      0
      3 ad           16     9       3       3      4
      # ... with 351 more rows
    mean_perc:
      # A tibble: 354 x 6
        taxon_id  SALIVA    BAL LUNG.DP LUNG.PT  LUNG.T
        <chr>      <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
      1 ab       39083.  17046. 13352.  13039.  5846.  
      2 ac          32.1     0      0       0      0   
      3 ad         185      20     15.2    10.4    3.19
      # ... with 351 more rows
  0 functions:


> print (filtered_obj)
<Taxmap>
  100 taxa: ab. Bacteria, ad. TM7 ... ej. Erysipelotrichales
  100 edges: NA->ab, ab->ad, ab->ae, ab->af ... bv->ef, bx->eh, by->ei, bz->ej
  5 data sets:
    tax_data:  #OTU counts
      # A tibble: 1,321 x 82
        taxon_id `DNA17025-036` `DNA17025-040` `DNA17025-041` `DNA17025-042`
        <chr>             <dbl>          <dbl>          <dbl>          <dbl>
      1 ab                    0              0              0              0
      2 ab                    0              0              0             35
      3 ab                    0              0              0              0
      # ... with 1,318 more rows, and 77 more variables: `DNA17025-043` <dbl>,
      #   `DNA17025-044` <dbl>, `DNA17025-046` <dbl>, `DNA17025-048` <dbl>,
      #   `DNA17025-049` <dbl>, `DNA17025-052` <dbl>, `DNA17025-055` <dbl>,
      #   `DNA17025-057` <dbl>, `DNA17025-060` <dbl>, `DNA17025-062` <dbl>, ...
    class_data:
      # A tibble: 7,926 x 5
        taxon_id input_index tax_rank tax_name regex_match
        <chr>          <int> <chr>    <chr>    <chr>      
      1 ab                 1 k        Bacteria k__Bacteria
      2 ab                 1 p        SR1      p__SR1     
      3 ab                 1 c        NA       c__NA      
      # ... with 7,923 more rows
    tax_abund:   #taxon counts
      # A tibble: 354 x 80
        taxon_id `DNA17025-036` `DNA17025-040` `DNA17025-041` `DNA17025-042`
        <chr>             <dbl>          <dbl>          <dbl>          <dbl>
      1 ab                38865          29891          55151          39120
      2 ab                    0              0              0             35
      3 ad                  128             89            202            160
      # ... with 351 more rows, and 75 more variables: `DNA17025-043` <dbl>,
      #   `DNA17025-044` <dbl>, `DNA17025-046` <dbl>, `DNA17025-048` <dbl>,
      #   `DNA17025-049` <dbl>, `DNA17025-052` <dbl>, `DNA17025-055` <dbl>,
      #   `DNA17025-057` <dbl>, `DNA17025-060` <dbl>, `DNA17025-062` <dbl>, ...
    tax_occ:
      # A tibble: 354 x 6
        taxon_id SALIVA   BAL LUNG.DP LUNG.PT LUNG.T
        <chr>     <int> <int>   <int>   <int>  <int>
      1 ab           17    15      17      14     16
      2 ab            9     0       0       0      0
      3 ad           16     9       3       3      4
      # ... with 351 more rows
    mean_perc:
      # A tibble: 354 x 6
        taxon_id  SALIVA    BAL LUNG.DP LUNG.PT  LUNG.T
        <chr>      <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
      1 ab       39083.  17046. 13352.  13039.  5846.  
      2 ab          32.1     0      0       0      0   
      3 ad         185      20     15.2    10.4    3.19
      # ... with 351 more rows
  0 functions:

I have also tried to filter before defining tax_occ or mean_perc, did not work. Only pure filtered_obj works.

since this is however the subject of another issue, should I move the comments there?
Thank you for your help!

@zachary-foster
Copy link
Contributor

Ahh, I forgot, if you want to filter per-taxon data using filter_taxa, you need to use reassign_obs = FALSE. If you dont, then the taxon ids of the data get changed to the taxa at the order ranks instead of being removed. Then the data is not per-taxon and that is what causes the error. You want the tax_abund, tax_occ, and mean_perc tables to have the same number of rows as there are taxa (100 in this case). Try:

filtered_obj <- taxa::filter_taxa(obj, taxon_ranks == "o", supertaxa = TRUE, reassign_obs = FALSE)
heat_tree(filtered_obj,
          node_size = filtered_obj$data$tax_occ$BAL,   
          node_color = filtered_obj$data$mean_perc$BAL,           
          node_label = taxon_names,
          node_size_axis_label = "Samples with reads",
          node_color_axis_label = "Average percentage",        
          layout = "davidson-harel", 
          initial_layout = "reingold-tilford")

However, this will also remove rows in the tax_data table, which you probably don't want (will mess up n_obs if you use it for plotting). Instead you can do:

filtered_obj <- taxa::filter_taxa(obj, taxon_ranks == "o", supertaxa = TRUE, 
                                  reassign_obs = c(tax_abund = FALSE, tax_occ = FALSE, mean_perc = FALSE))
heat_tree(filtered_obj,
          node_size = filtered_obj$data$tax_occ$BAL,   
          node_color = filtered_obj$data$mean_perc$BAL,           
          node_label = taxon_names,
          node_size_axis_label = "Samples with reads",
          node_color_axis_label = "Average percentage",        
          layout = "davidson-harel", 
          initial_layout = "reingold-tilford")

@Mirin1357
Copy link

Hello @zachary-foster ,

thank you for the solution! It works flawless for both examples! Of course, I would never thought of it...

Thank you very much for your help in both problems, I hope it will also be useful to others.

Best regards and great job for the package once again!

@zachary-foster
Copy link
Contributor

Im glad it worked out. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants