T1K for PGx #23

Open
nbiesot opened this issue Nov 20, 2023 · 7 comments

@nbiesot

nbiesot commented Nov 20, 2023

Hi,

I am trying to use T1K for PGx, following the step-by-step plan described in the vcf_database folder. Unfortunately, I am not getting the expected results for my samples. For example, for the CYP2D6 gene I get *4/*86 as output, where I expect *1/*4.

This is the case for both the reference file I created for CYP2D6 according to the step-by-step plan and the reference files in the cyp2d6_idx folder on Git.

What could be possible reasons for not getting the expected outputs?

(The data I am using is from the Genetic Testing Reference Material Coordination Program (GeT-RM). These reference materials contain mutations of clinical importance that have been confirmed by multiple volunteer laboratories using different testing platforms, including for the CYP2D6 gene.)

@mourisl
Owner

mourisl commented Nov 20, 2023

Do you mean you did not get the CYP2D6*1 series in the output? Could you please share the .dat file generated by the procedure? Thank you.

@nbiesot
Author

nbiesot commented Nov 21, 2023

Yes, indeed.
cyp2d6.txt

(I couldn't upload the .dat file because that file type is not supported.)

@mourisl
Owner

mourisl commented Nov 23, 2023

The txt file looks fine, and I can generate the reference FASTA files containing CYP2D6*1 or CYP2D6*1.XXX. So are *4/*86 and *1/*4 the genotyping results?

One possible reason is that CYP2D6 is highly homologous to CYP2D7, so you may need to include some CYP2D7 gene sequences in the reference.

@nbiesot
Author

nbiesot commented Nov 23, 2023

Thank you for looking into the file!
CYP2D6 is not the only gene I have looked at; I have also examined CYP2C9, CYP2C19, CYP3A5, and CYP4F2. For these genes as well, I do not get the expected output for the 16 samples I tested. If the .dat file looks good, is there another possibility for why I am not getting the expected output for these other genes?

@mourisl
Owner

mourisl commented Nov 23, 2023

Can you show me the commands you ran and your genotype.tsv file? Is your data RNA-seq or from another sequencing platform?

@nbiesot
Author

nbiesot commented Nov 23, 2023

The WGS files are available at: https://www.ebi.ac.uk/ena/browser/view/ERR1955327
The command I am using is: run-t1k -f T1K/vcf_database/cyp2d6_idx/cyp2d6_dna_seq.fa -1 ERR1955327_1.fastq.gz -2 ERR1955327_2.fastq.gz --od ERR1955327/cyp2d6 --alleleDigitUnits 1 --alleleDelimiter . -t 16
The output that results from this is:
T1K_ERR1955327_1_genotype.ods

Thank you very much for your effort.

@mourisl
Owner

mourisl commented Nov 23, 2023

I would recommend concatenating the dna_seq.fa files from all the CYP genes into a single combined FASTA file. This may help resolve reads that align to multiple CYP genes. Another parameter to tune is the "-s" option; the default of 0.8 might be too lenient. You may consider trying values like 0.9 and 0.97.
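
As a rough sketch of that suggestion, the Python helper below concatenates several per-gene FASTA files into one reference and builds a run-t1k command line for each "-s" value to try. The function names are hypothetical, and so are the per-gene file locations in the usage note (only the cyp2d6_idx/cyp2d6_dna_seq.fa path appears in this thread). The other flags are copied from the command posted above. It only assembles the command; it does not run T1K.

```python
from pathlib import Path
from typing import Iterable, List


def concat_fasta(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate FASTA files into `output`; return the number of records written."""
    records = 0
    with output.open("w") as out:
        for path in inputs:
            text = path.read_text()
            if not text:
                continue
            # Make sure the next file's first header starts on its own line.
            if not text.endswith("\n"):
                text += "\n"
            records += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
    return records


def t1k_command(ref: Path, fq1: str, fq2: str, outdir: str,
                similarity: float, threads: int = 16) -> List[str]:
    """Build a run-t1k argument list using the flags from the command in this thread, plus -s."""
    return [
        "run-t1k",
        "-f", str(ref),
        "-1", fq1,
        "-2", fq2,
        "--od", outdir,
        "--alleleDigitUnits", "1",
        "--alleleDelimiter", ".",
        "-t", str(threads),
        "-s", str(similarity),
    ]
```

For example, assuming the other genes follow the same layout as cyp2d6_idx, one could call concat_fasta(sorted(Path("T1K/vcf_database").glob("*_idx/*_dna_seq.fa")), Path("cyp_combined_dna_seq.fa")), then pass the combined file to t1k_command once with similarity 0.9 and once with 0.97, writing each run to a separate output directory so the genotype results can be compared.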
