Test more SARS-CoV-2 samples #555

Open
donkirkby opened this issue Apr 15, 2020 · 7 comments

@donkirkby
Member

donkirkby commented Apr 15, 2020

After finishing the SARS-CoV-2 support in #549, do more extensive testing with published sample data. There is a list of samples to download, and a toolkit to download them with.

Find more samples from SRA by searching for "Severe acute respiratory syndrome-related coronavirus"[orgn:__txid694009]. You can filter by platform, and there are currently 466 Illumina records.
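The same search could also be scripted. The following is a minimal sketch, not part of MiCall, that only builds a query URL for the NCBI E-utilities `esearch` endpoint. The helper name `build_sra_search_url` is hypothetical, and the `[Platform]` field tag used for the platform filter is an assumption that should be checked against the SRA advanced search builder.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Organism query copied from the comment above.
SARS_TERM = ('"Severe acute respiratory syndrome-related coronavirus"'
             '[orgn:__txid694009]')


def build_sra_search_url(term, platform=None, retmax=20):
    """Build an esearch URL for the SRA database (hypothetical helper).

    The [Platform] field tag is an assumption, not taken from the issue.
    """
    if platform:
        term = f'{term} AND "{platform}"[Platform]'
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Print the URL; fetching it is left to the reader.
print(build_sra_search_url(SARS_TERM, platform="illumina"))
```

The sketch stops at building the URL so that it runs without network access.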

It can be tricky to find the published consensus sequences for a sample. I registered for GISAID and found Accession EPI_ISL_408670, but it took me a while to figure out that the descriptions in the SRA abstract for SRR11140746 (SARS-CoV-2/2019-nCoV/USA-WI-1/2020) loosely match the virus name in GISAID for EPI_ISL_408670 (hCoV-19/USA/WI1/2020).

Art's advice:

I queried the SRR number in the NCBI SRA database to get the sample description and then searched for a similar description in the GISAID annotations. Not perfect, I know.
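The loose matching described above could be partly automated. This is a rough sketch under stated assumptions, not what Art or MiCall actually does: it strips punctuation and the virus-name prefixes seen in the two example names, then scores the remainder with Python's `difflib`. The function names and the prefix list are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Virus-name prefixes to ignore, taken from the two example names above.
# This list is an assumption and is probably incomplete.
VIRUS_PREFIXES = {"SARSCOV2", "2019NCOV", "HCOV19"}


def normalize_name(name):
    """Reduce a virus name to its location, isolate, and year characters."""
    tokens = [re.sub(r"[^A-Z0-9]", "", part.upper())
              for part in name.split("/")]
    return "".join(t for t in tokens if t and t not in VIRUS_PREFIXES)


def name_similarity(sra_name, gisaid_name):
    """Return a similarity ratio between 0.0 and 1.0 for two virus names."""
    return SequenceMatcher(None,
                           normalize_name(sra_name),
                           normalize_name(gisaid_name)).ratio()


print(name_similarity("SARS-CoV-2/2019-nCoV/USA-WI-1/2020",
                      "hCoV-19/USA/WI1/2020"))
```

A score like this can only rank candidates. The final match still needs a manual check against the GISAID annotations.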

@donkirkby donkirkby added this to the 7.13 milestone Apr 15, 2020
@donkirkby
Member Author

@dmacmillan, the code to test is currently on the MultiuseDocker branch.

@dmacmillan
Contributor

I've run the following samples through MiCall via Docker on Windows 10 Home successfully!

| Sample | Time (m) |
| --- | --- |
| SRR11593354 | 192 |

@donkirkby
Member Author

That's great, @dmacmillan! Have you found a consensus sequence to compare it to?

@cbrumme

cbrumme commented Apr 30, 2020

The sample ID is "NRW-011"
So try GISAID Accession# "EPI_ISL_414507"

@dmacmillan
Contributor

I am waiting on a confirmation email so that I can search GISAID.

@dmacmillan
Contributor

dmacmillan commented May 1, 2020

I found another sample/consensus sequence. I'll keep track of the ones that I have found in this comment:

| Sample | Consensus | Time (m) |
| --- | --- | --- |
| SRR11593354 | EPI_ISL_414507 | 192 |
| SRR11578347 | EPI_ISL_427026 | Not run |
| SRR11578346 | EPI_ISL_426898 | Not run |
| SRR10903401 | EPI_ISL_414507 | Not run |

Pre-existing Table

| Run | Compared to | Differences |
| --- | --- | --- |
| SRR11593354_1.fastq | EPI_ISL_414507 | 0 mismatches, 0 missing, and 648 added out of 29225. |
| SRR11593355_1.fastq | EPI_ISL_414574 | 0 mismatches, 0 missing, and 435 added out of 29438. |
| SRR11593356_1.fastq | EPI_ISL_414509 | 1 mismatches, 0 missing, and 91 added out of 29782. |
| SRR11593357_1.fastq | EPI_ISL_414508 | 0 mismatches, 0 missing, and 395 added out of 29490. |
| SRR11593358_1.fastq | EPI_ISL_414506 | 0 mismatches, 0 missing, and 887 added out of 28933. |
| SRR11593359_1.fastq | EPI_ISL_414505 | 0 mismatches, 0 missing, and 92 added out of 29782. |
| SRR11593360_1.fastq | EPI_ISL_414504 | 0 mismatches, 0 missing, and 447 added out of 29426. |
| SRR11593361_1.fastq | EPI_ISL_414499 | 2 mismatches, 0 missing, and 144 added out of 29782. |
| SRR11593362_1.fastq | EPI_ISL_414498 | 0 mismatches, 0 missing, and 384 added out of 29490. |
| SRR11593364_1.fastq | EPI_ISL_414497 | 0 mismatches, 0 missing, and 65 added out of 29779. |
| SRR11593365_1.fastq | EPI_ISL_413488 | 10 mismatches, 0 missing, and 145 added out of 29746. |
| SRR11578341 | EPI_ISL_426901 | 2 mismatches, 1 missing, and 617 added out of 29249. |
| SRR11578342 | EPI_ISL_426900 | 1 mismatches, 0 missing, and 398 added out of 29286. |
| SRR11578343 | EPI_ISL_426899 | 0 mismatches, 0 missing, and 429 added out of 29462. |
| SRR11578344 | EPI_ISL_426899 | 15 mismatches, 2 missing, and 414 added out of 29462. |
| SRR11578345 | EPI_ISL_426656 | 8 mismatches, 17 missing, and 398 added out of 29498. |
| SRR11578346 | EPI_ISL_426898 | 0 mismatches, 0 missing, and 488 added out of 29315. |
| SRR11578347 | EPI_ISL_427026 | 0 mismatches, 0 missing, and 148 added out of 29676. |
| SRR11578348 | EPI_ISL_427025 | 1 mismatches, 1 missing, and 452 added out of 29411. |
| SRR11578349 | EPI_ISL_427024 | 1 mismatches, 0 missing, and 564 added out of 29301. |
| SRR10903401-SARS_S1 | MN988669.1 | Very good: 12 mismatches in the first 24 bases under low coverage, and 21 extra A's at the end out of 29881. |
| SRR10903402-SARS_S2 | MN988668.1 | Almost perfect: 21 extra A's at the end out of 29881. |
| SRR11092056-SARS_S3 | MN996530 | Bad: 899 mismatches, 17761 missing, and 217 added out of 29854. |
| SRR11092057-SARS_S4 | MN996528.1 | Very good: 4 mismatches, 33 missing, and 12 added out of 29891. Missing 14 at the start, a gap of 15 with no coverage at 5397, plus 4 single gaps of no coverage within 20 bases. The mismatches are all in low coverage, 3 are mixtures when coverage is 2. 12 extra A's at the end. |
| SRR11092058-SARS_S5 | MN996527.1 | Bad: lots of sections with no coverage. 38 mismatches, 7606 missing, and 26 added out of 29825. |
| SRR11092064-SARS_S6 | MN996531.1 | Bad: lots of sections with no coverage. 24 mismatches, 4667 missing, and 33 added out of 29857. |
| SRR11140744-SARS_S7 | EPI_ISL_408670 | Almost perfect: 28 missing from the start, and poly-A tail replaced with ACAGATATATACGCC out of 29879. |
| SRR11140746-SARS_S8 | EPI_ISL_408670 | Almost perfect: poly-A tail replaced with AATAWMAACAAACAGAGCCTAAAAAGGACAAAA4 out of 29879. |
| SRR11140748-SARS_S9 | EPI_ISL_408670 | Almost perfect: 6 missing from poly-A tail out of 29879. |
| SRR11140750-SARS_S10 | EPI_ISL_408670 | Almost perfect: 9 missing from the start, and poly-A tail replaced with ACAATTGCAACAATC out of 29879. |
| SRR11177792-SARS_S11 | MT072688 | Almost perfect: 57 added out of 29811. A few added to start, most added at end: AGTGCTGAG + poly-A tail. |
| SRR11314339-SARS_S12 | MT192765 | Almost perfect: 38 added out of 29829. A few added to start, most added at end: CCATGTGATTTTAATAG + poly-A tail. |
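
For readers who want to reproduce counts like the ones in the Differences column, here is one possible way to tally mismatches, missing bases, and added bases between a published sequence and a new consensus. It is only an illustration using Python's `difflib`, and is not necessarily how the numbers in this table were produced. The function name `count_differences` is hypothetical, and `difflib` is slow and can misalign repetitive sequences, so a real comparison of full genomes would likely need a proper aligner.

```python
from difflib import SequenceMatcher


def count_differences(reference, consensus):
    """Count (mismatches, missing, added) in consensus versus reference.

    mismatches: positions where both sequences have a different base
    missing:    reference bases with no counterpart in the consensus
    added:      consensus bases with no counterpart in the reference
    """
    mismatches = missing = added = 0
    matcher = SequenceMatcher(None, reference, consensus, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref_len, con_len = i2 - i1, j2 - j1
        if tag == "replace":
            # Pair up as many bases as possible, count the rest as indels.
            shared = min(ref_len, con_len)
            mismatches += shared
            missing += ref_len - shared
            added += con_len - shared
        elif tag == "delete":
            missing += ref_len
        elif tag == "insert":
            added += con_len
    return mismatches, missing, added


# Toy example: three extra A's appended to a short made-up sequence.
print(count_differences("GATTACAGCTTGCA", "GATTACAGCTTGCAAAA"))
```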

@dmacmillan
Contributor

@cbrumme @donkirkby I couldn't find a reference for sample SRR11578344. Any ideas? If not, I can find another.

@donkirkby donkirkby modified the milestones: 7.15 - HIVdb 9.0, 7.16 May 4, 2021
@CBeelen CBeelen modified the milestones: 7.16, 7.17 Jun 8, 2023