bowtie2 with local alignment soft-clips nt difference at end of genome #10

Open
ArtPoon opened this issue Apr 7, 2020 · 2 comments

ArtPoon commented Apr 7, 2020

See #7
Raw data suggest that the first position of the genome is polymorphic or unreliable (either A or T).
Either we run bowtie2 in global (end-to-end) alignment mode, which will probably reduce mapping efficiency, or we make a special exception for soft clips at the 5' or 3' end of the genome when calculating the consensus.
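
One way the second option could look, as a rough sketch only: when a local alignment starts with a soft clip and the clipped bases would land exactly on the start of the reference, treat them as aligned before building the consensus. The function name `unclip_five_prime` and the `edge` parameter are hypothetical and not part of this project; only the 5' end is handled here, and the 3' end would need the mirror-image check against the reference length.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def unclip_five_prime(pos, cigar, edge=1):
    """Convert a leading soft clip into aligned bases at the genome start.

    pos   -- 1-based leftmost mapping position (SAM POS field)
    cigar -- CIGAR string, e.g. "1S39M"
    edge  -- hypothetical setting: the clipped bases are restored only if
             they would start at reference position 1..edge

    Returns (pos, cigar), unchanged when the exception does not apply.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if len(ops) < 2 or ops[0][1] != "S":
        return pos, cigar

    clip_len = ops[0][0]
    new_pos = pos - clip_len
    # Skip clips that would run off the start of the reference (new_pos < 1)
    # and clips that are not at the genome end at all (new_pos > edge).
    if not 1 <= new_pos <= edge:
        return pos, cigar

    if ops[1][1] == "M":
        # Merge the restored bases into the following match block.
        merged = [(clip_len + ops[1][0], "M")] + ops[2:]
    else:
        merged = [(clip_len, "M")] + ops[1:]
    return new_pos, "".join(f"{n}{op}" for n, op in merged)


print(unclip_five_prime(2, "1S39M"))    # clip lands on position 1
print(unclip_five_prime(100, "5S35M"))  # interior read, left alone
```

The restored bases are then counted like any other aligned base, so a mismatch at position 1 contributes to the consensus instead of being dropped.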

ArtPoon commented Apr 7, 2020

Global alignment isn't that bad:

Elzar:data artpoon$ bowtie2 -x NC_045512 -S SRR11241254.global.sam -U SRR11241254.fastq
Warning: skipping read 'SRR11241254.131951 131951 length=1' because length (1) <= # seed mismatches (0)
Warning: skipping read 'SRR11241254.131951 131951 length=1' because it was < 2 characters long
Warning: skipping read 'SRR11241254.131952 131952 length=1' because length (1) <= # seed mismatches (0)
Warning: skipping read 'SRR11241254.131952 131952 length=1' because it was < 2 characters long
131952 reads; of these:
  131952 (100.00%) were unpaired; of these:
    1970 (1.49%) aligned 0 times
    129978 (98.50%) aligned exactly 1 time
    4 (0.00%) aligned >1 times
98.51% overall alignment rate
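
As a quick sanity check on the summary above, the overall rate is the reads aligned exactly once plus the reads aligned more than once, divided by the total. The variable names below are only for illustration; the counts are copied from the bowtie2 output.

```python
# Counts copied from the bowtie2 summary above.
total_reads = 131952
aligned_zero = 1970
aligned_once = 129978
aligned_multi = 4

# The three categories should account for every read.
assert aligned_zero + aligned_once + aligned_multi == total_reads

overall_rate = (aligned_once + aligned_multi) / total_reads
print(f"{overall_rate:.2%} overall alignment rate")
```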

ArtPoon commented Apr 7, 2020

Elzar:data artpoon$ grep TTTAAAGGTTTATA SRR11241254.global.sam 
SRR11241254.9	0	NC_045512.2	1	40	40M	*	0	0	TTTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAAC	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC	AS:i:-5	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0A39	YT:Z:UU
SRR11241254.10	0	NC_045512.2	1	40	40M	*	0	0	TTTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAAC	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC	AS:i:-5	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0A39	YT:Z:UU
SRR11241254.11	0	NC_045512.2	1	40	40M	*	0	0	TTTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAAC	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC	AS:i:-5	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0A39	YT:Z:UU
SRR11241254.12	0	NC_045512.2	1	42	70M	*	0	0	TTTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA	FEEDGFGFGGGGGGGGGGFDFGGGGGGGGFBGF@GGGGF@CDCEGFEFGGGGGGGGFGGGGGFFFCCCCC	AS:i:-5	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0A69	YT:Z:UU
SRR11241254.13	0	NC_045512.2	1	42	70M	*	0	0	TTTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC	AS:i:-5	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0A69	YT:Z:UU
SRR11241254.14	0	NC_045512.2	1	42	70M	*	0	0	TTTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGFCGGGGGGGGGFAGGGGGGGGCCCCC	AS:i:-5	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0A69	YT:Z:UU
SRR11241254.15	0	NC_045512.2	1	42	70M	*	0	0	TTTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC	AS:i:-5	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0A69	YT:Z:UU
SRR11241254.16	0	NC_045512.2	1	42	70M	*	0	0	TTTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC	AS:i:-5	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0A69	YT:Z:UU
Elzar:data artpoon$ grep ATTAAAGGTTTATA SRR11241254.global.sam 
SRR11241254.17	0	NC_045512.2	1	40	80M	*	0	0	ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACATGAA	GGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGFG	AS:i:-1XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:75T1T2	YT:Z:UU
SRR11241254.22	0	NC_045512.2	1	42	162M	*	0	0	ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGAC	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGGGGGGGGGGGGGGGGGGGGF	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:162	YT:Z:UU
SRR11241254.23	0	NC_045512.2	1	42	162M	*	0	0	ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGAC	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGF	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:162	YT:Z:UU
SRR11241254.24	0	NC_045512.2	1	42	162M	*	0	0	ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGAC	GGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGFCFGGCFGGGGGGGGGCFFGFFGGGGGGGGGGGGGGGGGFGGGGGGFGGGGGGGGGGGGGGGGGGFFCCFGGFGGGGGGGGGGG9FGFCGC<F?FGGGGGGGGGGGGGGCBFFGF;EDGGGGGGGGGGGGC	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:162	YT:Z:UU

Note that coverage here is very low. The best approach would probably be to report this first base as the mixture W (the IUPAC code for A or T).
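
A minimal sketch of how a mixture call like this might be made from per-position base counts. The function name `call_base` and the `min_freq` cutoff of 0.2 are assumptions for illustration, not something this project defines; the example counts are simply the reads shown in the two grep outputs above (8 starting with T, 4 starting with A).

```python
# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(counts, min_freq=0.2):
    """Return the IUPAC code for all bases at or above min_freq.

    counts   -- dict mapping base ("A", "C", "G", "T") to read count
    min_freq -- hypothetical cutoff: minimum fraction of reads a base
                needs to be included in the mixture
    """
    # Ignore anything that is not an unambiguous base (e.g. "N" or gaps).
    counts = {b: n for b, n in counts.items() if b in "ACGT" and n > 0}
    total = sum(counts.values())
    if total == 0:
        return "N"  # no coverage at this position
    keep = frozenset(b for b, n in counts.items() if n / total >= min_freq)
    if not keep:
        return "N"  # cutoff too strict for any base to pass
    return IUPAC[keep]


# First-position bases from the reads listed above.
print(call_base({"T": 8, "A": 4}))
```

With only a dozen reads, the result depends heavily on the cutoff, which is another reason to treat this position as a mixture instead of committing to A or T.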
