Commit 3c0533b

Merge pull request #65 from JD2112/v2.0
v2.0 - Pangolin container separated from Illumina and Nanopore workflows
2 parents 05c412d + 94180d7 commit 3c0533b

160 files changed, +50638 -1392 lines changed

.github/workflows/black-check.yml

+15
@@ -0,0 +1,15 @@
+name: black
+on: pull_request
+jobs:
+  black:
+    runs-on: ubuntu-20.04
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v1
+        with:
+          python-version: 3.9
+      - run: |
+          python -m pip install --upgrade pip
+          pip install git+https://github.com/psf/black
+      - run: |
+          black --check --verbose .

.github/workflows/build_dockerfile.yml

+2-2
@@ -5,7 +5,7 @@ on:
     - cron: '0 0 * * *'
   push:
     branches:
-      - update_nanopore_container
+      - update_nanopore_container
 jobs:
   get-version:
     runs-on: ubuntu-latest
@@ -78,7 +78,7 @@ jobs:
           #Build docker for nanopore
           docker build --no-cache -f environments/nanopore/Dockerfile -t genomicmedicinesweden/gms-artic-nanopore:latest -t genomicmedicinesweden/gms-artic-nanopore:${{ steps.date.outputs.date }}-p-${REPO_VER}-d-${pangolin_data_VER}-c-${constellations_VER}-s-${scorpio_VER} .
           #Build docker for pangolin-check for specific requirements
-          docker build --no-cache -f environments/nanopore/pangolin/Dockerfile -t genomicmedicinesweden/gms-artic-pangolin:latest -t genomicmedicinesweden/gms-artic-pangolin:${{ steps.date.outputs.date }}-p-${REPO_VER}-d-${pangolin_data_VER}-c-${constellations_VER}-s-${scorpio_VER} --build-arg PANGOLIN_VER=v${REPO_VER} .
+          docker build --no-cache -f environments/pangolin/Dockerfile -t genomicmedicinesweden/gms-artic-pangolin:latest -t genomicmedicinesweden/gms-artic-pangolin:${{ steps.date.outputs.date }}-p-${REPO_VER}-d-${pangolin_data_VER}-c-${constellations_VER}-s-${scorpio_VER} --build-arg PANGOLIN_VER=v${REPO_VER} .
 
       - name: Push Docker image to DockerHub
         shell: bash
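
The image tags built in this workflow follow the pattern `<date>-p-<REPO_VER>-d-<pangolin_data_VER>-c-<constellations_VER>-s-<scorpio_VER>`. As a minimal sketch of how such a tag could be composed and split back into its parts, the Python below mirrors that pattern. The function names and the version numbers in the usage note are illustrative assumptions, not values taken from the repository, and the date format produced by `steps.date.outputs.date` is not shown in this diff.

```python
import re

# Pattern mirrors the tag layout used in the docker build commands:
# <date>-p-<repo_ver>-d-<pangolin_data_ver>-c-<constellations_ver>-s-<scorpio_ver>
TAG_PATTERN = re.compile(
    r"^(?P<date>.+?)-p-(?P<repo_ver>.+?)-d-(?P<pangolin_data_ver>.+?)"
    r"-c-(?P<constellations_ver>.+?)-s-(?P<scorpio_ver>.+)$"
)


def build_tag(date, repo_ver, pangolin_data_ver, constellations_ver, scorpio_ver):
    """Compose an image tag in the same order as the workflow does."""
    return (
        f"{date}-p-{repo_ver}-d-{pangolin_data_ver}"
        f"-c-{constellations_ver}-s-{scorpio_ver}"
    )


def parse_tag(tag):
    """Split a tag back into its components; raise ValueError if it does not fit."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"tag does not follow the expected layout: {tag!r}")
    return match.groupdict()


# Hypothetical example values, for illustration only.
example = build_tag("2022-09-01", "4.1.2", "1.14", "0.1.10", "0.3.17")
```

Here `parse_tag(example)["scorpio_ver"]` gives back `"0.3.17"`. Splitting on the `-p-`, `-d-`, `-c-`, `-s-` markers rather than on every hyphen keeps the sketch working even if the date itself contains hyphens.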

.gitignore

+1-2
@@ -1,8 +1,7 @@
-.DS_Store
 .nextflow*
 nextflow
 results
 *.sif
 work
-environments/.DS_Store
 .idea/
+.DS_Store

.v2releaseprocesses

+63
@@ -0,0 +1,63 @@
+gms-artic v2.0 release workflow processes
+
+Nanopore medaka processes
+1. versions
+2. pangoversions
+3. fastqcNanopore
+4. multiqcNanopore
+5. articDownloadScheme
+6. articGuppyPlex
+7. articMinIONMedaka
+8. articRemoveUnmappedReads
+9. makeQCCSV
+10. writeQCSummaryCSV
+11. collateSamples -
+12. pangolinTyping
+13. nextclade
+14. getVariantDefinitions
+15. makeReport
+
+Nanopore nanopolish processes
+1. versions
+2. pangoversions
+3. fastqcNanopore
+4. multiqcNanopore
+5. pycoqc
+6. articDownloadScheme
+7. articGuppyPlex
+8. articMinIONNanopolish
+9. articRemoveUnmappedReads
+10. makeQCCSV
+11. writeQCSummaryCSV
+12. collateSamples
+13. pangolinTyping
+14. nextclade
+15. getVariantDefinitions
+16. makeReport
+
+Illumina processes
+1. articDownloadScheme
+2. indexReference
+3. versions
+4. pangoversions
+5. fastqc
+6. readTrimming
+7. readMapping
+8. flagStat
+9. trimPrimerSequences
+10. depth
+11. callConsensusFreebayes
+12. annotationVEP
+13. callVariants
+14. makeConsensus
+15. makeQCCSV
+16. writeQCSummaryCSV
+17. statsCoverage
+18. statsInsert
+19. statsAlignment
+20. multiqc
+21. collateSamples
+22. pangolinTyping
+23. nextclade
+24. getVariantDefinitions
+25. makeReport
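
The two Nanopore process lists above are nearly identical, so it can help to see exactly where they diverge. The following Python sketch copies the two lists from `.v2releaseprocesses` and reports the processes unique to each; the helper name `only_in` is an assumption made for illustration and is not part of the pipeline.

```python
# Process names copied from the .v2releaseprocesses listing above.
MEDAKA = [
    "versions", "pangoversions", "fastqcNanopore", "multiqcNanopore",
    "articDownloadScheme", "articGuppyPlex", "articMinIONMedaka",
    "articRemoveUnmappedReads", "makeQCCSV", "writeQCSummaryCSV",
    "collateSamples", "pangolinTyping", "nextclade",
    "getVariantDefinitions", "makeReport",
]

NANOPOLISH = [
    "versions", "pangoversions", "fastqcNanopore", "multiqcNanopore",
    "pycoqc", "articDownloadScheme", "articGuppyPlex",
    "articMinIONNanopolish", "articRemoveUnmappedReads", "makeQCCSV",
    "writeQCSummaryCSV", "collateSamples", "pangolinTyping", "nextclade",
    "getVariantDefinitions", "makeReport",
]


def only_in(first, second):
    """Return the processes that appear in `first` but not in `second`, in order."""
    other = set(second)
    return [name for name in first if name not in other]


nanopolish_only = only_in(NANOPOLISH, MEDAKA)
medaka_only = only_in(MEDAKA, NANOPOLISH)
```

With these lists, `nanopolish_only` is `["pycoqc", "articMinIONNanopolish"]` and `medaka_only` is `["articMinIONMedaka"]`, which matches the README note that pycoQC applies only to the nanopolish route.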

README.md

+133-37
@@ -1,64 +1,160 @@
-# GMS-artic (ncov2019-artic-nf)
+
+![logo](workflow-image/logo.png)
 
 A nextflow pipeline with a GMS touch for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics).
 
+### Table of contents -
+- [Version updates](#Version-updates)
+- [Pipeline Diagram](#Pipeline-Diagram)
+- [Requirements](#Requirements)
+- [Quick start guide](#Quick-Start-Guide)
+- [parameters setup](#Parameters-setup)
+- [Test Data](#Test-Data)
+- [Run on local server](#-Run-on-local-server)
+- [Requirements](#Requirements)
+- [Illumina pipeline](#Illumina-pipeline)
+- [Nanopore nanopolish pipeline](#Nanopore-nanopolish-pipeline)
+- [Nanopore medaka pipeline](#Nanopore-medaka-pipeline)
+- [How to run in NGP server](#How-to-run-in-NGP-server)
+- [Datafile structure](#Datafile-structure)
+- [Pipeline run command](#Manual-running-of-analysis-pipeline)
+- [Illumina pipeline](#Run-Illumina-pipeline)
+- [Nanopore pipeline](#Run-Nanopore-Pipeline)
+- [Useful information](#Useful-information)
 ------------
-#### Major changes
+# Version updates
+## v2.0.0
+### Major updates
+- Docker container separated for Pangolin typing
+- Illumina container: [gms-artic-illumina](https://hub.docker.com/repository/docker/genomicmedicinesweden/gms-artic-illumina)
+- Nanopore container: [gms-artic-nanopore](https://hub.docker.com/repository/docker/genomicmedicinesweden/gms-artic-nanopore)
+- Pangolin container: [gms-artic-pangolin](https://hub.docker.com/repository/docker/genomicmedicinesweden/gms-artic-pangolin)
+- pycoQC container : [pycoqc](https://hub.docker.com/repository/docker/jd21/pycoqc)
+- Added separate package version files for each workflow
+- versions: for Illumina and Nanopore
+- pangoversion: for pangolin typing
+- Illumina analysis additional features
+- flagstat
+- depth
+- VEP annotation
+- Illumina results works for sc2reporter visualization
+- Nanopore analysis additional features (artic & medaka)
+- [fastqc](https://github.com/s-andrews/FastQC)
+- [multiqc](https://multiqc.info)
+- [pycoQC](https://github.com/a-slide/pycoQC) *(only for artic)*
+
+## v1.8.0
+### Minor updates
+
+- Pangolin v4 support
+- Updated Picard arguments
+- FastQC commands can be added from config
+- Added version of pangolin to build_dockerfile
+
+### Bug fixes
+- Fixed build_dockerfile
+- Fixed R issue
+- Fixed mamba issue
 
+### Major changes
 * The illumina and nanopore tracks automatically run pangolin and nextclade.
 * Generates report for base changes.
 
-###### 1. gms-artic in ngp-gms
+# Pipeline Diagram
+![gms-artic package](workflow-image/GMS-Artic_workflow.png)
+
+Find DAG and other figures [here](workflow-image/)
+
+# Requirements
+- Nextflow version >=20.10, <22.0 (tested OK on NextFlow version 20.10.0, version 21.10.6)
+- Singularity version 3.7.1 (tested OK)
+- Conda version >= 4.13.0 (tested OK)
+
+# Quick Start Guide
+## Test Data
+To test the pipeline, an [example dataset](./.github/data) for both Illumina and Nanopore (nanopolish, medaka) datafiles (from ConnerLab) provided.
+
+# parameters setup
+## primer scheme
+##### --scheme: To use the primer list, add --scheme to the CLI, eg., use 'nCoV-2019/V3' for artic primers or 'midnight-primer/V1'
 
-*for nanopore analysis (default is "midnight")*
-```
-sample_name
-|___ fast5_pass/
-|___ fastq_pass/
-|___ sequencing_summary.txt
-```
-*for illumina analysis*
 ```
-sample_name
-|___ fastq/
+--scheme nCoV-2019/V3/
+--scheme midnight-primers/V1/
+--scheme eden-primers/V1/
 ```
-#### Manual running of analysis pipeline
-###### 2. Run Illumina pipeline
+**To run the artic pipeline, please change the [nanopore.config](https://github.com/JD2112/gms-artic/blob/master/conf/nanopore.config) 'min_length' (default = 400) and 'max_length' (default = 700)**
+
+**For more parameters setup, please see the [ConnerLab documentation](ConnerLab-README.md)**
+
+## Run on local server
+### Requirements
+1. Containers: [Singularity](https://singularity-tutorial.github.io/01-installation/), [Docker](https://docs.docker.com/engine/install/)
+2. [Nextflow>=20](https://www.nextflow.io/docs/latest/getstarted.html)
+
+### Illumina pipeline
 ```
-$ nextflow run main.nf -profile singularity,sge \
+nextflow run main.nf -profile singularity \
 --illumina --prefix "test_illumina" \
 --directory .github/data/fastqs/ \
 --outdir illumina_test
 ```
-
-###### 3. Run Nanopore Pipeline
-###### **Deafult is "midnight" protocol**
+### Nanopore nanopolish pipeline
 ```
-$ nextflow run main.nf -profile singularity \
---nanopolish --prefix "midnight" \
---basecalled_fastq /home/test/fastq_pass/ \
---fast5_pass /home/test/fast5_pass/ \
---sequencing_summary /home/test/sequencing_summary_FAP82331_657703c9.txt \
---scheme-directory midnight-primer/V1/ \
---outdir /home/test/midnight_test -with-report midnight
+nextflow run main.nf -profile singularity \
+--nanopolish --prefix "test_nanopore_nanopolish" \
+--basecalled_fastq .github/data/nanopore/20200311_1427_X1_FAK72834_a3787181/fastq_pass/ \
+--fast5_pass .github/data/nanopore/20200311_1427_X1_FAK72834_a3787181/fast5_pass/ \
+--sequencing_summary .github/data/nanopore/20200311_1427_X1_FAK72834_a3787181/sequencing_summary_FAK72834_298b7829.txt \
+--outdir nanopore_nanopolish
 ```
-
-###### --scheme: To use the primer list, add --scheme to the CLI, eg., use 'nCoV-2019/V3' for artic primers or 'midnight-primer/V1'
-
+#### Nanopore medaka pipeline
 ```
---scheme nCoV-2019/V3/
---scheme midnight-primers/V1/
---scheme eden-primers/V1/
-
+nextflow run main.nf -profile singularity \
+--medaka --prefix "test_nanopore_medaka" \
+--basecalled_fastq .github/data/nanopore/20200311_1427_X1_FAK72834_a3787181/fastq_pass/ \
+--outdir nanopore_medaka
 ```
-###### **To run the artic pipeline, please change the [nanopore.config](https://github.com/JD2112/gms-artic/blob/master/conf/nanopore.config) 'min_length' (default = 400) and 'max_length' (default = 700)**
 
+## Run on NGP server
+### Datafile structure
+1. *for Nanopore analysis (default is "midnight")*
+```
+sample_name
+|___ fast5_pass/
+|___ fastq_pass/
+|___ sequencing_summary.txt
 ```
-$ nextflow run main.nf -profile singularity,sge \
+#### Run Nanopolish pipeline
+```
+nextflow run main.nf -profile singularity,sge \
 --nanopolish --prefix "test_nanopore" \
 --basecalled_fastq .github/data/nanopore/20200311_1427_X1_FAK72834_a3787181/fastq_pass/ \
 --fast5_pass .github/data/nanopore/20200311_1427_X1_FAK72834_a3787181/fast5_pass/ \
 --sequencing_summary .github/data/nanopore/20200311_1427_X1_FAK72834_a3787181/sequencing_summary_FAK72834_298b7829.txt \
---outdir nanopore_test
+--outdir nanopore_test
+```
+
+#### Run medaka pipeline
 ```
-#### To update your container image to the latest version from [dockerhub](https://hub.docker.com/orgs/genomicmedicinesweden/repositories), please delete your local image first before running the analysis pipeline.
+nextflow run main.nf -profile singularity,sge \
+--medaka --prefix "test_nanopore_medaka" \
+--basecalled_fastq .github/data/nanopore/20200311_1427_X1_FAK72834_a3787181/fastq_pass/ \
+--outdir nanopore_medaka
+```
+2. *for Illumina analysis*
+```
+sample_name
+|___ fastq/
+```
+#### Run Illumina pipeline
+```
+nextflow run main.nf -profile singularity,sge \
+--illumina --prefix "test_illumina" \
+--directory .github/data/fastqs/ \
+--outdir illumina_test
+```
+
+
+# Useful information
+1.To update your container image to the latest version from [dockerhub](https://hub.docker.com/orgs/genomicmedicinesweden/repositories), please delete your local image first before running the analysis pipeline.
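
The Requirements section of the README above gives the supported Nextflow range as `>=20.10, <22.0`. As a rough sketch of what that constraint means in practice, the Python below checks a version string against the range. The function names are hypothetical, and the sketch assumes purely numeric dotted versions (it does not handle suffixes such as `-edge`).

```python
def parse_version(text):
    """Turn a dotted numeric version such as '21.10.6' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def nextflow_supported(version):
    """Return True if `version` falls in the README's stated range >=20.10, <22.0."""
    parsed = parse_version(version)
    # Tuple comparison is lexicographic, so (20, 10) <= (20, 10, 0) < (22, 0).
    return (20, 10) <= parsed < (22, 0)


# The two versions the README reports as tested.
tested_ok = [nextflow_supported(v) for v in ("20.10.0", "21.10.6")]
```

Both tested versions pass this check, while a release such as `22.04.0` or `20.04.1` falls outside the stated range. The installed version can be read from `nextflow -version` before feeding it to a check like this.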

conf/base.config

+5-4
@@ -15,18 +15,19 @@ params{
 scheme = false
 tmpdir = "~/tmp"
 
+
 // Repo to download your primer scheme from
-schemeRepoURL = 'https://github.com/genomic-medicine-sweden/gms-artic.git'
-// schemeRepoURL = 'https://github.com/artic-network/primer-schemes.git'
+//schemeRepoURL = 'https://github.com/genomic-medicine-sweden/gms-artic.git'
+schemeRepoURL = 'https://github.com/jd2112/gms-artic.git'
 
 // Directory within schemeRepoURL that contains primer schemes
 schemeDir = 'gms-artic'
 
 // Scheme name
-// scheme = 'midnight-primer'
+scheme = 'nCoV-2019-primer'
 
 // Scheme version
-schemeVersion = 'V1'
+schemeVersion = 'V3'
 
 // Run experimental medaka pipeline? Specify in the command using "--medaka"
 medaka = false

conf/illumina.config

+2-2
@@ -1,6 +1,7 @@
 // Illumina specific params
 
 params {
+
 // Repo to download your primer scheme from
 schemeRepoURL = 'https://github.com/artic-network/primer-schemes.git'
 
@@ -12,8 +13,7 @@ params {
 
 // Scheme version
 schemeVersion = 'V3'
-
-
+
 // Instead of using the ivar-compatible bed file in the scheme repo, the
 // full path to a previously-created ivar bed file. Must also supply
 // ref.

conf/nanopore.config

+1
@@ -16,6 +16,7 @@ params {
 // IF SET TO false THIS WILL USE artic minion DEFAULT (100)
 normalise = 500
 
+
 // Use bwa not minimap2? Specify in the command using "--bwa"
 bwa = false
 

environments/illumina/Dockerfile

+2-2
@@ -2,8 +2,8 @@ FROM continuumio/miniconda3:latest AS condabuild
 LABEL authors="Matt Bull" \
 description="Docker image containing all requirements for an Illumina ncov2019 pipeline"
 
-COPY environments/extras.yml /extras.yml
-COPY environments/illumina/environment.yml /environment.yml
+COPY extras.yml /extras.yml
+COPY environment.yml /environment.yml
 RUN /opt/conda/bin/conda update conda && \
 /opt/conda/bin/conda install mamba -c conda-forge && \
 /opt/conda/bin/conda update mamba -c conda-forge && \

environments/illumina/environment.yml

+4-7
@@ -3,13 +3,12 @@ channels:
   - conda-forge
   - bioconda
   - defaults
-  - r
 dependencies:
   - biopython=1.74
   - libxcb
   - matplotlib>=3.3.3
   - python>=3.7
-  - bwa=0.7.17=pl5.22.0_2
+  - bwa=0.7.17
   - samtools=1.10
   - bcftools=1.10
   - trim-galore=0.6.5
@@ -28,11 +27,9 @@ dependencies:
   - fastqc=0.11.9
   - multiqc=1.11
   - nextclade=1.10.2
-  - r=3.6.0
+  - sambamba=0.8.0
+  - ensembl-vep>=102.0
+  - conda-forge::r-base
   - pip:
     - pandas >= 1.1
     - scikit-learn >= 0.23.1
-    - git+https://github.com/cov-lineages/pangolin.git
-    - git+https://github.com/cov-lineages/constellations.git
-    - git+https://github.com/cov-lineages/scorpio.git
-    - git+https://github.com/cov-lineages/pangolin-data.git
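
One change in this hunk relaxes `bwa=0.7.17=pl5.22.0_2` to `bwa=0.7.17`. In a conda dependency of the form `name=version=build`, the third field pins an exact build string; dropping it leaves the solver free to pick any build of that version. The Python below is a minimal sketch of splitting such exact pins into their fields. The `Pin` type and `parse_pin` helper are illustrative assumptions, and the sketch deliberately returns `None` for range specs (`>=`) and channel-qualified specs (`channel::name`) rather than trying to cover conda's full match-spec grammar.

```python
import re
from typing import NamedTuple, Optional


class Pin(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


# name, optionally "=version", optionally "=build"; no comparison operators.
_PIN = re.compile(r"([A-Za-z0-9_.\-]+)(?:=([^=<>!]+)(?:=([^=]+))?)?")


def parse_pin(spec: str) -> Optional[Pin]:
    """Parse 'name', 'name=version' or 'name=version=build'; None for anything else."""
    match = _PIN.fullmatch(spec.strip())
    if match is None:
        return None
    return Pin(*match.groups())


old_pin = parse_pin("bwa=0.7.17=pl5.22.0_2")
new_pin = parse_pin("bwa=0.7.17")
```

Comparing `old_pin` with `new_pin` shows the same name and version, with only the build field changing from `"pl5.22.0_2"` to `None`.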
