bigly
is an API and a command-line app (binaries here). It is similar to samtools mpileup
but it reports a huge number of
additional variables that are useful for structural variant calling and visualization.
For each requested position, the struct below is filled by the appropriate position in any overlapping alignment that meets the requested filters:
// Pile holds the information about a single base.
type Pile struct {
Chrom string
Pos int
Depth int // count of reads passing filters.
RefBase byte // Reference base a this position.
MisMatches uint32 // number of mismatches .
ProperPairs int // count of reads with paired flag
SoftStarts uint32 // counts of base preceding an 'S' cigar op
SoftEnds uint32 // ... following ...
HardStarts uint32 // counts of base preceding an 'H' cigar op
HardEnds uint32
InsertionStarts uint32 // counts of base preceding an 'I' cigar op
InsertionEnds uint32
Deletions uint32 // counts of deletions 'D' at this base
Heads uint32 // counts of starts of reads at this base
Tails uint32 // counts of ends of reads at this base
Splitters uint32 // count of non-secondary reads with SA tags.
Splitters1 uint32 // count of non-secondary reads with exactly 1 SA tag.
Bases []byte // All bases from reads covering this position
Quals []uint8 // All quals from reads covering this position
MeanInsertSizeLP uint32 // Calculated with left-most of pair
MeanInsertSizeRM uint32 // Calculated with right-most of pair
OrientationPlusPlus uint32 // Paired reads mapped in +/+ orientation
OrientationMinusMinus uint32 // Paired reads mapped in -/- orientation
OrientationMinusPlus uint32 // Paired reads mapped in -/+ orientation
OrientationSplitter uint32 // Count of +/- or -/+ splitters.
Discordant uint32 // Number of reads with insert size > ConcordantCutoff
DiscordantChrom uint32 // Number of reads mapping on different chroms
DiscordantChromEntropy float32 // high value means all discordants came from same chrom.
GC65 uint32
GC257 uint32
Duplicity65 float32 // measure of lack of sequence entropy.
Duplicity257 float32 // measure of lack of sequence entropy.
SplitterPositions []int
SplitterStrings []string
}
The program in cmd/bigly/main.go
is distributed as an example program of what one can do with this
library--namely make an enhanced pileup in a few lines of code.
At this time, the usage of the example program is very simple.
Default exclude flags are (sam.Unmapped | sam.QCFail | sam.Duplicate)
bigly $bam $chrom:$start-$end > o
if a reference is specified with -r
it will report statistics about GC content in windows
surrounding each base.
help:
bigly 0.2.0
usage: bigly [--minbasequality MINBASEQUALITY] [--minmappingquality MINMAPPINGQUALITY] [--excludeflag EXCLUDEFLAG] [--includeflag INCLUDEFLAG] [--mincliplength MINCLIPLENGTH] [--includebases] [--splitterverbosity SPLITTERVERBOSITY] [--reference REFERENCE] BAMPATH REGION
positional arguments:
bampath
region
options:
--minbasequality MINBASEQUALITY, -q MINBASEQUALITY
base quality threshold [default: 10]
--minmappingquality MINMAPPINGQUALITY, -Q MINMAPPINGQUALITY
mapping quality threshold [default: 5]
--excludeflag EXCLUDEFLAG, -F EXCLUDEFLAG
--includeflag INCLUDEFLAG, -f INCLUDEFLAG
--mincliplength MINCLIPLENGTH, -c MINCLIPLENGTH
only count H/S clips of at least this length [default: 15]
--includebases, -b output each base and base quality score
--splitterverbosity SPLITTERVERBOSITY, -s SPLITTERVERBOSITY
0-only count; 1:count and single most frequent; 2:all SAs; 3:dont shorten positions
--reference REFERENCE, -r REFERENCE
optional path to reference fasta.
--help, -h display this help and exit
--version display version and exit
GoDoc Documentation is here: ![GoDoc] (https://godoc.org/github.com/brentp/bigly?status.png)
With a supporting library easing regional bam queries here: ![GoDoc] (https://godoc.org/github.com/brentp/bigly/bamat?status.png)
View the tests for more examples. bigly
uses biogo library for bam access.
python scripts/plotter.py bigly.output
will make a plot with python+matplotlib on output from bigly.
This is a homozygous deletion easily seen from the depth, but we can see that the deletion is nicely delineated by soft-clips and we see aberrant insert sizes bounding the deletion. In most libraries, we would also see splitters flanking the region.
- Track strand of bases.
- Report the 5th and 95th percentile of insert size.
- More efficient insert-size calc
Ryan Layer's svv was the inspiration for the plotter.
Obviously, samtools is the reference pileup implementation.