Genotype IO and R
Reading genotype data into R can be difficult because of file sizes and limited file format support. With rJava and the Genotype-IO API, R can read every file format that the API supports.
Note: for reading BGEN files we recommend: https://bitbucket.org/gavinband/bgen/wiki/rbgen
After installing rJava, first start the JVM through rJava and load the jar with dependencies. If you get a heap space error, increase the reserved memory by raising the Xmx and Xms values.
library(rJava)
.jinit(classpath="Genotype-IO-0.0.4-SNAPSHOT-jar-with-dependencies.jar", parameters="-Xmx2g -Xms2g")
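If heap space errors persist, it can help to confirm how much memory the JVM actually received. The snippet below is a minimal sketch that asks the standard `java.lang.Runtime` class for its maximum heap size. It assumes rJava is installed, and the variable names `runtime` and `maxHeapMb` are illustrative only.

```r
library(rJava)

# Starts the JVM only if it is not running yet; an already running JVM is reused
.jinit()

# Runtime.getRuntime() is a static method, so it is called on the class name
runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")

# maxMemory() returns a Java long (signature "J") holding the heap limit in bytes
maxHeapMb <- .jcall(runtime, "J", "maxMemory") / 1024^2
cat("Maximum JVM heap:", round(maxHeapMb), "MB\n")
```

If the reported value is lower than expected, the JVM was probably started earlier in the same R session with different settings. Restarting R before calling .jinit is then the safest option.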
The example function below reads any supported file format. Pass the base path of the input files and the file format. If necessary, change the cache size or supply filters for samples and variants.
loadGenotypeData <- function(basePath, dataType, cacheSize = 1000,
    variantFilter = .jnull("org/molgenis/genotype/variantFilter/VariantFilter"),
    sampleFilter = .jnull("org/molgenis/genotype/sampleFilter/SampleFilter")) {
  # valueOf expects the upper-case name of the format
  dataType <- toupper(dataType)
  genotypeDataFormat <- .jcall("org/molgenis/genotype/RandomAccessGenotypeDataReaderFormats", "Lorg/molgenis/genotype/RandomAccessGenotypeDataReaderFormats;", "valueOf", dataType)
  return(.jcall(genotypeDataFormat, "Lorg/molgenis/genotype/RandomAccessGenotypeData;", "createFilteredGenotypeData", basePath, as.integer(cacheSize), variantFilter, sampleFilter))
}
The code below gives examples of a variant filter and a sample filter. "includedSnps" and "includedSamples" are R character vectors.
variantFilter <- .jcast(.jnew("org/molgenis/genotype/variantFilter/VariantIdIncludeFilter",includedSnps),"org/molgenis/genotype/variantFilter/VariantFilter")
sampleFilter <- .jcast(.jnew("org/molgenis/genotype/sampleFilter/SampleIdIncludeFilter",includedSamples), "org/molgenis/genotype/sampleFilter/SampleFilter")
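The character vectors passed to these filters often come from plain text files with one identifier per line. The helper below is a sketch of one way to build such a vector. `readIdList` and the temporary file are hypothetical and not part of Genotype-IO; the same helper would work for a list of sample identifiers.

```r
# Hypothetical helper: read one identifier per line, dropping blanks and duplicates
readIdList <- function(path) {
  ids <- trimws(readLines(path, warn = FALSE))
  unique(ids[nzchar(ids)])
}

# Demonstration with a temporary file standing in for a real SNP list
snpFile <- tempfile(fileext = ".txt")
writeLines(c("rs123", "rs456", "", "rs123"), snpFile)

includedSnps <- readIdList(snpFile)
print(includedSnps)
```

Removing blank lines and duplicates up front keeps the identifier list that is handed to the Java side small and predictable.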
A short example use case of the API in R: we read a small subset of the data in Plink BED format, select a specific SNP, and plot a histogram of its dosages. Replace "chr" and position with the chromosome name and position of the SNP of interest. Note that variantFilter and sampleFilter are optional; you can omit these arguments.
genotypeData <- loadGenotypeData(basePath = "PathToFiles", dataType = "Plink_BED", variantFilter = variantFilter, sampleFilter = sampleFilter)
snp <- .jcall(genotypeData, "Lorg/molgenis/genotype/variant/GeneticVariant;", "getSnpVariantByPos", "chr", as.integer(position))
hist(as.numeric(.jcall(snp, "[F", "getSampleDosages")))
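Once the dosages are in an R numeric vector, ordinary R functions apply. The sketch below uses simulated values in place of the result of getSampleDosages and computes an allele frequency and a call rate. It assumes that missing calls are encoded as negative dosages, which should be checked against the Java documentation. The names `dosages`, `alleleFrequency` and `callRate` are illustrative.

```r
# Simulated dosages standing in for the vector returned by getSampleDosages
dosages <- c(0, 1, 2, 1, 0, -1, 2, 1)

# Assumption: negative values mark missing calls; convert them to NA
dosages[dosages < 0] <- NA

# Frequency of the counted allele: mean dosage divided by two copies per sample
alleleFrequency <- mean(dosages, na.rm = TRUE) / 2

# Fraction of samples with a non-missing dosage
callRate <- mean(!is.na(dosages))

cat("Allele frequency:", alleleFrequency, "Call rate:", callRate, "\n")
```

Converting missing values to NA before plotting also keeps them out of the histogram.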
See the rJava vignette for more information on rJava, and check the Java code examples for more help with using Genotype IO in R.